SAP Datasphere – New Replication Flow
In this article, we’ll take a look at one of the new features of SAP Datasphere: the new Replication Flow.
Background:
We already know that replication capability is available in SAP Datasphere via Smart Data Integration (SDI), and SAP is not going to remove it. With the new Replication Flow, SAP basically brings in a new cloud-based replication tool. This cloud-based data replication tool is designed to simplify data integration processes by eliminating the need for additional on-premises components. This means it does not rely on the DP Server/DP Agent technology, which requires installation and maintenance, but instead uses the embedded Data Intelligence environment and Data Intelligence connectors to connect to remote sources.
User Interface:
When it comes to the user experience, the Replication Flow is built into the existing Data Builder. Monitoring is integrated as well: replication flow monitoring sits in the existing Data Integration Monitor, where users already find the monitoring for data flows today.
When to use a replication flow?
Use a replication flow if you want to copy multiple data assets from the same source to the same target in a fast and easy way and do not require complex projections.
One thing to keep in mind is that replication flows copy only certain types of source objects, which are listed below.
- CDS views (in ABAP-based SAP systems) that are enabled for extraction.
- Tables that have a unique key (primary key); a quick way to check this is sketched after this list
- Objects from ODP providers, such as extractors or SAP BW artifacts
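For the table case, a quick way to check whether a given source table qualifies is to look for a primary key in the HANA catalog. Below is a minimal sketch using the SAP HANA Python client (hdbcli); the host, credentials, schema, and table names are placeholders for illustration only and are not part of the replication flow setup itself.

```python
# Minimal sketch: check whether a HANA table has a primary key and is therefore
# a candidate source object for a replication flow.
# Assumptions: hdbcli is installed (pip install hdbcli); host, user, password,
# schema, and table names below are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(
    address="<your-hana-cloud-host>",  # placeholder
    port=443,                          # HANA Cloud SQL port, encrypted
    user="<user>",                     # placeholder
    password="<password>",             # placeholder
    encrypt=True,
)

schema, table = "MY_SCHEMA", "MY_TABLE"  # placeholders
cur = conn.cursor()
cur.execute(
    "SELECT COUNT(*) FROM SYS.CONSTRAINTS "
    "WHERE SCHEMA_NAME = ? AND TABLE_NAME = ? AND IS_PRIMARY_KEY = 'TRUE'",
    (schema, table),
)
has_primary_key = cur.fetchone()[0] > 0
print(f"{schema}.{table} qualifies for replication: {has_primary_key}")
conn.close()
```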
The new Replication Flow also supports delta loads in addition to the initial load. In the initial release, the delta interval is fixed to 60 minutes, which means a delta load runs every 60 minutes and captures the changes from the selected source and replicates them to the target. This is expected to be extended further (for example, into a kind of scheduled batch delta replication in the future).
Please find the details here: Load Type
Use case and overview comparison:
Overview of Connectivity: SAP HELP – Connection Types Overview
On the source system side
- SAP S4HANA Cloud or S4HANA on-premises, where we mainly talk about CDS view extraction.
- SAP ECC or Business Suite systems, which we connect via SLT mainly for table-based extraction; the DMIS add-on needs to be installed (the DMIS add-on is a requirement that brings all the prerequisites into the SAP system as a framework or foundation for using replication flows).
- SAP BW and BW/4HANA integration (We have different data assets that can be exposed via ODP, like ADSOs, DSOs and so on)
- SAP HANA Cloud as well as HANA on-premises.
- A non-SAP source: the Microsoft Azure SQL database, which we can use as a source.
On the Target system side
- SAP Datasphere.
- Standalone HANA on-premises and Standalone HANA cloud.
- HANA Data Lake Files.
With that, let’s jump into the scenario…
In this scenario, I am going to use an SAP HANA Cloud system as the source.
Let’s see the connection creation with the source system.
- Go to the dedicated Datasphere space and click on “Go to Connections”.
- Click on “Create Connection” and you can see the list of connection types.
- Click on the information icon of the SAP HANA Cloud connection type; there you go… it supports replication flows.
- Select the connection and provide the information about the source system you have; that’s it. You are good to go…
Now, we will see how to create a Replication Flow in SAP Datasphere.
- Jump into the Data Builder and click on New Replication Flow.
Note: If you don’t find “New Replication Flow”, please check whether the “SAP Datasphere Integrator” role is assigned to your user.
- To choose the source for replication, click on “Select Source Connection”, which shows the connections created in your Datasphere space.
- Here, I am going to connect to a HANA Cloud system from where I am going to consume tables. So, select the connection and continue.
- The next thing is that you need to choose the “Source Container”. A container is like a root path that groups the source objects (for example, in the case of a database, it is the database schema). An optional way to preview what such a container holds is sketched after the note a few steps below.
- Here, I am selecting the container so that it shows the list of tables within that schema.
- The final thing in setting up the source is to select the source objects from the container we chose in the previous step. For that, click on “Add Source Objects”, choose the tables that you want, and click on Next.
- In the next screen, select all the objects and “Add Selection.”
- Next, configure the target by clicking on “Select Target Connection”.
- Make sure you choose “SAP Datasphere” as your target when you want to replicate data from SAP HANA Cloud into it.
Note: You also have the option to select other targets… it depends on the connections you have created within your Datasphere space. If you have standalone HANA on-premises, standalone HANA Cloud, or HANA Data Lake connections, those will also show up as targets.
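Since the source container in this scenario is simply a database schema (see the container step above), you can preview which tables the “Add Source Objects” dialog will offer by listing the tables of that schema directly in HANA. This is just an optional sanity check, not part of the flow creation; the connection details and schema name below are placeholders.

```python
# Optional sanity check: list the tables in the source schema, i.e. roughly what
# the "Add Source Objects" dialog will show for a database-type container.
# Assumptions: hdbcli installed; connection details and schema are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(
    address="<your-hana-cloud-host>",  # placeholder
    port=443,
    user="<user>",
    password="<password>",
    encrypt=True,
)

cur = conn.cursor()
cur.execute(
    "SELECT TABLE_NAME FROM SYS.TABLES WHERE SCHEMA_NAME = ? ORDER BY TABLE_NAME",
    ("MY_SCHEMA",),  # placeholder: the schema selected as the source container
)
for (table_name,) in cur.fetchall():
    print(table_name)
conn.close()
```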
- Let’s change the names of the replicated tables in SAP Datasphere. If you have existing tables with a similar structure, you can add those tables in place of the auto-generated ones.
- Click on Settings to set the replication behavior; as I mentioned earlier, the replication flow also supports delta extraction. Additionally, there is one more option called “Truncate”; enabling it deletes the existing data in the target structure.
For more details regarding this section, please go through the help document: Load Type
- Provide the Technical/Business name of the Replication flow and save it.
- Select any of the rows to see the replication properties; from there you can add some projections.
- Here, I want to provide some simple filters: say I want to restrict JOBID and, at the same time, the job classification. How can we do that? Here we go…
In the JOBID section, select “Between” and provide the low and high values; once you are done with that, click on “Add Expression”. Now select “Job Classification”, provide a valid input, and click on “Add Expression” again.
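Conceptually, this projection filter behaves like a WHERE clause on the source table. As a rough sketch (assuming the demo table has columns named JOBID and JOBCLASSIFICATION, which are illustrative names only), you could preview how many rows the filter will let through by running the equivalent query against the source:

```python
# Rough preview of the projection filter as a WHERE clause against the source.
# Assumptions: hdbcli installed; connection details, table and column names
# (JOBID, JOBCLASSIFICATION) and the filter values are illustrative placeholders.
from hdbcli import dbapi

conn = dbapi.connect(
    address="<your-hana-cloud-host>", port=443,
    user="<user>", password="<password>", encrypt=True,
)

cur = conn.cursor()
cur.execute(
    'SELECT COUNT(*) FROM "MY_SCHEMA"."MY_TABLE" '
    "WHERE JOBID BETWEEN ? AND ? AND JOBCLASSIFICATION = ?",
    (1000, 2000, "FULL_TIME"),  # placeholder low/high values and classification
)
print("Rows matching the projection filter:", cur.fetchone()[0])
conn.close()
```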
- There is one more option called Mapping, where you can change the existing mappings as well as the data types the system has proposed by default, and you can also add new columns to the target table. That’s it; once you are done, provide a name for the projection and click on OK.
- The projections we added are listed in the replication flow… with that, we have completed the creation of the replication flow. Let’s deploy it.
- We can see that all the tables and the replication flow got deployed…
- Now run the replication flow; you can see a “Run” button.
- With that, a background job starts running; you can check the details of the running job by clicking on the Data Integration Monitor in the tools section.
- Once the run is complete, it shows a message in the Data Integration Monitor. Now, let’s take a look at the tables and see if we can spot some data.
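If you prefer checking the replicated data outside the Datasphere UI, one option is to query the space schema with a database user that has been granted read access to the space. The sketch below assumes such a database user already exists; the host, credentials, space schema name, and table name are placeholders.

```python
# Optional check outside the Datasphere UI: count rows in a replicated target
# table via a database user with read access to the space schema.
# Assumptions: such a database user exists; host, credentials, space schema
# ("MY_SPACE") and table name ("MY_TABLE") are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(
    address="<your-datasphere-host>", port=443,
    user="<space-db-user>", password="<password>", encrypt=True,
)

cur = conn.cursor()
cur.execute('SELECT COUNT(*) FROM "MY_SPACE"."MY_TABLE"')  # placeholders
print("Rows replicated into the target table:", cur.fetchone()[0])
conn.close()
```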
That’s it, we did it. Thanks for taking the time to read this article on SAP Datasphere. Hopefully, it has given you a better understanding of one of the key features and how it can help businesses unleash the full potential of their data.
Hi Mastan,
Thanks for the wonderful blog explaining this cloud-based replication technique. Does this mean the end of DP Agent and remote table based replication, which also supports real-time replication?
Regards,
Deo
Hi Deo,
No, the existing remote table replication using DP Agent will continue to exist. This new replication tool is for moving data from one source to one target (the target does not have to be Datasphere; you can use another target as well; please check the supported targets in the blog). Additionally, we can transfer cleansed data by applying some projections.
Thanks,
Mastan
Hi Mastan,
Good blog and nicely written.
Thanks,
Shailu.
Hi Shailendar ANUGU
Thank you for the feedback.
Best regards,
Mastan
Hello Mastan,
thanks for the good overview of the new replication flows.
I have one question: as you run the job as an initial load, how will the delta loads be done?
Are they pushed automatically to the targets, or do we need to schedule the delta loads?
thanks & best regards
Younes
Hi Younes,
In order to push the delta loads, the setting should be "Initial and Delta" (please find the screenshot below). With the initial release, the delta interval is fixed to 60 minutes, which means the delta load runs every 60 minutes and captures the changes from the source and replicates them to the target.
Thanks,
Mastan
Hello Mastan,
Very helpful blog... but one question: what do you mean by this?
"SAP basically brings in a new cloud-based replication tool. This cloud-based data replication tool is designed to simplify data integration processes by eliminating the need for additional on-premises components."
When you try to set up a connection to an ABAP system, you still need an additional on-prem component: The Cloud connector?!
Thanks, Martin
Hi Martin,
Thanks for the feedback.
Coming to the question: for cloud-to-cloud use cases, if you want to replicate data from a cloud-based source (such as S4HANA Cloud) to a cloud-based target (like Datasphere), installing on-premises components is not needed; direct connectivity is used instead. For on-premises scenarios, for example SAP BW, SAP Business Suite, or S4HANA on-premises, we use the Cloud Connector. But for pure cloud-based replications, this is not needed.
Best regards,
Mastan
Hello Mastan, thanks for this informative blog. We have used replication flows in order to fulfill combined requirements which could not be realized with remote table replication or data flows. The delta was determined by a standard CDC delta CDS view, and we could confirm an almost optimal delta determination and forwarding to Datasphere. What we could not observe is an entry in the ODQ of the underlying S4 system, so it seems that the delta reading must happen outside of ODQ. Could you shed some light on this part? That would be very interesting. Kind regards, Philipp
Hi Philipp,
Thanks for the feedback,
Can you please check out transaction DHCDCMON, which is used to monitor the replication of CDS views via the CDC engine? It provides information on the status of the replication process, including whether it is running, completed, or failed. It also provides detailed information on any errors that may have occurred during the replication process.
Best regards,
Mastan
Hello Mastan,
Very helpful hint about transaction DHCDCMON... do you know the equivalent (an app?) for S4HANA Cloud, public edition?
I created a replication flow to extract from I_CUSTOMER and it "failed with error"... but in the log, the runtime is still updated on every refresh. Even after 30 minutes, there is no real error message, no abort, and no error log.
I will let it run over the weekend and check on Monday again... but the respective App for the same t-code would be very helpful. I could not find any in the Apps library.
Thanks, Martin
Hello Mastan, thanks for the informative blog.
I have the understanding that in replication flows, CDS views as well as SAPI extractors can be used. We established a connection to an S4 on-premise system. When we create a replication flow for this connection, we only see the CDS container and no container for SAPI extractors. At least one ODP-released DataSource (0CUSTOMER_TEXT) is actively available in the S4 system.
Do we have to do additional customizing in order to use SAPI-extractors?
Best regards,
Stefan