Adem Güler

Delta loading in SAP Datasphere based on the ABAP CDS view

In applications where a high volume of data creation, changes, and deletions happens, such as sales orders, it is crucial that the application provides a mechanism for identifying delta records and that the extractor supports delta loading. This means the application should be able to identify and extract only new, modified, or deleted data, ensuring efficient and accurate data loading.

In some cases nightly full uploads are not what we want, as time windows for data extraction are limited. The solution comes with the ODP framework.

The ODP framework for CDS extraction provides delta capabilities. There are two options for delta handling that we can choose from:

  • Change Data Capture (CDC) Delta
  • Generic Timestamp / Date based Delta

Let’s check the two delta mechanisms in detail below.

 

Change Data Capture (CDC) Delta

The easiest way to implement delta loading is to use CDC-enabled CDS views, importing them as Remote tables in Datasphere and enabling Real-Time Access for them.

Let us see how it is done:

@AbapCatalog.sqlViewName: 'ZSQL_SALES_DELTA'
@AbapCatalog.compiler.compareFilter: true
@AbapCatalog.preserveKey: true
@AccessControl.authorizationCheck: #NOT_REQUIRED
@EndUserText.label: 'CDS View - Delta loading on Sales Document'

@Analytics.dataExtraction.enabled: true
@Analytics.dataCategory: #FACT
@Analytics.dataExtraction.delta.changeDataCapture.automatic: true
define view ZCDS_SALES_DELTA as select from vbap
// client handling is implicit in ABAP CDS, so the join condition does not include mandt
inner join vbak on vbap.vbeln = vbak.vbeln
{
    
    key vbak.vbeln as SalesDocument,
    key vbap.posnr as SalesDocumentItem,
        vbap.matnr as MaterialNumber,    
        vbap.kwmeng as CumulativeOrderQuantity,
        vbap.vrkme as SalesUnit,
        vbap.netwr as NetValue,
        vbap.waerk as SalesDocumentCurrency,
        vbap.arktx as TextforSalesOrderItem
}

This is our example CDS view for delta loading on Sales Documents.

If we want to classify the view as a Dimension, Fact, Aggregation Level, or Cube, we must include the classification in the header of our CDS view with the following annotation:

@Analytics.dataCategory: #VALUE

Replace #VALUE with one of the following categories (an illustrative example follows the list):

  • #CUBE
  • #AGGREGATIONLEVEL
  • #DIMENSION
  • #FACT
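
For example, a dimension view for materials would carry the corresponding classification in its header. The sketch below is only for illustration; the view and SQL view names (ZCDS_MATERIAL_DIM, ZSQL_MAT_DIM) are hypothetical:

@AbapCatalog.sqlViewName: 'ZSQL_MAT_DIM'
@AccessControl.authorizationCheck: #NOT_REQUIRED
@EndUserText.label: 'Material dimension'
@Analytics.dataCategory: #DIMENSION
define view ZCDS_MATERIAL_DIM as select from mara
{
    key mara.matnr as MaterialNumber,
        mara.mtart as MaterialType
}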

A CDS view can be enabled for data extraction by just adding the following annotation:

@Analytics.dataExtraction.enabled: true

After adding this annotation, the CDS view is available for extraction by the ODP framework and visible for its data consumers.

The Change Data Capture (CDC) recording mechanism uses database triggers to record any changes to the tables that belong to an ABAP CDS view. For simple CDS views, such as projections, the triggers can be generated automatically by using the following annotation:

@Analytics.dataExtraction.delta.changeDataCapture.automatic

Please refer to the following link for more details: https://help.sap.com/docs/SAP_DATA_INTELLIGENCE/3a65df0ce7cd40d3a61225b7d3c86703/55b2a17f987744cba62903e97dd99aae.html


Adding the CDS view to the Datasphere as a Remote table


Choosing the SQL view name of the CDS view


Importing and deploying the CDS view as a Remote table


Deployment completed as a Remote table in Datasphere

To load initial data into the remote table, we first have to run a snapshot for it.


Running Snapshot

Let’s take a closer look at some of the initially loaded data, because after enabling the Real-Time Access feature we will see that data changed in the source system also changes in Datasphere without reloading or scheduling.


The data we are looking at as an example

We will see that when the description in the “Short text for sales order item” field of a Sales Document changes in the source system, it is updated in Datasphere via the Real-Time Access feature.

To load the changes (delta records) created in the source system into Datasphere, we need to set the data replication type of the remote table to “Real-Time Access”.


Enabling the Real-Time Access feature

The Real-Time Access feature is enabled

When we look at the details from the Data Integration Monitor menu, it should be as follows:


Data Integration Monitor – general


Data Integration Monitor – details

  • Switching replication type for remote table ‘ZSQL_SALES_DELTA’ from batch replication to real-time replication.
  • Subscribed to source system data changes.
  • Successful initial data transfer for real-time replication of remote table ‘ZSQL_SALES_DELTA’. Now real-time replication will automatically add delta data to remote table ‘ZSQL_SALES_DELTA’.

Now let’s change the “Short text for sales order item” field description of the Sales Documents in the source system and see how the change is reflected in Datasphere.


In the source system we change the data we specified as an example above


The data updated in the source system

The refresh frequency cannot be adjusted, but in my observation updated records arrive in Datasphere within one hour.


The updated example data in Datasphere

 

Generic Timestamp / Date based Delta

To be able to load data on a date basis, the relevant application tables must contain date/time information that is updated whenever the application data changes.

In our example, “upd_tmstmp” is the field in the VBAK table that identifies delta records.

To define which field triggers the delta, we need to use the annotation

@Analytics.dataExtraction.delta.byElement.name

This element can be a date (ABAP type DATS) or a UTC time stamp.

Let’s update the CDS view code with the delta element annotation.


Updated CDS view
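
Since the screenshot is not reproduced here, below is a minimal sketch of the updated view, assuming vbak.upd_tmstmp is exposed as LastChangeDate via the cast discussed next; the actual view in the screenshot may differ in detail:

@AbapCatalog.sqlViewName: 'ZSQL_SALES_DELTA'
@AccessControl.authorizationCheck: #NOT_REQUIRED
@EndUserText.label: 'CDS View - Delta loading on Sales Document'
@Analytics.dataExtraction.enabled: true
@Analytics.dataCategory: #FACT
// Only records whose LastChangeDate is newer than the last extraction are delivered
@Analytics.dataExtraction.delta.byElement.name: 'LastChangeDate'
define view ZCDS_SALES_DELTA as select from vbap
inner join vbak on vbap.vbeln = vbak.vbeln
{
    key vbak.vbeln as SalesDocument,
    key vbap.posnr as SalesDocumentItem,
        vbap.matnr as MaterialNumber,
        vbap.netwr as NetValue,
        // change time stamp of the sales document header, converted to a date
        cast(substring(cast(vbak.upd_tmstmp as abap.char(32)),1,8) as abap.dats) as LastChangeDate
}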

Please be aware: when converting a time stamp to a date, we can use the tstmp_to_dats function, or we can use the cast keyword as below.

cast(substring(cast(upd_tmstmp as abap.char(32)),1,8) as abap.dats) as LastChangeDate

Both expressions give the same result in most cases; note, however, that tstmp_to_dats converts to the date in a given time zone, while the substring approach keeps the UTC date portion of the time stamp.
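
For reference, a sketch of the tstmp_to_dats variant, assuming the date should be derived in the system time zone (check the function’s parameters on your release):

// convert the UTC time stamp to a date in the system time zone;
// 'NULL' makes the expression return null instead of failing on errors
tstmp_to_dats( vbak.upd_tmstmp,
               abap_system_timezone( $session.client, 'NULL' ),
               $session.client,
               'NULL' ) as LastChangeDate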

With the view prepared, we create a Data Flow in Datasphere and restrict LastChangeDate = CurrentDate - 1 at a projection node.


This is our Data Flow – general view

The operation performed at the TO_DATE node: since the “LastChangeDate” field we created in the CDS view arrives as a string type (ABAP DATS is technically a character field), it needs to be converted to a date type in this node.


TO_DATE node
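
As the node’s screenshot is not shown here, the conversion amounts to a calculated-column expression of roughly this shape; TO_DATE and the 'YYYYMMDD' format mask are assumptions matching the ABAP DATS layout:

TO_DATE("LastChangeDate", 'YYYYMMDD')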

The operation performed at the FILTER node: the filter applied to obtain the changes (delta records) from the previous day.


FILTER node
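
Again as an assumption of what the node contains, a filter condition restricting the load to records changed the previous day could look like this (function names depend on the expression language available in the node):

"LastChangeDate" = ADD_DAYS(CURRENT_DATE, -1)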

There are two ways to run the Data Flow we created:

  • run the Data Flow directly by pressing the run button manually, or
  • schedule the Data Flow to run automatically on a daily basis at a specified time.

With both methods, all data that changed (delta records) the day before in the source system is loaded into Datasphere.

Since we chose the second method in our example, the delta records captured in the target table node after the scheduled Data Flow ran a day later are shown below.


The result of changes (delta records) in Datasphere


The result of Scheduled Data Flow in Datasphere

 

Best Regards.

 

17 Comments
Julian Juraske

      Hello Adem,

      thanks for the great summary!

Do we not transfer the whole data table in the second option from source system to target system?
Shouldn't the purpose of delta be to transfer only the required data records?

Is there a way to pass the value (change date) as a filter down to the source (e.g. as in a BW extractor)?
Otherwise I could also just drop the table and do a full load in the Data Flow, implement my own delta logic within Datasphere (using a SQL view), or use the Append option in combination with UPSERT.

      Best Regards
      Julian

Adem Güler (Blog Post Author)

      Hello Julian,

      For the first question:
      I did not see the need to mention this separately in order not to make the blog too long.
      The purpose of using delta logic is to exclusively retrieve newly created, modified, and deleted records from the source system. This allows for a more efficient and streamlined process of transferring data to the target system.

For the second question:
      I do not know. While the first two options are viable, I haven't tested the logic for setting it as UPSERT. However, it's important to note that when a primary key is present in our target table, we can employ that option to avoid any potential duplicate key errors.

      Best Regards.

Julian Juraske

      Hello Adem,

In your first option (CDC) you only transfer the changed records (after the initial full load) = delta.

In the second option (DF) you ALWAYS (not only initially) transfer the whole data from source to target.
Basically what you are doing is a full load from source to target, and you filter out the required records in the target system (by setting a change timestamp in the source).
The second option (filter in the target system) can be achieved by many different approaches (e.g. a SQL view comparing data from today vs. yesterday, or the APPEND UPSERT mode).
However, for me this is not a real delta, since you do not pass a filter down to the source to reduce the amount of records transferred.

      Best Regards
      Julian

Benedict Venmani Felix

      Hello Julian,

For the second option, isn't the DAY-1 filter a kind of safety interval like we use in FI extraction? I haven't tried this myself, but will the delta still not work as usual? If I have a year's worth of data in the source, I presume the delta only picks the DAY-1 records rather than transferring the entire data and then filtering it in DS. I have to check.

Alberto Simeoni

The problem is that:

FIRST: all the rows are moved from the ERP to Datasphere.

SECOND: the filter is applied.

Only if a filter is simple is it pushed down to ODP. Just add a cast after day - 1 and the filter is no longer valid for pushdown.

So the delta is basically useless, because the real problem (data fetch from the SAP application layer and data transfer) is not addressed.

Adem Güler (Blog Post Author)

      Hello Julian,

      Your comment about the second option is wrong because "Data is not accessed locally in SAP Datasphere. It’s read from a virtual table, which accesses the data directly from the source system via the adaptors."

For more details, please see: https://help.sap.com/docs/SAP_DATASPHERE/be5967d099974c69b77f4549425ca4c0/4dd95d7bff1f48b399c8b55dbdd34b9e.html

What I am doing in the second option with data access “Remote” is reading data from a virtual table, filtering for the previous day's records in the source system, and loading the resulting captured records into the target system. In a nutshell, this is called “delta loading”.

      Best Regards.

Julian Juraske

Hello Adem,

I never said you persist the data of the source table, but you are transferring the whole data, so it goes through the network and that takes time (basically a full load).

In a case where you have millions of data records this is not what you want to do.

Like Alberto says, you are filtering in the second step within the Data Flow.

What would be interesting is if you could push the filter down to the source, so the virtual table only delivers the requested data records.

      Best Regards
      Julian

Martin Kreitlein

      Hello Adem,

      thanks a lot for that Blog.

What is missing here, for my understanding, is how exactly the delta transfer happens technically.

      I mean, there are two possibilities,

      • the one you showed, where you need the Data Provisioning Agent
      • the other with Dataflows or Replication Flows, where you need the Cloud Connector.

What exactly happens if there is a network issue or the respective server (DPA or CC) is down before the next data replication?

Where is the information stored about which records have been transferred successfully and which have not?

      I tried to find the details about this in the Online Help, but I could not really find the answers.

      Thanks, Martin

Julian Juraske

      Hello Martin,

CDC uses the ODP framework.

      There was a good blog (from 2020)
      https://blogs.sap.com/2020/05/30/using-delta-extraction-capability-of-abap-cds-to-sap-data-warehouse-cloud/

So I would assume you could repeat the request sent to Datasphere.

Martin Kreitlein

      Hello Julian,

      When reading the blog, I would agree that this is true for Remote Tables.

      But I disagree for Replication Flows!

In my project, we have 8 different CDS views with CDC, all used in Replication Flows, and for none of them can I see entries in ODQMON.

      I see all the triggers in DHCDCMON... but there is nothing where you could "repeat" anything, only stop it completely.

      The only way to see a request in ODQMON is if I execute a Delta in RODPS_REPL_TEST ... but this is only for testing, not for the real replication to Datasphere.

So my question regarding no. 2 is still not answered...

      Thanks, Martin

Philipp Nell

      Hi Julian, are you sure about the entry in ODQ for option one? The last time I checked, I didn't see it for DSP as a receiver. Cheers.

Armaan Singla

      Hi Martin Kreitlein,

       

I totally agree with you. To summarise from my point of view, we have 2 options:

      • Remote Table Replication which requires the Data Provisioning Agent.
      • Data flows or Replication Flows, which requires the Cloud Connector.

For delta replication, option 1 supports both CDC-based delta and ODP framework (timestamp-based) delta.

CDC-based delta relies on triggers and can be checked in DHCDCMON.

ODP-based delta can be checked in ODQMON. The problem with this option is how to adjust the refresh frequency. As per the SAP Note below, it could be 1 hour or even 1 day.

      https://me.sap.com/notes/3155988/E

      Do you know of any option to set the scheduled frequency?

For delta replication, option 2 supports only CDC-based delta; hence it relies only on triggers and can be checked in DHCDCMON.

      Regards,

      Armaan

       

Ismael Cardoso

      Hi Armaan,

Answering your question #1: it's not possible to change/adjust the frequency for real-time replications. The frequency is determined at runtime according to the scenario's capabilities.

      Regards,
      Ismael

Armaan Singla

      Hi Ismael Cardoso,

       

      Thanks for your response!

The issue with this is that we expect the data to be replicated in real time if we enable real-time replication. What are the factors influencing this frequency, so that they can be adjusted to avoid any delay in data replication?

       

      Regards,

      Armaan

Ismael Cardoso

      Hi Armaan,

I'm not aware of all the details, since the heuristic relies on the ODP source object.
      A good rule of thumb could be:

      • 15s for real-time-capable ODPs
• 1 day for delta-capable ODPs when the heuristic expects updates only once per day (e.g. when the delta/CDC mechanism is based on a date field (ABAP type DATS))
      • 1 h for any other delta-capable ODP

      Regards,
      Ismael

Aman Dwivedi

      Hello Adem,

      Excellent blog.

I tried replicating the same code/scenario, but I am getting this error in Datasphere when enabling real-time access:

       

      Although, if I replicate any standard CDS extractor as a custom Z extractor, it works.

Can you help me with what I might be missing here?

       

       

      Thanks

       

       

Adem Güler (Blog Post Author)

      Hello Aman,

I suggest you check the information at the link below; it covers the relevant cases. https://help.sap.com/docs/SAP_DATASPHERE/be5967d099974c69b77f4549425ca4c0/441d327ead5c49d580d8600301735c83.html

       

      Best Regards.