Extractors – Source-Based CDC in SAP Data Services
In this blog I am going to discuss how to implement source-based CDC with extractors.
Source-based CDC implementations extract only the changed rows from the source; this is sometimes called incremental extraction. The method is preferred because extracting the fewest possible rows improves performance. When using Business Content Extractors, you have to check whether the extractor you are using has delta recognition capabilities. Once you have imported the extractor you want to use, double-click it and go to the Attributes tab. The Delta_Enable attribute tells you whether the extractor captures changes.
When you use an extractor that has delta recognition capabilities, its output includes the following two columns:
- DI_SEQUENCE_NUMBER – Starts with zero at the beginning of each extraction. Its value is incremented by one each time a row is read.
- DI_OPERATION_TYPE – Data Services generates one of the following valid values in this column:
  - I for INSERT
  - B for before-image of an UPDATE
  - U for after-image of an UPDATE
  - D for DELETE
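To make the two columns concrete, here is a minimal Python sketch of a delta batch as a list of rows, with a check that DI_SEQUENCE_NUMBER starts at 0 and grows by one per row and that every DI_OPERATION_TYPE is one of I, B, U or D. The names `DeltaRow` and `validate_delta_batch`, and the `key`/`payload` fields, are hypothetical illustrations and not part of Data Services; the check also assumes the rows are listed in the order they were read.

```python
from dataclasses import dataclass
from typing import List

# Valid DI_OPERATION_TYPE codes, as listed above.
VALID_OPERATION_TYPES = {
    "I": "INSERT",
    "B": "before-image of an UPDATE",
    "U": "after-image of an UPDATE",
    "D": "DELETE",
}


@dataclass
class DeltaRow:
    """One row delivered by a delta-enabled extractor (hypothetical model)."""
    key: str                 # business key of the source record (assumed field)
    payload: dict            # the extracted business fields (assumed field)
    di_sequence_number: int  # DI_SEQUENCE_NUMBER
    di_operation_type: str   # DI_OPERATION_TYPE


def validate_delta_batch(rows: List[DeltaRow]) -> List[str]:
    """Return a list of problems; an empty list means the batch looks consistent.

    Assumes `rows` is in the order the rows were read.
    """
    problems = []
    for expected, row in enumerate(rows):
        # Sequence numbers start at 0 and increase by 1 for each row read.
        if row.di_sequence_number != expected:
            problems.append(
                f"row {expected}: DI_SEQUENCE_NUMBER is "
                f"{row.di_sequence_number}, expected {expected}"
            )
        # Only the four documented operation codes are valid.
        if row.di_operation_type not in VALID_OPERATION_TYPES:
            problems.append(
                f"row {expected}: unknown DI_OPERATION_TYPE {row.di_operation_type!r}"
            )
    return problems


# Example: an insert followed by an update (before-image, then after-image).
batch = [
    DeltaRow("1001", {"qty": 5}, 0, "I"),
    DeltaRow("1001", {"qty": 5}, 1, "B"),
    DeltaRow("1001", {"qty": 7}, 2, "U"),
]
print(validate_delta_batch(batch))  # []
```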
These two columns are used by the Map_CDC_Operation transform as follows:
- DI_SEQUENCE_NUMBER – is used to determine the order in which the rows are processed. This is very important since there can be many updates to a single row and the updates on the target need to be applied in the same order as they were applied in the source.
- DI_OPERATION_TYPE – is used to determine how to process the rows, as described above.
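The ordering rule can be illustrated with a small Python simulation. This is not the Map_CDC_Operation transform itself, only a sketch under stated assumptions: the target table is a dictionary keyed by a business key, `apply_changes` and `ChangeRow` are hypothetical names, and B rows are simply skipped because the U after-image carries the new values (how the real transform uses before-images is not modelled here).

```python
from typing import Dict, List, Tuple

# A change row as (DI_SEQUENCE_NUMBER, DI_OPERATION_TYPE, key, values).
ChangeRow = Tuple[int, str, str, dict]


def apply_changes(target: Dict[str, dict], rows: List[ChangeRow]) -> Dict[str, dict]:
    """Apply change rows to an in-memory 'target table' keyed by business key.

    Rows are sorted by DI_SEQUENCE_NUMBER first, so several updates to the
    same key are applied in the same order they happened in the source.
    """
    # Copy so the caller's dictionary is not mutated.
    result = {k: dict(v) for k, v in target.items()}
    for _seq, op, key, values in sorted(rows, key=lambda r: r[0]):
        if op == "I":
            result[key] = dict(values)
        elif op == "U":
            # After-image: merge the new column values (creates the key if absent).
            result.setdefault(key, {}).update(values)
        elif op == "B":
            # Before-image: holds the old values; this sketch does not need them.
            continue
        elif op == "D":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown DI_OPERATION_TYPE {op!r}")
    return result


# Two updates to the same key arrive out of order; sorting restores source order.
changes = [
    (4, "U", "1001", {"qty": 9}),
    (0, "I", "1001", {"qty": 5}),
    (2, "U", "1001", {"qty": 7}),
    (1, "B", "1001", {"qty": 5}),
    (3, "B", "1001", {"qty": 7}),
]
print(apply_changes({}, changes))  # {'1001': {'qty': 9}}
```

Applying the same rows in the order they are listed, without sorting, would leave `qty` at 7 instead of 9, which is exactly the out-of-order problem that DI_SEQUENCE_NUMBER prevents.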
## Steps to perform a simple source-based CDC extraction with an extractor to update target tables
- Create an SAP Application Datastore for your source system.
- Import the Extractor.
- Create (or import) the target table metadata in HANA.
- Create a Datastore for HANA.
- Import the target table(s).
- Create a Batch Job.
- Create a Dataflow and add it to the Batch Job.
- Add the Extractor to the Dataflow.
- Open the Extractor.
- Make sure the Initial load drop-down list has the value No.
- Add a Query transform to the Dataflow and connect it to the Extractor.
- Add a Map_CDC_Operation transform to the Dataflow and connect it to the Query transform.
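- Open the Query transform and map the fields you want to retrieve, including DI_SEQUENCE_NUMBER and DI_OPERATION_TYPE, which Map_CDC_Operation needs.
- Open the Map_CDC_Operation transform and configure the CDC columns: DI_SEQUENCE_NUMBER as the Sequencing column and DI_OPERATION_TYPE as the Row operation column.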
- Add the target table to the Dataflow and connect it to the Map_CDC_Operation.
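The Query and Map_CDC_Operation steps only work together if the Query transform passes the two CDC columns through. The Python sketch below models that requirement as a schema check. `MapCdcSettings`, `query_output_columns` and `check_dataflow` are hypothetical helpers, MATNR and WERKS are only example source fields, and the two settings are assumed to correspond to the Sequencing column and Row operation column options of Map_CDC_Operation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MapCdcSettings:
    """Hypothetical mirror of the Map_CDC_Operation CDC column settings."""
    sequencing_column: str = "DI_SEQUENCE_NUMBER"
    row_operation_column: str = "DI_OPERATION_TYPE"


def query_output_columns(selected_fields: List[str], settings: MapCdcSettings) -> List[str]:
    """Return the Query transform's output schema: the business fields you
    chose, plus the two CDC columns that Map_CDC_Operation reads."""
    columns = list(selected_fields)
    for cdc_col in (settings.sequencing_column, settings.row_operation_column):
        if cdc_col not in columns:
            columns.append(cdc_col)
    return columns


def check_dataflow(query_columns: List[str], settings: MapCdcSettings) -> None:
    """Raise ValueError if the Query output cannot feed Map_CDC_Operation."""
    missing = [
        c for c in (settings.sequencing_column, settings.row_operation_column)
        if c not in query_columns
    ]
    if missing:
        raise ValueError(f"Query transform does not pass through: {', '.join(missing)}")


settings = MapCdcSettings()
cols = query_output_columns(["MATNR", "WERKS"], settings)
print(cols)  # ['MATNR', 'WERKS', 'DI_SEQUENCE_NUMBER', 'DI_OPERATION_TYPE']
check_dataflow(cols, settings)  # passes: both CDC columns are present
```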
The resulting Dataflow (Extractor → Query → Map_CDC_Operation → target table) should look like this:
This is how to set up source-based CDC, which is very useful when implementing real-time scenarios in a BODS project with SAP HANA.
In the next blog I am going to discuss target-based CDC with extractors in detail, along with the implementation steps.