SAP Integration with Azure Data Factory
SAP's data warehouse (SAP DW) and analytics tools are best suited for SAP source systems, and when the source system is SAP we always encourage customers to go with the SAP DW solution and SAP analytics tools. SAP DW and the analytics tools are tightly integrated and have been around for decades. However, customers are increasingly shifting towards Azure, AWS, and Google Cloud for data warehousing and analytics, running them in parallel with SAP DW.
Currently, SAP Data Services (SAP DS) is used as the ETL tool to pull data from SAP via the BW extractors and push it to Azure Blob Storage. We were looking at moving data from SAP to Azure without SAP DS, due to issues such as file upload failures in Blob Storage, delays in the loads, and data reconciliation problems. We evaluated the various Azure connectors available in early 2022, but they did not satisfy the requirement of using the BW extractors, especially the extractors' delta functionality. SAP ships 7,500+ standard extractors and 2,500+ CDS views for data extraction, and we wanted to use this SAP-delivered content along with its delta functionality.
Azure Data Factory (ADF) is an ETL and ELT data integration platform as a service (PaaS). It is a single tool that enables data ingestion from SAP as well as various other sources, data transformation via built-in data flows, and integration with Databricks, HDInsight, and other services. In this blog we will focus on SAP as the source system and on using ADF to load the data.
Azure had the following six connectors available to connect to SAP systems:
These six connectors work well for full extracts of data. Incremental loads are also possible where a date, timestamp, or incrementing column (such as an ID) is available in the source table or object, but additional configuration and development were needed to implement incremental loads with these connectors. The connectors could not use the delta functionality built into the extractors or ABAP CDS views. You can check the details of these connectors using the link below.
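To make the "additional configuration and development" concrete, here is a minimal sketch of the watermark pattern those connectors force you into. The table and field names (`VBAK`, `AEDAT`) are just illustrative; in ADF this logic is usually wired up with a Lookup activity that reads the stored watermark and a Copy activity that runs the generated query.

```python
def build_incremental_query(table: str, watermark_col: str,
                            last_watermark: str) -> str:
    """Build a source query that fetches only rows changed since the
    last successful load. Table and column names are illustrative."""
    return (f"SELECT * FROM {table} "
            f"WHERE {watermark_col} > '{last_watermark}'")

# After each successful run, the highest value of the watermark column
# is persisted and passed into the next run.
query = build_incremental_query("VBAK", "AEDAT", "2022-01-15")
```

Note that this only catches inserts and updates that touch the watermark column; deletions are invisible to it, which is exactly the gap the extractors' delta functionality closes.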
We all know SAP BW extractors have been available for ages and are the main means of moving data from an SAP source system to a target system. With S/4HANA we now also have ABAP CDS views, which can be used to extract data to target systems. The main advantage of using BW extractors, ABAP CDS views, or SLT is their delta (change data capture) functionality, which delivers the new, updated, and deleted records from the SAP source system.
So the problem statement was: the available connectors were not able to connect to the SAP BW extractors or the ABAP CDS views.
Azure has come up with a new connector, the SAP CDC connector, to handle the delta/change data capture functionality of SAP BW extractors and ABAP CDS views. It also supports extraction using SLT 😊
SAP CDC connector:
The SAP CDC solution in Azure Data Factory is a connector between SAP and Azure. On the SAP side, the SAP ODP connector invokes the ODP API over standard Remote Function Call (RFC) modules to extract full and delta raw SAP data.
On the Azure side, the Data Factory mapping data flow can transform and load the SAP data into any data sink supported by mapping data flows.
The SAP CDC connector uses the SAP ODP framework to extract various data source types, including:
- SAP extractors
- ABAP CDS views (with S/4HANA)
- InfoProviders and InfoObjects datasets in SAP BW and SAP BW/4HANA
- SAP application tables (via SLT)
In this process, the SAP data sources act as providers: they run on the SAP system and write either full or incremental data into an operational delta queue (ODQ). The Data Factory mapping data flow source is a subscriber of the ODQ.
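The provider/subscriber mechanics can be sketched as a toy model. This is purely conceptual (the real ODQ lives inside SAP and is accessed via the ODP API, with confirmation and retention semantics this sketch ignores), but it shows why the first request behaves like a full/init load and later requests return only the delta:

```python
class OperationalDeltaQueue:
    """Toy model of an ODQ: a provider appends change records, and each
    subscriber tracks the position of its last confirmed request."""
    def __init__(self):
        self.records = []    # records produced by the SAP provider
        self.positions = {}  # subscriber name -> last confirmed offset

    def produce(self, *records):
        self.records.extend(records)

    def request(self, subscriber):
        # First request behaves like an init/full load; later requests
        # return only records added since the last confirmed request.
        start = self.positions.get(subscriber, 0)
        batch = self.records[start:]
        self.positions[subscriber] = len(self.records)
        return batch

odq = OperationalDeltaQueue()
odq.produce({"id": 1}, {"id": 2})
first = odq.request("adf_dataflow")   # init: both records
odq.produce({"id": 3})
delta = odq.request("adf_dataflow")   # delta: only the new record
```

Because each subscriber has its own position, several consumers (ADF, SAP BW, etc.) can read the same queue independently.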
It supports all ODP sources:
- 7,500+ DataSources/extractors delivered with SAP ECC
- 2,500+ ABAP CDS views delivered with SAP S/4HANA
- Custom-built extractors, CDS views, and BW objects
- Near-real-time replication with the SAP SLT server
It supports full and delta extractors, extraction from tables (full and incremental via SLT), ABAP CDS views, and InfoProviders in SAP BW and BW/4HANA.
Prerequisites:
- Be familiar with SAP concepts such as ODP extractors, the ODP framework, ABAP CDS views, delta functionality and delta types, and with Data Factory concepts such as integration runtimes, linked services, datasets, activities, data flows, pipelines, and triggers.
- Be familiar with monitoring data extractions on SAP systems.
- Set up the SAP system to use the SAP ODP framework
- Set up a self-hosted integration runtime (SHIR) to use for the connector
- Set up an SAP CDC linked service
- Debug issues with the SAP CDC connector by sending self-hosted integration runtime logs to Microsoft.
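For orientation, the linked service that points ADF at the SAP system is of type `SapOdp` and references the SHIR via `connectVia`. The sketch below shows the shape of that JSON payload as a Python dict; the host name and runtime name are hypothetical, and the property names reflect my reading of the ADF documentation at the time of writing, so verify them against the current docs before use.

```python
# Sketch of an SAP CDC linked-service definition (type "SapOdp").
# Values marked hypothetical must be replaced with your own; property
# names should be checked against the current ADF documentation.
linked_service = {
    "name": "LS_SAP_CDC",
    "properties": {
        "type": "SapOdp",
        "typeProperties": {
            "server": "sapecc.example.com",   # hypothetical SAP host
            "systemNumber": "00",
            "clientId": "100",
            "userName": "ODP_USER",           # ODP-authorized RFC user
            "password": {"type": "SecureString", "value": "<secret>"},
        },
        "connectVia": {
            "referenceName": "SelfHostedIR",  # the SHIR set up above
            "type": "IntegrationRuntimeReference",
        },
    },
}
```

In practice the password would come from Azure Key Vault rather than being embedded in the definition.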
Note – It is recommended to implement a delta lake solution in Azure for analytics.
Data extractions via ODP require a properly configured user on the SAP system. The user must be authorized for ODP API invocations over Remote Function Call (RFC) modules; please check the notes below.
For SAP system requirements and the setup of the SAP DataSource and SLT, refer to the link below:
I have scanned through various documents, blogs, and forums, and will summarize my understanding of what's possible when using the CDC connector:
- Able to extract data from ODP-enabled extractors (both standard and generic) in SAP ECC, SAP BW, and S/4HANA.
- Data can be fetched in full and delta mode for extractors.
- CDS view based extraction is possible.
- Extraction from all table types (transparent, pooled, and cluster tables) and from views is possible. It is recommended to use SLT for table extraction.
- ADF handles SAP ECC, SAP BW, and S/4HANA ABAP CDS view based extractors in the same way.
- Three run modes are available while extracting the data:
- Full on every run
- Full on first run then incremental
- Incremental changes only
- No limitation on load frequency; just make sure loads are scheduled so that a new run does not overlap with one still in progress.
- Filters are possible in full-load extraction: create a partition on the required field.
- Built-in capability in Azure to handle load failures.
- Delta lake functionality is supported and is the preferred approach when using the CDC connector.
- Use SLT, or continuous mode (to be delivered in the future), for near-real-time data.
- If a job is stopped/cancelled in ADF, it is not stopped/cancelled in SAP; that must be done manually in SAP.
- No checks or activities are needed in ADF for SAP patch upgrades.
- You need to log in to SAP to check the ODQMON entries.
- Hierarchy DataSources are not supported yet.
- It is not possible to apply filters on delta loads, i.e., multiple inits with different selections.
- Handling of extractor enhancements via the CDC connector is still an open point.
- Additive extractors are not supported for now with the delta lake solution.
- Continuous mode, where the next load starts as soon as the current one completes, is expected in early 2023.
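Behind the "full on first run then incremental" mode, the sink ultimately has to merge change records into the target, which is why the delta lake approach is preferred. The sketch below illustrates that merge for records carrying a change mode similar to ODP's ODQ_CHANGEMODE field; I am assuming 'C'/'U'/'D' semantics for create/update/delete here, so treat the exact flag values as something to verify against your extractor's output:

```python
def apply_delta(target: dict, changes: list) -> dict:
    """Merge ODP-style change records into a keyed target table.
    Assumes a change-mode flag akin to ODQ_CHANGEMODE:
    'C' = created, 'U' = updated, 'D' = deleted."""
    for row in changes:
        key = row["key"]
        if row["mode"] in ("C", "U"):
            target[key] = row["data"]   # upsert the after-image
        elif row["mode"] == "D":
            target.pop(key, None)       # remove deleted records
    return target

state = {}  # first run: full/init load lands here
apply_delta(state, [{"key": 1, "mode": "C", "data": "A"},
                    {"key": 2, "mode": "C", "data": "B"}])
# later run: delta with one update and one deletion
apply_delta(state, [{"key": 1, "mode": "U", "data": "A2"},
                    {"key": 2, "mode": "D", "data": None}])
# state is now {1: "A2"}
```

In ADF this merge is expressed in the mapping data flow (alter-row rules into a delta sink) rather than hand-written, but the row-level outcome is the same. It also shows why additive extractors need different handling: their deltas are amounts to be summed, not after-images to upsert.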
I hope this blog helped you understand what the CDC connector is and how it can be used to extract data using SAP extractors, CDS views, and SLT.
I will publish a follow-up blog once I complete my first POC with the CDC connector. There I will revisit the points and open questions above, with concrete examples to validate them.
Stay tuned… 😊