SAP Data-Hub 2.3: “Hello Kafka-World”
In 2018 I frequently witnessed a new kid in the Enterprise IT town: Apache Kafka.
In my personal experience, customer architectures now consider Kafka as a new component about as often as Hadoop-based Data-Lakes appeared in advanced analytics architectures four or five years ago.
Let’s try some hands-on exercises connecting SAP ECC with Apache Kafka using SAP Data-Hub!
Scenario: Replicate changes to the product master data from an SAP ECC to a Kafka broker
Based on conversations with my customers, Kafka could be used as a message broker for asynchronous communication between SAP backends and microservices.
One scenario might be updating microservices-based web applications with the latest changes to product master data maintained in an SAP ECC.
For consuming data from an SAP ECC, these prerequisites typically have to be met:
- Usage of an API
- Semantics (instead of plain tables)
- Identification of changes to the data, e.g. Change Data Capture (CDC)
- Real-time replication/streaming capabilities
In this scenario, SAP Operational Data Provisioning (ODP) was chosen as the API to meet these prerequisites and to build this basic demo.
SAP Operational Data Provisioning (ODP)
“Operational Data Provisioning supports extraction and replication scenarios for various target applications and supports delta mechanisms in these scenarios. In case of a delta procedure, the data from a source (the so called ODP Provider) is automatically written to a delta queue (the Operational Delta Queue – ODQ) using an update process or passed to the delta queue using an extractor interface. The target applications (referred to as ODQ ‘subscribers’ or more generally “ODP Consumers”) retrieve the data from the delta queue and continue processing the data.”
(Source: Operational Data Provisioning (ODP) FAQ)
Picture: S-API and ABAP CDS based Extraction
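To make the delta mechanism from the quote a bit more tangible, here is a minimal in-memory sketch in Python. It is not SAP code and does not call any SAP API: the class `DeltaQueue`, its methods, the subscriber names and the product records are all made up for illustration. It only shows the idea that a provider writes changes to a queue and each subscriber retrieves the changes it has not seen yet.

```python
from collections import defaultdict


class DeltaQueue:
    """Toy stand-in for a delta queue (hypothetical, not the real ODQ)."""

    def __init__(self):
        self._changes = []                # append-only log of change records
        self._offsets = defaultdict(int)  # read position per subscriber

    def write(self, record):
        """Provider side: append one change record to the queue."""
        self._changes.append(record)

    def fetch_delta(self, subscriber):
        """Consumer side: return only the records this subscriber has not seen."""
        start = self._offsets[subscriber]
        delta = self._changes[start:]
        self._offsets[subscriber] = len(self._changes)
        return delta


queue = DeltaQueue()
# Invented sample records; field names and values are assumptions.
queue.write({"PRODUCT_ID": "HT-1000", "PRICE": "956.00"})
queue.write({"PRODUCT_ID": "HT-1001", "PRICE": "1249.00"})

first = queue.fetch_delta("DATA_HUB")   # initial request: both records
queue.write({"PRODUCT_ID": "HT-1000", "PRICE": "899.00"})
second = queue.fetch_delta("DATA_HUB")  # next request: only the price change
other = queue.fetch_delta("BW")         # a new subscriber starts from the beginning
```

Keeping one read position per subscriber is what allows several consumers to read the same queue independently of each other.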
Implementation: SAP Data-Hub Pipeline
To implement the scenario (ODP->Kafka), a SAP Data-Hub Pipeline was created. The setup relied on the following components:
- SAP Data-Hub 2.3
- Currently SAP-internal only: SAP Note 2685158 – SAP Data Hub Enabling data ingestion from BUSINESS_SUITE connection
- ODP enabled extractors (SAP Note 2232584 – Release of SAP extractors for ODP replication (ODP SAPI))
- Kafka broker (Not part of the SAP Data-Hub)
Step 1: Create ECC connection in SAP Data-Hub Connection Management
The first step is the creation of a SAP ECC connection in the SAP Data-Hub Connection Management.
Important parameter in the connection configuration:
- Empty = S-API based extractors will be used.
- ABAP-CDS = ABAP CDS View based extractors will be used (Delta Extraction and Real Time Streaming ODP-CDS to BW; applicable to Data-Hub as well).
Picture: SAP Data Hub Connection Management
Step 2: Browse SAP ECC ODP sources in SAP Data-Hub Metadata Explorer
The SAP Data Hub Metadata Explorer enables data engineers to work with multiple data-sources like:
- SAP ECC
- SAP BW
- Azure Data Lake (ADL)
In practice, a data engineer or data scientist can browse and understand SAP data-sources without deep SAP background knowledge and without having to install dedicated SAP frontends. Vice versa, an SAP user can conveniently browse data-sources like HDFS or ADL.
Browse SAP ODP data sources on an SAP ECC:
Picture: SAP Data Hub Metadata Explorer – ECC Connection
Preview result of a SAP ODP extractor (EPM Demo: Product):
Picture: SAP Data Hub Metadata Explorer – Data Preview
After finding and previewing the appropriate data-source, the data engineer will now start to build the SAP Data Hub Pipeline.
Step 3: Building the Data-Hub Pipelines
The Pipeline for writing from ODP to Kafka consists of the following main components:
- Workflow trigger / terminator
- Flowagent CSV Producer
- SAP BusinessSuite ODP Object Consumer
- Wiretap (Console-Out)
- Kafka Producer
Picture: Data-Hub Pipeline ODP2Kafka
The data from the ODP extractor is written to the Kafka topic “FROM_ERP”.
Picture: Kafka Producer Topic “FROM_ERP”
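For readers who want to see the producer step outside of the Data-Hub modeler, the following Python sketch shows roughly what happens conceptually: CSV rows are turned into key/value messages and handed to a send function for the topic “FROM_ERP”. This is an illustration under assumptions, not the implementation of the Data-Hub Kafka Producer operator. The function names, the column names, the sample rows and the choice of the product ID as message key are all mine.

```python
import csv
import io


def to_csv_line(values):
    """Serialize one record as a single CSV line with proper quoting."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    return buffer.getvalue().rstrip("\r\n")


def csv_to_messages(csv_text, key_field):
    """Turn CSV text with a header row into (key, value) byte pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    messages = []
    for row in reader:
        key = row[key_field].encode("utf-8")
        line = to_csv_line([row[name] for name in reader.fieldnames])
        messages.append((key, line.encode("utf-8")))
    return messages


def publish(messages, send, topic="FROM_ERP"):
    """Pass every message to send(topic, key, value) and return the count."""
    for key, value in messages:
        send(topic, key, value)
    return len(messages)


# Invented sample data; the column names are assumptions, not the real extractor fields.
sample = (
    "PRODUCT_ID,NAME,PRICE\n"
    "HT-1000,Notebook Basic 15,956.00\n"
    "HT-1001,Notebook Basic 17,1249.00\n"
)

sent = []  # stands in for the broker so the sketch runs without Kafka
count = publish(
    csv_to_messages(sample, "PRODUCT_ID"),
    lambda topic, key, value: sent.append((topic, key, value)),
)
```

With a client library such as kafka-python, `send` could be a thin wrapper around `KafkaProducer.send(topic, key=key, value=value)`. That wiring is not part of the demo described here.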
For demo purposes, a second pipeline was built as well. It reads data from the Kafka broker and displays it in the Wiretap console-out, using these components:
- Kafka Consumer (Topic: FROM_ERP)
- Wiretap to display result
Picture: Data-Hub Pipeline for reading Apache Kafka Broker
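The reading side can be sketched in the same spirit. The snippet below decodes message values as they might arrive from the topic and renders them as text, similar to what the Wiretap shows. Again, this is a hedged illustration: the function names, the field list and the sample payload are assumptions, and the list of byte strings merely stands in for a real Kafka consumer.

```python
import csv
import io


def decode_message(value, fieldnames):
    """Decode one CSV-encoded message value into a dict of field -> text."""
    text = value.decode("utf-8")
    row = next(csv.reader(io.StringIO(text)))
    return dict(zip(fieldnames, row))


def wiretap(values, fieldnames):
    """Render each message as one readable text line."""
    lines = []
    for value in values:
        record = decode_message(value, fieldnames)
        lines.append(" | ".join(f"{name}={text}" for name, text in record.items()))
    return lines


# Invented payload and field names, standing in for messages read from "FROM_ERP".
fields = ["PRODUCT_ID", "NAME", "PRICE"]
received = [b"HT-1000,Notebook Basic 15,956.00"]

for line in wiretap(received, fields):
    print(line)
```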
After successful implementation, the Data-Hub pipelines are started and the first pipeline writes the ECC data to the broker.
The second pipeline fetches the data from the broker and displays it in the Wiretap as a text representation.
Browsing the ODP extractors and previewing the results as tables is a very convenient feature provided by the SAP Data-Hub 2.3 Metadata Explorer.
For this first demo scenario, the Data-Hub Pipelines were easy to build by drag and drop.
Thanks to the flexibility of the Data-Hub Pipeline Operators, there are countless other options to combine ODP or Kafka with other data-sources or -sinks:
- ODP to Cloud-based Object Storages
- ODP to Vora or HDFS
- Kafka to Data-Lakes
- SAP HANA to Kafka
- Data-Lakes to SAP HANA
Demo video
Pipeline JSON-source
Many thanks for reading this blog to the end, and many thanks to Onno Bagijn.