S/4HANA to Kafka
More and more customers use Apache Kafka as their real-time data backbone. SAP HANA contains the current state of the business, and Apache Kafka the entire stream of changes since the beginning. This enables all data consumers to get ERP data without accessing the expensive S/4HANA system, which is a great cost-saving measure and opens up new possibilities.
Because Kafka is so popular, most tools support it: SAP Data Intelligence, all Big Data tools, ETL tools, pretty much everything.
What customers are missing, though, is an easy way to get S/4HANA data into Kafka, and this is where the S/4HanaConnector for Kafka helps (see GitHub and Docker Hub).
Using the S/4HanaConnector is very simple:
- Pull it from Docker Hub
- Open the Admin UI and create connections to the S/4Hana system and Kafka
- Select the objects to produce data for
- Assign the objects to one or multiple producer instances
Everything else happens under the covers: the Kafka schema definitions are derived from the SAP structures, at first start an initial load is performed, and from then on changes are produced with a latency of seconds.
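To verify that records are flowing, a plain Kafka consumer is all you need. Here is a minimal sketch; the broker address, the topic name (one topic per selected object is assumed, VBAK serves as the example) and the string deserializers are assumptions that depend on your setup and on the serialization format the producer is configured with.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConnectorSmokeTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust to your broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "s4-smoke-test");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // see the initial load as well
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("VBAK")); // hypothetical topic, one topic per selected object
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```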
With this connector it is a matter of minutes to get data into Kafka.
Under the covers there is much more going on: everything needed for a complete data, system and process integration solution, including metadata about the landscape, impact/lineage information, and the ability to map data when reading – e.g. to rename columns – and to adjust it to an existing schema instead of using a source-specific schema.
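To make the mapping part concrete: the connector provides this mapping itself when reading, so the following is only an illustration of the concept. Conceptually, a column rename is equivalent to this small Kafka Streams topology; the topic names and the target column name are made up for the example.

```java
import java.util.Properties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class RenameColumn {
    public static void main(String[] args) {
        ObjectMapper mapper = new ObjectMapper();
        StreamsBuilder builder = new StreamsBuilder();

        // Read raw rows, rename the SAP column VBELN to a consumer-friendly
        // name, and write the result to a new topic.
        builder.stream("VBAK", Consumed.with(Serdes.String(), Serdes.String()))
            .mapValues(value -> {
                try {
                    ObjectNode row = (ObjectNode) mapper.readTree(value);
                    JsonNode vbeln = row.remove("VBELN");
                    if (vbeln != null) {
                        row.set("SALES_ORDER_ID", vbeln); // hypothetical target name
                    }
                    return mapper.writeValueAsString(row);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            })
            .to("SALES_ORDER_HEADER", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rename-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```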
This post is part of a series, and the connector's full power is unlocked when combining it with one of the other components.
Looking at the screenshot you provide and at the README.md on GitHub, I see that you're connecting to the SAP HANA database underneath the SAP S/4HANA system. Wouldn't it make more sense to use the SAP S/4HANA or SAP S/4HANA Cloud Business Events? Based on these events, the corresponding APIs can be used to read the Business Objects with all details. In your approach the relationship between VBAK and VBAP must be defined manually, whereas in API_SALES_ORDER_SRV there is a navigation attribute defined in the OData service.
Gregor Wolf I would argue every method has its pros and cons. Therefore I would like to provide adapters for all methods.
The reason I started with a low-level implementation is that it provides access to 100% of the data with the least amount of work. But it is definitely just one of many options.
To be more precise, the Business Events approach has a few downsides in my opinion.
Regarding the navigation attribute, my goal is to do that in Kafka: import the CDS definition and then assemble the nested object there. I call that service the Object Assembly. Same result, but it relieves the ERP system of this expensive work.
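Just to sketch what such an Object Assembly could look like, here is a Kafka Streams version of the VBAK/VBAP nesting. To be clear, this is an illustration only, under a number of assumptions: rows arrive as JSON strings on one topic per table, each VBAP row carries its VBELN and POSNR fields, and the assembled document goes to a hypothetical SALESORDER topic. The actual service would derive the relationship from the imported CDS definition instead of hard-coding it.

```java
import java.util.Iterator;
import java.util.Properties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class ObjectAssembly {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Sales order headers, assumed keyed by VBELN.
        KTable<String, String> headers =
            builder.table("VBAK", Consumed.with(Serdes.String(), Serdes.String()));

        // Item rows: re-key each VBAP row by its VBELN and collect all items
        // of one order into a JSON array. The subtractor removes an item
        // again when its row is updated or deleted.
        KTable<String, String> items = builder
            .table("VBAP", Consumed.with(Serdes.String(), Serdes.String()))
            .groupBy((key, row) -> KeyValue.pair(field(row, "VBELN"), row),
                     Grouped.with(Serdes.String(), Serdes.String()))
            .aggregate(() -> "[]",
                       (vbeln, row, agg) -> addItem(agg, row),
                       (vbeln, row, agg) -> removeItem(agg, row),
                       Materialized.with(Serdes.String(), Serdes.String()));

        // Nest header and items into one document, mirroring the CDS association.
        headers.join(items, (h, i) -> "{\"header\":" + h + ",\"items\":" + i + "}")
               .toStream()
               .to("SALESORDER", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "object-assembly");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }

    private static String field(String json, String name) {
        try {
            return MAPPER.readTree(json).path(name).asText();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static String addItem(String array, String row) {
        try {
            ArrayNode list = (ArrayNode) MAPPER.readTree(array);
            list.add(MAPPER.readTree(row));
            return MAPPER.writeValueAsString(list);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static String removeItem(String array, String row) {
        try {
            ArrayNode list = (ArrayNode) MAPPER.readTree(array);
            String posnr = MAPPER.readTree(row).path("POSNR").asText();
            for (Iterator<JsonNode> it = list.iterator(); it.hasNext();) {
                if (it.next().path("POSNR").asText().equals(posnr)) {
                    it.remove();
                    break;
                }
            }
            return MAPPER.writeValueAsString(list);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

That way the join runs on the Kafka side, and the ERP system only has to ship flat table changes.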
I like the approach using the CDS definitions to re-assemble the object.
We have tried this and it works well. The only question is how we implement CDC, since Kafka expects a timestamp field.
Not getting your point. Yes, Kafka Connect's timestamp-based change detection requires a timestamp field, and you must guarantee there are no deletes. Therefore Kafka Connect cannot be used here.
This solution does not need a timestamp column to identify changes, and it produces CDC data including the time the change was created, the transaction ID and more.
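Purely as an illustration of what a consumer can do with that metadata, here is a sketch; the envelope field names _change_type, _transaction_id and _change_time are hypothetical stand-ins, the real names come from the connector's derived schema. The point is that deletes and transactional context are visible without any timestamp column in the source table.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ChangeRecord {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Interprets one change record. The metadata field names are hypothetical
    // stand-ins for whatever the connector's derived schema actually defines.
    static void apply(String value) throws Exception {
        JsonNode change = MAPPER.readTree(value);
        String changeType = change.path("_change_type").asText();       // e.g. I(nsert), U(pdate), D(elete)
        String transactionId = change.path("_transaction_id").asText(); // source transaction
        String changeTime = change.path("_change_time").asText();       // when the change happened
        if ("D".equals(changeType)) {
            System.out.println("delete in transaction " + transactionId + " at " + changeTime);
        } else {
            System.out.println("upsert in transaction " + transactionId + " at " + changeTime);
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample payload in the assumed envelope format
        apply("{\"_change_type\":\"D\",\"_transaction_id\":\"0001234\","
            + "\"_change_time\":\"2020-10-02T10:15:30Z\",\"VBELN\":\"0000012345\"}");
    }
}
```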
Is the connector open source or licensed?
Wenjie He Both, pay-per-use. See GitHub.
Hi expert, what is the pricing model for the connector?
Please send me an email.
I am very much interested in this. Where did you code this? I am working in S/4 data migration, and at times it is difficult to do field mapping from legacy to S/4. I would like this type of field mapping, as it is painful to do the mapping manually. We are using SAP Data Services for the data load.
Thanks and Regards
I configured 'connect to kafka', but I get an error: Topic cannot be read.
I did not configure SASL in Kafka, so I set the remaining form fields to null.
How can I fix this?
Let's discuss that via GitHub issues, yes?