S/4Hana to Kafka

More and more customers use Apache Kafka as their real-time data backbone. SAP Hana contains the current state of the business, while Apache Kafka holds the entire stream of changes since the beginning. This enables all data consumers to get ERP data without accessing the expensive S/4Hana system, which saves cost and opens up new possibilities.

Because Kafka is so popular, most tools support it, e.g. SAP Data Intelligence, Big Data tools, ETL tools, pretty much everything.

What customers are missing, though, is an easy way to get S/4Hana data into Kafka. The S/4HanaConnector for Kafka helps here (see github and docker).

The usage of the S/4HanaConnector is very simple:

  1. Pull it from Docker Hub
  2. Open the Admin UI and create connections to the S/4Hana system and Kafka
  3. Select the objects to produce data for
  4. Assign the objects to one or multiple producer instances

Everything else happens under the covers. The Kafka schema definitions are derived from the SAP structures; at first start an initial load is performed, and from then on changes are produced with a latency of seconds.
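To make the schema derivation more concrete, here is a minimal Python sketch of how an Avro record schema could be built from SAP column metadata. The type mapping, the function name `derive_avro_schema` and the nullable-unless-key rule are assumptions for illustration, not the connector's actual logic. A real mapping would likely use a decimal logical type for amounts rather than `double`.

```python
import json

# Hypothetical mapping from ABAP Dictionary types to Avro types (an assumption
# for illustration; the connector's real mapping may differ).
DDIC_TO_AVRO = {
    "CLNT": "string",
    "CHAR": "string",
    "NUMC": "string",
    "DATS": "string",
    "CUKY": "string",
    "CURR": "double",
    "DEC": "double",
    "INT4": "int",
}


def derive_avro_schema(table_name, columns):
    """Build an Avro record schema from (name, ddic_type, is_key) tuples."""
    fields = []
    for name, ddic_type, is_key in columns:
        # Unknown types fall back to string in this sketch.
        avro_type = DDIC_TO_AVRO.get(ddic_type, "string")
        if is_key:
            # Primary key columns are never null.
            fields.append({"name": name, "type": avro_type})
        else:
            # Non-key columns become nullable with a null default.
            fields.append(
                {"name": name, "type": ["null", avro_type], "default": None}
            )
    return {"type": "record", "name": table_name, "fields": fields}


# A few columns of the sales order header table VBAK.
vbak_columns = [
    ("MANDT", "CLNT", True),
    ("VBELN", "CHAR", True),
    ("ERDAT", "DATS", False),
    ("NETWR", "CURR", False),
    ("WAERK", "CUKY", False),
]

schema = derive_avro_schema("VBAK", vbak_columns)
print(json.dumps(schema, indent=2))
```

The resulting dictionary can be serialized to JSON and registered as a regular Avro schema.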

With this connector it is a matter of minutes to get data into Kafka.

 

Under the covers there is much more going on: everything needed for a complete data, system and process integration solution. This includes metadata about the landscape, impact/lineage information, and the ability to map data when reading, e.g. to rename columns and adjust the data to an existing schema instead of using a source-specific one.
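As an illustration of such a mapping, the following Python sketch renames SAP columns to the field names of an existing target schema and drops everything else. The target field names, `COLUMN_MAPPING` and `map_record` are made up for this example; the connector configures mappings in its own way.

```python
# Hypothetical mapping from SAP column names to the field names of an
# existing target schema (all target names are made up for this sketch).
COLUMN_MAPPING = {
    "VBELN": "order_number",
    "ERDAT": "created_on",
    "NETWR": "net_value",
    "WAERK": "currency",
}


def map_record(source_row, mapping):
    """Rename mapped columns and drop columns the target schema lacks."""
    return {target: source_row.get(source) for source, target in mapping.items()}


source_row = {
    "MANDT": "100",
    "VBELN": "0000004711",
    "ERDAT": "20200315",
    "NETWR": 250.0,
    "WAERK": "EUR",
}

mapped = map_record(source_row, COLUMN_MAPPING)
print(mapped)
```

Consumers then see one stable schema regardless of the source system's column names.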

This post is part of a series; the connector's full power is unlocked when it is combined with one of the other components.

 

Comments
  • Hi Werner,

    looking at the screenshot you provide and at the README.md on GitHub, I see that you’re connecting to the SAP HANA database below the SAP S/4HANA system. Wouldn’t it make more sense to use the SAP S/4HANA or SAP S/4HANA Cloud Business Events? Based on these events, the corresponding APIs can be used to read the Business Objects with all details. In your approach the relationship between VBAK and VBAP must be defined manually, whereas in API_SALES_ORDER_SRV a navigation attribute is defined in the OData service.

    Best regards
    Gregor

  • Gregor Wolf I would argue that every method has its pros and cons. Therefore I would like to provide adapters for all methods:

    • Business Events
    • IDOCs
    • BAPIs
    • CDS with “delta.changeDataCapture: automatic” and ODP

    The reason I started with a low-level implementation is that it provides access to 100% of the data with the least amount of work. But it is definitely just one of many options.

    To be more precise, the Business Events have the following downsides in my opinion:

    1. Initial Load
    2. Performance impact on the ERP system
    3. Only a subset of ERP data is provided via Business Events out of the box
    4. Adding customizing fields
    5. Not all fields exposed

     

    Regarding the navigation attribute, my goal is to do that in Kafka: import the CDS definition and then assemble the nested object there. I call that service the Object Assembly. It yields the same result and relieves the ERP system of this expensive work.
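A rough Python sketch of what such an assembly step could look like, assuming header rows from VBAK and item rows from VBAP joined on the sales document number VBELN. The function name and the nested `items` field are hypothetical, and a real service would read both streams from Kafka topics instead of in-memory lists.

```python
from collections import defaultdict


def assemble_sales_orders(headers, items):
    """Nest VBAP item rows under their VBAK header, joined on VBELN."""
    items_by_order = defaultdict(list)
    for item in items:
        items_by_order[item["VBELN"]].append(item)

    orders = []
    for header in headers:
        order = dict(header)
        # Sort by item number so the nested list is deterministic.
        order["items"] = sorted(
            items_by_order.get(header["VBELN"], []),
            key=lambda row: row["POSNR"],
        )
        orders.append(order)
    return orders


headers = [{"VBELN": "0000004711", "NETWR": 250.0}]
items = [
    {"VBELN": "0000004711", "POSNR": "000020", "MATNR": "M-02"},
    {"VBELN": "0000004711", "POSNR": "000010", "MATNR": "M-01"},
    {"VBELN": "0000004712", "POSNR": "000010", "MATNR": "M-03"},
]

orders = assemble_sales_orders(headers, items)
print(orders)
```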

     

    Makes sense?

    • Not getting your point. Yes, Kafka Connect requires a timestamp field and you must guarantee there are no deletes. Therefore Kafka Connect cannot be used.

      This solution does not need a timestamp to identify changes, and it produces CDC data including the time of the change, the transaction id and more.
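The difference can be shown with a small Python sketch. It compares timestamp polling against reading a change log, using made-up field names (`changed_at`, `transaction_id`, `op`); it is not how the connector captures changes internally, only an illustration of why deletes are invisible to timestamp polling.

```python
def poll_by_timestamp(table, last_seen):
    """Timestamp polling: return the rows changed after last_seen."""
    return [row for row in table if row["changed_at"] > last_seen]


def read_change_log(log, last_transaction_id):
    """CDC: return every change recorded after the given transaction."""
    return [c for c in log if c["transaction_id"] > last_transaction_id]


# Row B was deleted at time 20, so the table only contains row A now.
table_after_delete = [{"key": "A", "changed_at": 5}]

# The change log still knows about the delete.
change_log = [
    {"op": "insert", "key": "A", "changed_at": 5, "transaction_id": 1},
    {"op": "insert", "key": "B", "changed_at": 6, "transaction_id": 2},
    {"op": "delete", "key": "B", "changed_at": 20, "transaction_id": 3},
]

polled = poll_by_timestamp(table_after_delete, last_seen=10)
changes = read_change_log(change_log, last_transaction_id=2)

print(polled)   # the deleted row leaves no trace in the table
print(changes)  # the change log contains the delete of row B
```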