
Modern SAP ERP Data Integration using Apache Kafka

The classic way of SAP ERP data integration is to use one of the many available APIs to read the required data and put it somewhere else. There are numerous APIs and tools for that, each with its own pros and cons.

The S/4HanaConnector provided by rtdi.io takes a different route. Its aim is to use technologies built for Big Data to solve the problem in a flexible, easy, less intrusive and convenient way.

Step 1: Establish the connection

Because all ABAP tables are transparent tables in S/4Hana, the connection is made at the database level for performance reasons.
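To make the database-level access concrete, here is a minimal JDBC sketch, not the connector's actual code. It assumes the SAP HANA JDBC driver (ngdbc.jar) is on the classpath; host, port, credentials and the schema name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical sketch of a database-level connection to the S/4Hana schema.
public class HanaConnectionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host, port, user and schema; not the connector's configuration.
        String url = "jdbc:sap://s4hana-host:30015/";
        try (Connection con = DriverManager.getConnection(url, "READONLY_USER", "secret");
             Statement stmt = con.createStatement();
             // Transparent tables such as VBAK can be queried directly at the database level.
             ResultSet rs = stmt.executeQuery("SELECT VBELN, ERDAT FROM SAPHANADB.VBAK LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString("VBELN") + " " + rs.getString("ERDAT"));
            }
        }
    }
}
```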


Step 2: Defining the ERP Business Object

As a consumer I would like to get Sales Orders, Business Partners, Material Master data and the like. Therefore the first step is to define the scope of these objects and where the data comes from. This can be achieved in multiple ways:

  1. Using ABAP CDS Views, which already join all the tables that belong together.
  2. Using predefined Business Objects.
  3. Using the ABAP data dictionary to define the object scope.

For option #3, the most complicated one, the connector provides a UI to define such Business Object Entities.

The UI allows browsing through all ERP tables, here VBAK containing the Sales Order header data, and dropping a table onto the output pane. A sales order consists of sales items as well, so the VBAK table is expanded to show all relationships to child tables. Dropping VBAP on the output side adds it as a child, and the relationship as defined by SAP is added automatically.

Finally, the entire Business Object gets the name “SAP_Sales_Order” and is saved.

With that, a JSON file with the structure definition of the Sales Order object is created.
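As an illustration of the nested structure such a definition describes, a hypothetical Java view could look like the sketch below. The field selection is illustrative only; the real definition covers every field of VBAK and VBAP.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical shape of one "SAP_Sales_Order" record: VBAK header fields at the
// top level, with the child table VBAP nested as a list of items.
public class SapSalesOrderSketch {
    public String VBELN;          // sales document number (VBAK-VBELN)
    public String ERDAT;          // creation date (VBAK-ERDAT)
    public String KUNNR;          // sold-to party (VBAK-KUNNR)
    public List<Item> VBAP;       // child table VBAP, one entry per sales order item

    public static class Item {
        public String POSNR;      // item number (VBAP-POSNR)
        public String MATNR;      // material (VBAP-MATNR)
        public BigDecimal KWMENG; // cumulative order quantity (VBAP-KWMENG)
    }
}
```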


Step 3: Configure the Producer

All that is left is assigning the above Business Object to a producer.


Result

From now on, all S/4Hana changes are sent to Apache Kafka and are available for further consumption: every single field of VBAK and VBAP, combined into one object.
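For illustration, a minimal sketch of what a downstream Kafka consumer could look like. The topic name, broker address and plain String deserialization are assumptions; the actual connector may well publish Avro records with a schema registry.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Hypothetical downstream consumer of the SAP_Sales_Order topic.
public class SalesOrderConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "sales-order-sink");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest"); // also pick up the initial load

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("SAP_Sales_Order"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // One record per change: the VBAK header with all its VBAP items nested inside.
                    System.out.printf("offset=%d key=%s%n%s%n", record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```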


Simple, isn’t it?

8 Comments
  • Thank you for the article.

    Is this also valid for SAP ECC or is it exclusive to S/4? In other words, is it also compatible with Oracle databases or just with Hana ones?

    Regards,

    Natalia

    • These concrete screenshots show the S/4Hana connector. My next step is to build the R/3 version of it. The majority of the work was in the framework; building an Oracle version is a matter of a few weeks.

      It depends a bit on what you require initially. In my project plan the development steps are:

      Phase 1: Transparent tables only
      Phase 2: Add Pool/Cluster tables
      Phase 3: Support IDocs as source
      Phase 4: Support ODP/ODQ (BW extractors)


      Feel free to send me an email via the contact details on https://rtdi.io in case you want to explore the options further.


        • Yes, but later. Loading data into ERP requires using BAPIs.

          The above solution already supports simple mappings, and if the structure and the BAPI match, it is a simple task. But pure structural mapping might not be as flexible as required in all cases.

          Hence my longer term plan is:

          • A consumer which can call BAPIs (a rough sketch follows below). If the mapping is doable, no problem.
          • For the important cases, provide content, meaning the Business Entity definition for e.g. the Sales Order plus a consumer for exactly that entity.
          • Use other tools. Any tool that can consume from Kafka and call BAPIs can be used. The difficult part is getting the data out, and that is what this solution solves.

          Priority is a matter of demand. When I was with SAP I did all of that for Hana SDI, Data Services, BW,…
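          As a rough illustration of the first bullet, a consumer-side BAPI call could look like the sketch below, using SAP JCo. The destination name, the chosen BAPI and the field mapping are placeholders, not part of the solution described in this post.

          ```java
          import com.sap.conn.jco.JCoDestination;
          import com.sap.conn.jco.JCoDestinationManager;
          import com.sap.conn.jco.JCoFunction;

          // Hypothetical sketch: push a consumed Kafka record into the ERP via a BAPI.
          public class BapiCallSketch {
              public static void createOrderFromKafkaValue(String jsonValue) throws Exception {
                  // "S4H_DEST" is a placeholder JCo destination configured elsewhere.
                  JCoDestination dest = JCoDestinationManager.getDestination("S4H_DEST");
                  JCoFunction bapi = dest.getRepository().getFunction("BAPI_SALESORDER_CREATEFROMDAT2");
                  if (bapi == null) {
                      throw new IllegalStateException("BAPI not found in the repository");
                  }
                  // ... map fields from the consumed JSON value into the BAPI's import
                  //     and table parameters here (the structural mapping mentioned above) ...
                  bapi.execute(dest);
                  // A BAPI_TRANSACTION_COMMIT call is required afterwards to persist the order.
              }
          }
          ```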

  • It’s an interesting approach, and I definitely like Kafka. But I assume you cannot get historical data via queries, since it’s an “on change event” mechanism.

    Also, if you’re not looking at a very high volume of data that needs to be processed in real time, I don’t see the value proposition of maintaining a Kafka cluster.

    Also, an “on change” mechanism is already natively available on SAP Data Hub/Data Intelligence with SAP HANA, with no need for Kafka in between.

    • Let me try to answer your questions from the Kafka side:

      ad 1) Whenever a new system is connected, you have to have some sort of initial load. No difference with Kafka.

      ad 2) For the production-ready solution proposed here, with the typical volumes an ERP system produces, a single instance with 8 GB RAM and 4 cores is enough. The maintenance effort is marginal. Compare that with the minimum requirements of Data Hub and its costs…

      ad 3) Sure, there are many options: Data Hub, Data Services, SDI, SLT, just to name the ones in the SAP sales bag. But keep in mind the value proposition of Kafka: a) it assembles a business entity rather than replicating the individual tables like all the named SAP solutions do, and b) it is used when there are multiple consumers of the same data. For a point-to-point connection there is no need to add an intermediate distribution mechanism like Kafka.


      Agreed?

      • It depends on the Kafka configuration for message retention – if you choose to retain all messages, which is what you seem to be suggesting, a new subscriber will receive every change there ever was and, for performance reasons, that might not be good.

        In the end this is still a solution to listen for changes, and, although your solution is certainly complementary, the API approach is still best for the most complete integration.

        I like the technology, but I still fail to see a concrete use case for a customer who already owns one of those SAP solutions.

        • Fair point.

          Just one tiny correction: what a subscriber consumes is up to the subscriber. Most subscribers, when loading a target system, will need all data. But it is their own decision to start at a higher offset/timestamp.
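          To illustrate, here is a minimal sketch of a subscriber that deliberately skips history and starts at a timestamp instead of the first offset. The topic name, broker address and the one-week cutoff are assumptions.

          ```java
          import java.time.Duration;
          import java.time.Instant;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;
          import java.util.Properties;
          import java.util.stream.Collectors;

          import org.apache.kafka.clients.consumer.KafkaConsumer;
          import org.apache.kafka.common.TopicPartition;
          import org.apache.kafka.common.serialization.StringDeserializer;

          // Hypothetical subscriber that starts consuming from a chosen timestamp.
          public class StartAtTimestampSketch {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.put("bootstrap.servers", "localhost:9092");
                  props.put("group.id", "late-subscriber");
                  props.put("key.deserializer", StringDeserializer.class.getName());
                  props.put("value.deserializer", StringDeserializer.class.getName());

                  try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                      // Assign all partitions of the topic manually so we can seek explicitly.
                      List<TopicPartition> partitions = consumer.partitionsFor("SAP_Sales_Order").stream()
                              .map(p -> new TopicPartition(p.topic(), p.partition()))
                              .collect(Collectors.toList());
                      consumer.assign(partitions);

                      // The subscriber's own choice: start one week back instead of at the beginning.
                      long since = Instant.now().minus(Duration.ofDays(7)).toEpochMilli();
                      Map<TopicPartition, Long> request = new HashMap<>();
                      partitions.forEach(tp -> request.put(tp, since));
                      consumer.offsetsForTimes(request).forEach((tp, offset) -> {
                          if (offset != null) {
                              consumer.seek(tp, offset.offset());
                          }
                      });
                      // ... regular poll loop from here on ...
                  }
              }
          }
          ```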


          Let’s start with common ground. You are saying you prefer the API approach. Which API? How is it called? There are multiple options ranging from BAPIs to APIHub, CDS DeltaAnnotations, CPI, SCP Integration Packages,… pick one.