Modern SAP ERP Data Integration using Apache Kafka
The classic way of SAP ERP Data Integration is to use one of the many APIs available, reading the required data and putting it somewhere else. There are many APIs and different tools for that, all with their own pros and cons.
The S/4HanaConnector provided by rtdi.io takes a different route. It aims to use technologies built for Big Data to solve the problem in a flexible, easy, less intrusive and convenient way.
Step 1: Establish the connection
Given the fact that all ABAP tables are transparent tables in S/4Hana, the connection is made on database level for performance reasons.
Step 2: Defining the ERP Business Object
As a consumer, I would like to get Sales Orders, Business Partners, Material Master and the like. Therefore the first step is to define the scope of these objects and where the data comes from. This can be achieved in multiple ways:
- Using the ABAP CDS Views, which join all tables that belong together.
- Using predefined Business Objects.
- Using the ABAP data dictionary to define the object scope.
For the third option, the most complicated one, the connector provides a UI to define such Business Object Entities.
It allows browsing through all the ERP tables, here VBAK containing the Sales Order header data, and dropping a table onto the output pane. A sales order consists of sales items as well, hence the VBAK table is expanded and shows all relationships to child tables. Dropping VBAP on the output side adds it as a child, and the relationship as defined by SAP is added automatically.
Finally the entire Business Object gets a name, “SAP_Sales_Order”, and is saved.
With that, a JSON file with the structure definition of the Sales Order Object is created.
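To make the idea of such a structure definition more tangible, here is a minimal sketch of what a file like this could contain. The key names ("name", "mastertable", "children", "joincondition") are assumptions made up for illustration and are not the connector's actual file format. Only the table names and the VBELN/MANDT key fields come from the SAP data dictionary.

```python
import json

# Hypothetical layout: the key names below are illustrative assumptions,
# not the schema the connector really writes.
def build_sales_order_definition():
    return {
        "name": "SAP_Sales_Order",
        "mastertable": "VBAK",  # sales order header
        "children": [
            {
                "table": "VBAP",  # sales order items
                # Items belong to a header with the same client and document number.
                "joincondition": [
                    {"parent": "MANDT", "child": "MANDT"},
                    {"parent": "VBELN", "child": "VBELN"},
                ],
            }
        ],
    }

definition_json = json.dumps(build_sales_order_definition(), indent=2)
print(definition_json)
```

The point of the sketch is only that the definition is declarative: it names the root table, the child tables and how they relate, and nothing else.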
Step 3: Configure the Producer
All that is left is assigning the above Business Object to a producer.
From now on all S/4Hana changes are sent to Apache Kafka and are available for further consumption, with every single field of VBAK and VBAP delivered as one object.
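What "one object" means can be sketched in a few lines. The example below nests VBAP item rows under their VBAK header and turns each order into a key/value pair, the shape a Kafka producer would send. This is an illustration under assumptions, not the connector's code: the sample rows are invented, and JSON is used only for readability, since the connector's actual wire format is not described here.

```python
import json
from collections import defaultdict

def assemble_sales_orders(vbak_rows, vbap_rows):
    """Nest VBAP item rows under their VBAK header row, matching on VBELN."""
    items_by_order = defaultdict(list)
    for item in vbap_rows:
        items_by_order[item["VBELN"]].append(item)
    orders = []
    for header in vbak_rows:
        order = dict(header)
        # Items are sorted by item number so the nested list is deterministic.
        order["VBAP"] = sorted(
            items_by_order.get(header["VBELN"], []), key=lambda row: row["POSNR"]
        )
        orders.append(order)
    return orders

def to_message(order):
    """Return a (key, value) pair of bytes; keyed by the document number."""
    key = order["VBELN"].encode("utf-8")
    value = json.dumps(order, sort_keys=True).encode("utf-8")
    return key, value

# Invented sample rows with a handful of real field names.
vbak = [
    {"VBELN": "0000000001", "KUNNR": "C100"},
    {"VBELN": "0000000002", "KUNNR": "C200"},
]
vbap = [
    {"VBELN": "0000000001", "POSNR": "000020", "MATNR": "M-2"},
    {"VBELN": "0000000001", "POSNR": "000010", "MATNR": "M-1"},
    {"VBELN": "0000000002", "POSNR": "000010", "MATNR": "M-3"},
]

orders = assemble_sales_orders(vbak, vbap)
messages = [to_message(order) for order in orders]
```

A consumer then receives the complete sales order in a single message and does not have to re-join header and item tables itself.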
Simple, isn’t it?
Thank you for the article.
Is this also valid for SAP ECC or is it exclusive for S4? So, in other words, is it also compatible with Oracle databases or is it just for Hana ones?
These particular screenshots show the S/4Hana connector. My next step is to build the R/3 version of it. The majority of the work was in the framework; building an Oracle version is a matter of a few weeks.
It depends a bit on what you require initially. In my project plan the development steps are:
Phase 1: Transparent tables only.
Phase 2: Add Pool/Cluster tables
Phase 3: Support IDOCs as source
Phase 4: Support ODP/ODQ (BW Extractors)
Feel free to send me an email via the https://rtdi.io contact data in case you want to explore the options more.
Thank you Werner. Good to know it.
And also SAP acting as consumer planned in your project for ECC?
Yes, but later. Loading data into ERP requires using BAPIs.
The above solution already supports simple mappings, and if the structure and the BAPI match, it is a simple task. But the pure structural mapping might not be as flexible as required for all cases.
Hence my longer term plan is:
Priority is a matter of demand. When I was with SAP I did all of that for Hana SDI, Data Services, BW,...
You mentioned that you are working on replicating cluster tables as well in phase 2. Were you able to achieve this using Kafka? I am trying to replicate BSEG; would that be possible?
Any guidance would really help.
I have not expanded to R/3 yet, because frankly I don't get any good direction from SAP in that regard.
The final problem is that SAP has solved all integration challenges with the One Data Model approach, according to marketing at least. It would be interesting to tell them your requirements and hear what SAP suggests to solve them.
Thanks for the detail above. Will try out the options.
It's an interesting approach, and I definitely like Kafka. But I assume you cannot get historical data via queries, since it's an "on change event" mechanism.
Also, if you're not looking into a very high volume of data that needs to be processed real-time, I don't see the value proposition to maintain a Kafka cluster.
Also, an "on change" mechanism is already natively available on SAP Data Hub/Data Intelligence with SAP HANA, with no need for Kafka in between.
Let me try to answer your questions from the Kafka side:
ad 1) Whenever a new system is connected you have to have some sort of an initial load. No difference with Kafka.
ad 2) For a production-ready setup of the solution proposed here, with the typical volumes an ERP system produces, a single 8GB RAM instance with 4 cores is enough. The maintenance effort is marginal. If you compare it with the minimum requirements of Data Hub and its costs...
ad 3) Sure, there are many options. Data Hub, Data Services, SDI, SLT, just to name the ones in the SAP sales bag. But keep in mind, the value proposition of Kafka is that a) it assembles a business entity instead of replicating the individual tables like all the named SAP solutions do, and b) it serves multiple consumers of the same data. For a point-to-point connection there is no need to add an intermediate distribution mechanism like Kafka.
It depends on the Kafka configuration for message retention - if you choose to retain all messages, which is what you seem to be suggesting, a new subscriber will receive every change there ever was and, for performance reasons, that might not be good.
In the end this is still a solution to listen for changes, and, although your solution is certainly complementary, the API approach is still best for the most complete integration.
I like the technology, but still fail to see a concrete use case for a customer who already owns one of those SAP solutions.
Just one tiny correction: What a subscriber consumes is up to the subscriber. Most subscribers, when loading a target system, will need all data. But it is their own decision to start with a higher offset/timestamp.
Let's start with some common ground. You are saying you prefer the API approach. What API? What is it called? There are multiple options ranging from BAPIs to APIHub, CDS DeltaAnnotations, CPI, SCP Integration Packages,... pick one.
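The point about subscribers choosing their own starting position can be illustrated with a toy model of a single-partition topic. This is an in-memory stand-in written for this explanation and not the Kafka client API; the class and method names are invented, and timestamps are assumed to be non-decreasing. In the real Java client the corresponding calls would be along the lines of KafkaConsumer.offsetsForTimes and KafkaConsumer.seek.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    offset: int
    timestamp: int  # epoch milliseconds
    key: str
    value: str

@dataclass
class TopicLog:
    """Toy stand-in for a single-partition topic: an append-only list."""
    records: list = field(default_factory=list)

    def append(self, timestamp, key, value):
        record = Record(len(self.records), timestamp, key, value)
        self.records.append(record)
        return record.offset

    def offset_for_timestamp(self, timestamp):
        """Earliest offset whose timestamp is >= the given one, else the log end."""
        for record in self.records:
            if record.timestamp >= timestamp:
                return record.offset
        return len(self.records)

    def read_from(self, offset):
        """Reading never removes data, so every subscriber sees the same log."""
        return self.records[offset:]

log = TopicLog()
log.append(1000, "0000000001", "order created")
log.append(2000, "0000000002", "order created")
log.append(3000, "0000000001", "item added")

# Subscriber A loads a new target system and therefore starts at offset 0.
full_load = [r.value for r in log.read_from(0)]
# Subscriber B only cares about changes since timestamp 2500.
recent_only = [r.value for r in log.read_from(log.offset_for_timestamp(2500))]
```

Both subscribers read the same retained log, and neither affects what the other sees. Retention decides what is available; the subscriber decides where to begin.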
I won't question the technology itself, but I would question the wisdom of dumping entire SAP objects into Kafka. This may lead to a tremendous amount of garbage in queues, with data that is useless to systems outside the SAP world.
And what is your suggestion, Joao Sousa ? Keep the SAP data in the Hana database? Don't use SAP data at all?
Personally I would actually argue the opposite. SAP data is extremely valuable. What is more valuable than the central nerve center of a business, with its orders, billings and profit center accounting? It is the opposite of garbage. Unlocking this information for other systems and allowing e.g. cross references increases the business benefit created by such data even further.
And most important, having the data in Kafka does not cost much. It costs virtually nothing actually compared with the costs of storage in a database.
When a customer can do cross references in tools like SAP BW, which also readily provides streaming capabilities, and now with the ODP framework and its compression rates (~>80%), which I suppose can be consumed by different systems, why would he want the overhead? Isn't this like a few big data folks asking to replicate base tables and creating a data swamp?
Werner Dähn Very interesting solution for a proper integration of HANA into Kafka. Would have expected this as a standard solution directly from SAP :-).
We are also heavily investigating solutions for integrating SAP with Kafka. Our first connector solution, implemented during a customer project, was a "proxy based" solution extracting data out of BW Data Hubs into Kafka. The scope was a migration away from SAP BW to Kafka, Spark etc.
Ashish Tiwari I have been working with SAP BW for 10 years now and find more differences between BW and Kafka than things they have in common. ODP is part of the SAP ABAP/NGAP source systems and can be consumed by other systems, so it is not a dedicated pro for BW. If a customer asked me to build a central nervous system for data with real-time streaming features and an open API for data consumption using different systems and languages etc., I would never suggest an SAP BW solution. There are other use cases where BW is a better fit, or maybe Kafka and BW in a common landscape.
When thinking about a connector covering most of the business integration scenarios with SAP and Kafka we decided to implement an ODP connector with Kafka Connect. It is able to do CDC data replication with direct connection to an ODP source system data source. Check it out at https://www.confluent.io/hub/init/kafka-connect-odp .
Even more systems can be integrated using our OData source and sink connectors. They will be verified by Confluent soon. Another connector can be used to push data from Kafka to BW using a Webservice Data Source, if needed.
If interested, find more details at https://init-software.de/portfolio/solution/kafka-connectivity/ .