This blog post demonstrates how to improve the performance of a DI pipeline by grouping multiple messages into a batch rather than sending a single message at a time.
I had a requirement to expose some ECC data to an enterprise Kafka topic using SAP Data Intelligence. I developed a custom ABAP operator to read data from an ABAP function module and pass it to Kafka in JSON format.
Version of DI – SAP Data Intelligence 3.0
My initial build: Single JSON message out
In this build, I had a function module (FM) that exported a table of JSON messages and a record count. The custom ABAP operator looped over the table and sent one message at a time to DI.
The main components of this pipeline are:
- Custom ABAP operator to read data from FM
- Kafka Producer
This design worked correctly and messages were posted to Kafka as expected. However, throughput was very low, at about 20 msg/sec.
The bottleneck was the communication between the ABAP and DI layers: the ABAP code finished in seconds, but it took 24 minutes for the full payload of 24K messages to appear in the DI layer.
How I improved the performance: Batch multiple JSON messages into stream output
This enables DI to consume more data with each read.
DI interprets a series of strings separated by a newline (\n) as a stream and has operators to convert ‘String to Stream’ and ‘Stream to String’. I made the following changes to extract a stream of data from the ABAP layer.
- Update the ABAP function module to concatenate the messages into one long string separated by ‘\n’. The function module now exports one long string instead of a table of messages. Take care to limit the string length if you have a large payload. I limited mine to 1M characters.
- Pass this output into the ‘StringToStream’ operator, so DI sees it as a stream.
- Then connect it to ‘StreamToString’ to split the output into multiple messages for Kafka.
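To make the batching idea concrete, here is a minimal sketch of the concatenate-and-split logic in Python. It is only an illustration: my actual implementation lives in the ABAP function module and the DI operators, and the names build_batches, split_batch and MAX_CHARS are hypothetical. It assumes each message is a single-line JSON string, which holds for json.dumps output because newlines inside values are escaped.

```python
import json

# Hypothetical cap per batch string, mirroring the 1M-character limit used in the FM.
MAX_CHARS = 1_000_000


def build_batches(messages, max_chars=MAX_CHARS):
    """Group single-line JSON strings into newline-separated batch strings.

    A new batch is started whenever adding the next message would push the
    current batch past max_chars. A single message longer than max_chars
    still gets a batch of its own (it is not truncated).
    """
    batches = []
    current = []
    current_len = 0
    for msg in messages:
        # The extra character accounts for the '\n' separator.
        extra = len(msg) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            batches.append("\n".join(current))
            current, current_len = [], 0
            extra = len(msg)
        current.append(msg)
        current_len += extra
    if current:
        batches.append("\n".join(current))
    return batches


def split_batch(batch):
    """Recover the individual messages from one batch string."""
    return [line for line in batch.split("\n") if line]
```

The first function plays the role of the updated function module, and the second stands in for what the receiving side does before handing messages to Kafka. Splitting on the newline is only safe because the separator never appears unescaped inside a message.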
Now the ABAP operator in DI no longer has to read multiple strings from the backend, only one long output. Once the data is in the DI layer, the pipeline completes very quickly, improving throughput by 1400%, from 20 msg/sec to 300 msg/sec.
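For anyone who wants to check the numbers, the arithmetic behind these figures can be worked through as below. The projected run time for the batched pipeline is my own back-of-the-envelope estimate, assuming the 300 msg/sec rate holds for the same 24K payload. It is not a separately measured value.

```python
# Figures quoted in this post
total_messages = 24_000
initial_elapsed_seconds = 24 * 60   # 24 minutes for the first build

# Rate implied by the first build: about 16.7 msg/sec, i.e. roughly 20 msg/sec
implied_initial_rate = total_messages / initial_elapsed_seconds

rate_before = 20    # msg/sec, single-message build
rate_after = 300    # msg/sec, batched build

speedup_factor = rate_after / rate_before                            # 15x
increase_percent = (rate_after - rate_before) / rate_before * 100    # 1400%

# Estimate only: time for the same payload at the batched rate
projected_seconds = total_messages / rate_after                      # 80 seconds

print(f"implied initial rate: {implied_initial_rate:.1f} msg/sec")
print(f"speedup: {speedup_factor:.0f}x ({increase_percent:.0f}% increase)")
print(f"projected time for 24K messages: {projected_seconds:.0f} s")
```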
The main components of this pipeline are:
- Custom ABAP operator to read data from FM
- ‘StringToStream’ Operator
- ‘StreamToString’ Operator
- Kafka Producer
I hope this helps you tune your pipelines. I would love to hear if you have used other techniques to improve your pipeline performance.