
SAP DI – Performance tuning of DI pipeline by batching messages

This blog post shows how to improve the performance of a DI pipeline by grouping multiple messages into a batch rather than sending one message at a time.

I had a requirement to expose some ECC data to an enterprise Kafka topic using SAP Data Intelligence. I developed a custom ABAP operator to read data from an ABAP function module and pass it to Kafka in JSON format.

Version of DI: SAP Data Intelligence 3.0

My initial build: Single JSON message out

In this build, I had a function module (FM) that exported a table of JSON messages and a record count. The custom ABAP operator looped over the table and sent one message at a time to DI.


Fig 1: Build table of JSON messages in ABAP FM

 


Fig 2: Custom operator code – Loop through the table and send one message out at a time
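The actual operator is written in ABAP, but the per-message pattern in Fig 2 can be sketched in Python. Here `send` is a stand-in for whatever call the runtime uses to push a message to the outport (illustrative, not the actual DI API):

```python
import json

def send_one_at_a_time(records, send):
    """Emit each record as its own message: one round trip per record.

    `send` stands in for the outport write of the runtime (illustrative,
    not the actual SAP DI API).
    """
    for record in records:
        send(json.dumps(record))

# Demonstration: collect the emitted messages in a list.
out = []
send_one_at_a_time([{"id": 1}, {"id": 2}, {"id": 3}], out.append)
```

Each record becomes its own message, which is exactly why the downstream layer sees one read per record.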

 

The main components of the pipeline are:

  • Custom ABAP operator to read data from FM
  • Kafka Producer


Fig 3: SAP DI graph with ABAP operator (outport of type string) and Kafka producer

 


Fig 4: Output shown in Wiretap. Each record from the table is read as one message by DI. Shown here are three messages with three timestamps.

 

This design worked correctly and messages were posted to Kafka as expected. However, throughput was very low, at about 20 msg/sec.

The bottleneck was the communication between the ABAP and DI layers: the ABAP code finished in seconds, but it took 24 minutes for the full payload of 24K messages to arrive in the DI layer.
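The quoted numbers are consistent with each other: 24K messages arriving over 24 minutes works out to roughly 17 msg/sec, in the same range as the observed ~20 msg/sec:

```python
messages = 24_000
duration_seconds = 24 * 60  # 24 minutes

# Effective throughput of the single-message build
throughput = messages / duration_seconds  # roughly 16.7 msg/sec
```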

 

How I improved the performance: batch multiple JSON messages into a stream output, so that DI consumes more data with each read

DI interprets a series of strings separated by newlines (\n) as a stream and provides operators to convert ‘String to Stream’ and ‘Stream to String’. I made the following changes to extract a stream of data from the ABAP layer.

  • Update the ABAP function module to concatenate the messages into one long string separated by ‘\n’, so the function module now exports a long string instead of a table of messages. Take care to limit the string length if you have a large payload; I limited mine to 1M characters.
  • Pass this output into the ‘StringToStream’ operator so that DI sees it as a stream.
  • Connect it to ‘StreamToString’ to split the output back into individual messages for Kafka.
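The batching step above can be sketched in Python. The cap mirrors the 1M-character limit used in the FM, and `send` again stands in for the outport write (a sketch of the technique, not the actual ABAP or DI code):

```python
import json

MAX_CHARS = 1_000_000  # cap mirrors the 1M-char limit used in the FM

def batch_messages(records, send, max_chars=MAX_CHARS):
    """Concatenate JSON messages with '\n' and flush before exceeding max_chars.

    `send` is a stand-in for the outport write (illustrative).
    """
    buffer = []
    size = 0
    for record in records:
        line = json.dumps(record)
        # Flush the current batch if adding this line (plus its newline
        # separator) would push the batch past the size cap.
        if buffer and size + len(line) + 1 > max_chars:
            send("\n".join(buffer))
            buffer, size = [], 0
        buffer.append(line)
        size += len(line) + 1
    if buffer:
        send("\n".join(buffer))
```

With a large cap, most payloads go out as a single long string, which is what lets DI pick up the whole batch in one read.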

Now the ABAP operator in DI does not have to read many individual strings from the backend, only one long output. Once the data is in the DI layer, the pipeline completed very quickly, improving throughput by 1400%: from 20 msg/sec to 300 msg/sec.

 


Fig 5: Concatenate single messages into a long string separated by newlines


Fig 6: Custom operator reads the long string and sends it to DI in one go

 

The main components of this pipeline are:

  • Custom ABAP operator to read data from FM
  • ‘StringToStream’ operator
  • ‘StreamToString’ operator
  • Kafka Producer

 


Fig 7: New pipeline to break down a message stream into individual messages before sending to Kafka

 


Fig 8: Output from the ABAP operator is one long string separated by ‘\n’. Wiretap showed only a few lines and truncated the rest of the message.


Fig 9: Output after the string is passed through the ‘StreamToString’ operator. The string from Fig 8 is separated into multiple messages, each with its own timestamp.
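The split shown in Fig 9 can be mimicked in plain Python: conceptually, the downstream operator separates the long string on the newline delimiter (a sketch of the idea, not the operator's actual implementation):

```python
# One long string as emitted by the ABAP operator (as in Fig 8)
batch = '{"id": 1}\n{"id": 2}\n{"id": 3}'

# Conceptually what 'StreamToString' produces: one message per line
messages = batch.split("\n")
```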

I hope this helps you tune your own pipelines. I would love to hear if you have used other techniques to improve pipeline performance.
