By Bhalchandra Wadekar

From 8 hrs to mere 37 mins – Performance Improvements for loading the data Region-wise

In my current project, we are syncing two systems using SAP Cloud Integration. Records from 25 tables are sent from the Sender system to the Receiver system. Some of these 25 tables are huge, and the first full load is often time-consuming for them. In this blog, I’ll share the performance improvements that reduced the time required to load the data into the Receiver system.

As-Is Design

Sender System

The sender system would have records in a table. Each record would be assigned a country code and a region code.

For example, the Customer table has records like these:

| Id | Country Code | Region Code | Column 3…n |
|----|--------------|-------------|------------|
| 1  | GB           | UKC         |            |
| 2  | GB           | UKD         |            |
| 3  | GB           | UKE         |            |
| 4  | GB           | UKF         |            |
| 5  | IN           | MH          |            |
| 6  | IN           | GJ          |            |
| 7  | IN           | PB          |            |
| 8  | IN           | DL          |            |

Limitations of Sender System

Sender System had these limitations:

  • API only allowed 2000 records in a single synchronous call.
  • The sender adapter only created one message at a time, i.e., only after a given batch of 2000 records had been fully processed would the next batch of 2000 records become available.

Receiver System

The receiver system is a file system with a directory for each country code; under each country directory, there is a directory for each region code. Within a region’s directory, each object has its own file.

For example, the GB and IN directories represent the United Kingdom and India. GB has further directories for each region, such as UKC for North East England and UKD for North West England. Each regional directory holds object/table files like Customers.csv, Invoices.csv, Orders.csv, and so on.
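The path determination described above can be sketched as follows (a minimal illustration; the root directory and the helper name are my own assumptions, not taken from the actual system):

```python
from pathlib import PurePosixPath

def target_path(country: str, region: str, table: str, root: str = "/") -> str:
    """Build the receiver-side file path for a record,
    e.g. ('GB', 'UKC', 'Customers') -> /GB/UKC/Customers.csv."""
    return str(PurePosixPath(root) / country / region / f"{table}.csv")
```

For instance, `target_path("IN", "GJ", "Invoices")` yields `/IN/GJ/Invoices.csv`.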

[Figure: Directory structure of the receiver system]

Limitations of Receiver System

The Receiver system also has its limitations:

  • For a given region, only one connection is allowed at a time, because regional business users work in the same system and data load activities should not affect their experience.

The Flow

In the As-Is Design, we pick up the records from the tables in batches of 2000. Each record carries a country code and a region code. The second step is to group the records belonging to the same country and region; a message is created for each group, so all records in a single message belong to a single country and a single region. Finally, we determine the file path from the country and region, and all that remains is to append the records to the file.
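The flow above can be sketched as a simplified Python model (the record structure, function names, and the Customers.csv target are illustrative assumptions, not the actual integration flow artifacts):

```python
from itertools import groupby

def append_to_file(path, records):
    """Placeholder for the receiver call that appends records to a regional file."""
    pass

def process_batch(batch):
    """Group one batch of records by (country, region) and make one
    receiver call per group. Returns the number of calls made."""
    key = lambda r: (r["country"], r["region"])
    calls = 0
    for (country, region), group in groupby(sorted(batch, key=key), key=key):
        # All records in this group share one country and region,
        # so they go to a single file, e.g. /GB/UKC/Customers.csv.
        append_to_file(f"/{country}/{region}/Customers.csv", list(group))
        calls += 1
    return calls
```

A batch mixing GB/UKC and IN/GJ records would thus produce two receiver calls, one per group.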

Example

Let’s consider these 10 records. I’ll use a batch size of 5 for the demonstration.

Step 1: The Records

| Id | Country Code | Region Code | Column 3…n |
|----|--------------|-------------|------------|
| 1  | GB           | UKC         |            |
| 2  | GB           | UKC         |            |
| 3  | IN           | GJ          |            |
| 4  | GB           | UKC         |            |
| 5  | IN           | GJ          |            |
| 6  | GB           | UKC         |            |
| 7  | GB           | UKC         |            |
| 8  | IN           | GJ          |            |
| 9  | GB           | UKC         |            |
| 10 | IN           | GJ          |            |

Step 2: Fetch the Records in batches of 5 sequentially

Here, the records will be fetched in batches of 5. So Batch 1 will arrive first in a message; once Batch 1’s processing is complete, Batch 2 will become available.

| Batch   | Id | Country Code | Region Code | Column 3…n |
|---------|----|--------------|-------------|------------|
| Batch 1 | 1  | GB           | UKC         |            |
|         | 2  | GB           | UKC         |            |
|         | 3  | IN           | GJ          |            |
|         | 4  | GB           | UKC         |            |
|         | 5  | IN           | GJ          |            |
| Batch 2 | 6  | GB           | UKC         |            |
|         | 7  | GB           | UKC         |            |
|         | 8  | IN           | GJ          |            |
|         | 9  | GB           | UKC         |            |
|         | 10 | IN           | GJ          |            |

Step 3: Group the Records by Country and Region and create a separate message per country-region group for the current batch

| Batch   | Message   | Id | Country Code | Region Code | Column 3…n |
|---------|-----------|----|--------------|-------------|------------|
| Batch 1 | Message 1 | 1  | GB           | UKC         |            |
|         |           | 2  | GB           | UKC         |            |
|         |           | 4  | GB           | UKC         |            |
|         | Message 2 | 3  | IN           | GJ          |            |
|         |           | 5  | IN           | GJ          |            |

Step 4: Write records in a given message to the correct folder

Message 1 from Batch 1 contains only records for the UKC region of GB, so we can append its records to the Customers.csv file in the /GB/UKC directory.

| Batch   | Message   | Id | Country Code | Region Code | Column 3…n |
|---------|-----------|----|--------------|-------------|------------|
| Batch 1 | Message 1 | 1  | GB           | UKC         |            |
|         |           | 2  | GB           | UKC         |            |
|         |           | 4  | GB           | UKC         |            |

Step 5: Perform Step 4 for all messages in a given batch

Step 6: Perform Steps 3 and 4 for all batches sequentially

Step 7: Done

In the end, we make 4 calls to the receiver system in total:

| Batch   | Message   | Id | Country Code | Region Code | Column 3…n |
|---------|-----------|----|--------------|-------------|------------|
| Batch 1 | Message 1 | 1  | GB           | UKC         |            |
|         |           | 2  | GB           | UKC         |            |
|         |           | 4  | GB           | UKC         |            |
|         | Message 2 | 3  | IN           | GJ          |            |
|         |           | 5  | IN           | GJ          |            |
| Batch 2 | Message 3 | 6  | GB           | UKC         |            |
|         |           | 7  | GB           | UKC         |            |
|         |           | 9  | GB           | UKC         |            |
|         | Message 4 | 8  | IN           | GJ          |            |
|         |           | 10 | IN           | GJ          |            |

The Bottleneck and the Solution

The bottleneck in this design stems from the receiver system’s limitation: only one connection at a time is allowed per regional file. The sender system, in turn, only provides 2000 records at a time in a single synchronous call, and the records can only be written to the files sequentially.

💡 We need to make the fewest possible connections for a given batch of 2000 records. This can be achieved if the records are sorted by country and region before the batches are formed. However, sorting takes time, and a synchronous call cannot hold the connection open while the sender system finishes sorting. So we switched to the asynchronous call. Fortunately, the asynchronous call has a higher limit of 10,000 records per message. Yes! Another win :).
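The effect of sorting before batching can be checked with a small counting model of the ten example records (a toy model of my own, counting each distinct country-region group in a batch as one receiver call):

```python
def count_calls(records, batch_size):
    """Count receiver calls: one per distinct (country, region) group per batch."""
    calls = 0
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        calls += len({(country, region) for country, region in batch})
    return calls

# The ten example records as (country, region) pairs, in original order.
records = [("GB", "UKC"), ("GB", "UKC"), ("IN", "GJ"), ("GB", "UKC"), ("IN", "GJ"),
           ("GB", "UKC"), ("GB", "UKC"), ("IN", "GJ"), ("GB", "UKC"), ("IN", "GJ")]

unsorted_calls = count_calls(records, 5)        # 4 calls, as in the As-Is run
sorted_calls = count_calls(sorted(records), 5)  # 3 calls, as in the To-Be run
```

Sorting first packs each batch with as few country-region groups as possible, which is exactly what cuts the number of receiver connections.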

To-Be Design

The To-Be Design simply sorts the records before forming batches. Everything else remains the same.

Example

Let’s run through our example records now. Step 1b is new and does the magic.

Step 1: The Records

| Id | Country Code | Region Code | Column 3…n |
|----|--------------|-------------|------------|
| 1  | GB           | UKC         |            |
| 2  | GB           | UKC         |            |
| 3  | IN           | GJ          |            |
| 4  | GB           | UKC         |            |
| 5  | IN           | GJ          |            |
| 6  | GB           | UKC         |            |
| 7  | GB           | UKC         |            |
| 8  | IN           | GJ          |            |
| 9  | GB           | UKC         |            |
| 10 | IN           | GJ          |            |

Step 1b: Sort the records by Country Code and Region Code

| Id | Country Code | Region Code | Column 3…n |
|----|--------------|-------------|------------|
| 1  | GB           | UKC         |            |
| 2  | GB           | UKC         |            |
| 4  | GB           | UKC         |            |
| 6  | GB           | UKC         |            |
| 7  | GB           | UKC         |            |
| 9  | GB           | UKC         |            |
| 3  | IN           | GJ          |            |
| 5  | IN           | GJ          |            |
| 8  | IN           | GJ          |            |
| 10 | IN           | GJ          |            |

Step 2: Fetch the Records in batches of 5 sequentially

Here, the records will be fetched in batches of 5. So Batch 1 will arrive first in a message; once Batch 1’s processing is complete, Batch 2 will become available.

| Batch   | Id | Country Code | Region Code | Column 3…n |
|---------|----|--------------|-------------|------------|
| Batch 1 | 1  | GB           | UKC         |            |
|         | 2  | GB           | UKC         |            |
|         | 4  | GB           | UKC         |            |
|         | 6  | GB           | UKC         |            |
|         | 7  | GB           | UKC         |            |
| Batch 2 | 9  | GB           | UKC         |            |
|         | 3  | IN           | GJ          |            |
|         | 5  | IN           | GJ          |            |
|         | 8  | IN           | GJ          |            |
|         | 10 | IN           | GJ          |            |

Step 3: Group the Records by Country and Region and create a separate message per country-region group for the current batch

| Batch   | Message   | Id | Country Code | Region Code | Column 3…n |
|---------|-----------|----|--------------|-------------|------------|
| Batch 1 | Message 1 | 1  | GB           | UKC         |            |
|         |           | 2  | GB           | UKC         |            |
|         |           | 4  | GB           | UKC         |            |
|         |           | 6  | GB           | UKC         |            |
|         |           | 7  | GB           | UKC         |            |

Step 4: Write records in a given message to the correct folder

Message 1 from Batch 1 contains only records for the UKC region of GB, so we can append its records to the Customers.csv file in the /GB/UKC directory.

| Batch   | Message   | Id | Country Code | Region Code | Column 3…n |
|---------|-----------|----|--------------|-------------|------------|
| Batch 1 | Message 1 | 1  | GB           | UKC         |            |
|         |           | 2  | GB           | UKC         |            |
|         |           | 4  | GB           | UKC         |            |
|         |           | 6  | GB           | UKC         |            |
|         |           | 7  | GB           | UKC         |            |

Step 5: Perform Step 4 for all messages in a given batch

Step 6: Perform Steps 3 and 4 for all batches sequentially

Step 7: Done

In the end, we make only 3 calls instead of 4 to the receiver system in total:

| Batch   | Message   | Id | Country Code | Region Code | Column 3…n |
|---------|-----------|----|--------------|-------------|------------|
| Batch 1 | Message 1 | 1  | GB           | UKC         |            |
|         |           | 2  | GB           | UKC         |            |
|         |           | 4  | GB           | UKC         |            |
|         |           | 6  | GB           | UKC         |            |
|         |           | 7  | GB           | UKC         |            |
| Batch 2 | Message 2 | 9  | GB           | UKC         |            |
|         | Message 3 | 3  | IN           | GJ          |            |
|         |           | 5  | IN           | GJ          |            |
|         |           | 8  | IN           | GJ          |            |
|         |           | 10 | IN           | GJ          |            |

 


If this blog helped you, please leave a like and share it with your network. Thank you.


Results and Other Improvements

With the As-Is Design, the biggest table for a country with 45 regions took approximately 8 hours to load. Yes, 8 hours from end to end. With this small tweak in the design, the To-Be Design took only about 37 minutes: sorting took roughly 12 minutes and the rest of the process 25 minutes.

This particular test case was an extreme example. With 45 regions, in the worst case a batch can contain records from all 45 regions, which means up to 45 calls per batch to the receiver system. With the sorting in place, in the best case you make only as many calls as there are batches, or 45 calls (one per region), whichever is greater, assuming each region has at least one record.
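In general terms, with R regions and B batches (a back-of-the-envelope analysis of my own, not figures from the project), the unsorted worst case is one call per region per batch, while sorted input needs at most one call per region plus one extra call for every batch boundary that splits a region’s run of records:

```python
def calls_unsorted_worst(regions: int, batches: int) -> int:
    # Worst case without sorting: every batch contains records of every region.
    return regions * batches

def calls_sorted_worst(regions: int, batches: int) -> int:
    # With sorted input each region forms one contiguous run; only the
    # (batches - 1) internal batch boundaries can split a run in two.
    return regions + (batches - 1)

# Hypothetical illustration: 45 regions and 100 batches.
print(calls_unsorted_worst(45, 100))  # 4500
print(calls_sorted_worst(45, 100))    # 144
```

The gap between the two bounds grows with the number of regions, which is why the 45-region table benefited so dramatically.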

While analysing the bottleneck and applying the solution, we came across these other improvements that are more general and applicable in many other scenarios:

  • Indexing the source tables
    This ensured that the records could be sorted with minimal overhead before the batches were formed.
  • Enabling HTTP Session Reuse at the Integration Flow level
    The receiver system has a directory-like structure, but it is accessed via an HTTP Receiver Adapter, and the code that connects to the receiver system and appends to the file lives in its own Integration Flow. This Integration Flow receives messages for multiple countries and regions. Enabling HTTP Session Reuse at the Integration Flow level ensured that a login call was only made when the session expired.
  • Using GZip compression
    Previously, we were sending the data uncompressed as octet-stream. Compressing the data reduced the time taken by 16%.
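The compression gain is easy to reproduce in isolation (a minimal Python sketch of the idea; in SAP Cloud Integration the GZip encoding is configured on the integration flow, not hand-coded like this):

```python
import gzip

# A repetitive CSV payload, similar in shape to the region-wise record files.
payload = ("1,GB,UKC\n2,GB,UKC\n4,GB,UKC\n" * 1000).encode("utf-8")

compressed = gzip.compress(payload)     # sent with Content-Encoding: gzip
restored = gzip.decompress(compressed)  # the receiver recovers the original bytes

assert restored == payload
assert len(compressed) < len(payload)   # repetitive data compresses very well
```

Tabular exports like these are highly repetitive, so the bytes on the wire shrink far more than the 16% time saving might suggest.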

Summary

Performance improvement requires identifying the bottlenecks and then applying targeted solutions. In the scenario discussed in this blog, sorting the data before batching improved performance, and indexing the source tables helped sort the data quickly. On the receiver side, reusing the HTTP session and compressing the data helped further.


Hope this helps,
Bala
