
Cloud Integration – Using Parallel Processing in General and Iterating Splitter

This blog describes how to use the parallel processing option in a splitter scenario in SAP Cloud Platform Integration. It covers recommendations and important constraints for this configuration option.

Using Parallel Processing in General and Iterating Splitter

In many Cloud Integration scenarios, large messages are split into smaller parts using a splitter pattern, and the smaller chunks are then processed separately. The splitter configuration offers an option to switch on parallel processing for the individual splits. However, this option cannot or should not always be used and can even lead to unexpected problems. In this blog I describe the option and the important considerations for your integration flow configuration when using it.

Parallel Processing in Splitter

In the default configuration of the splitter step, the individual splits are executed sequentially, one after the other. To improve the overall processing time of a splitter scenario, you can run the splits in parallel instead. To configure this, select the option Parallel Processing in the configuration of the Splitter. When using Parallel Processing, two additional settings have to be configured:

  • A Timeout needs to be defined for the maximum processing time of the splits.
  • For Splitter versions 1.5 and higher (available with the January-20-2019 update), the Number of Concurrent Processes needs to be configured. The default is 10 threads, which was the fixed value before this setting became configurable. Use this setting to control the parallelism and the load on the receiver system.

The effect of the parallel processing at runtime is that the inbound request is split into multiple separate new exchanges that are then processed completely independently of each other.

The defined number of threads handles the parallel splits. If there are more parallel splits than available threads, the next split is processed as soon as a thread becomes available. This means the overall time for the split processing depends on the processing time of the individual splits and on the number of splits to be executed.
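The thread-pool behavior described above can be sketched with a small model. This is a hypothetical illustration in Python, not CPI code; the names `process_splits` and `handle_split` are made up for the example:

```python
import concurrent.futures

def handle_split(split):
    # Stand-in for the flow steps applied to one split.
    return split.upper()

def process_splits(splits, num_threads=10):
    """Run each split on a bounded worker pool, mirroring the
    'Number of Concurrent Processes' setting (default 10)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each split becomes an independent task; when there are more
        # splits than threads, the surplus waits until a worker is free.
        return list(pool.map(handle_split, splits))

# 25 splits on 10 threads run in roughly ceil(25/10) = 3 waves, so the
# overall time is about 3x the processing time of a single split.
results = process_splits([f"part-{i}" for i in range(25)])
```

The `num_threads` bound is what limits the parallel load on the receiver system: raising it shortens the overall runtime but increases the number of concurrent calls.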


Important Constraints

When using parallel processing in the splitter consider the following important aspects:


Is the Backend System able to Handle the Load?

If the splitter sends the single splits to a backend system, you need to make sure that the backend can handle the expected number of parallel calls. Otherwise the requests may run into timeouts and the whole scenario may stop working. Configure the number of threads accordingly.


Resource Consumption in Cloud Integration

If the splitter processes multiple splits in parallel, all of them consume resources in the Cloud Integration tenant, such as memory, database connections, and temporary storage for stream caching. No general recommendations are possible because this heavily depends on the scenario and on the flow steps and features used for the splits. Carefully test your scenario with parallel processing and the expected message size and volume to identify issues with resource consumption. Depending on your scenario, activating parallel processing may not bring the desired performance improvement but can cause severe issues in the scenario and even affect other scenarios running on the tenant.


Timeout Ends Without Error

As already stated, if you switch on Parallel Processing in the General or Iterating Splitter, a Timeout field needs to be configured. This field defines the time after which the processing of the parallel splits ends at the latest and the next processing step, for example the Gather, is executed.

The Splitter interrupts the processing of the parallel splits after the configured timeout without raising an error and continues with the steps configured after the split. The timeout is a very important setting and needs to be set high enough to allow all the splits in your scenario to complete. Otherwise some splits may not be processed while the overall processing of the scenario continues with the next flow step. Depending on your scenario, this could lead to data inconsistencies because not all splits are executed completely.

The recommendation is to test with the biggest expected messages under realistic conditions and check the execution time, then define a timeout that fits the scenario.
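The silent-timeout behavior can be illustrated with another small model (again a hypothetical Python sketch, not CPI code; `gather_with_timeout` and `handle_split` are made-up names). Splits that do not finish within the timeout are simply dropped from the gathered result, and no error is raised:

```python
import concurrent.futures
import time

def handle_split(seconds):
    time.sleep(seconds)          # simulate the per-split processing time
    return seconds

def gather_with_timeout(splits, timeout_seconds):
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        futures = [pool.submit(handle_split, s) for s in splits]
        # Wait at most timeout_seconds for ALL splits together --
        # the timeout covers the whole splitter, not each single split.
        done, not_done = concurrent.futures.wait(futures, timeout=timeout_seconds)
        for f in not_done:
            f.cancel()           # unfinished splits are abandoned, no error raised
        # Only the completed splits reach the next step (e.g. Gather).
        return [f.result() for f in done]

# Two fast splits and one slow split with a short timeout: the slow
# split misses the cut and is silently missing from the result.
results = gather_with_timeout([0.01, 0.02, 1.5], timeout_seconds=0.5)
```

This is exactly the data-inconsistency risk described above: the caller gets a result that looks successful but is incomplete, so the timeout must be dimensioned for the slowest realistic run.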


Parallel Splitter not Supported with Transactional Resources

A splitter with parallel processing is not allowed together with transactional resources, for example Data Store flow steps or the JMS, XI, or AS2 adapters. Details: How to configure transaction handling in Integration Flow


Further Readings

For more configuration recommendations with respect to the Splitter, also check out the related blogs.

11 Comments
  • Thanks a lot for the inputs. Can you shed some light on whether session re-use on message exchange works if an HTTP adapter call is used inside the splitter?

    I saw your blog on session re-use, and it says that session re-use on message exchange works only when we do the initial HTTP call before the splitter step, but that defeats our technical requirement.

    Please let me know for any other details.

    • Hi,

      As explained in the other blog, you need to get the session with a call before the parallel splitter, else the session cannot be shared. Each call will open a new session.

      Either do not use the parallel option in the splitter, do a call to this system before the splitter, or do not use session re-use.

      Best regards

      Mandy

      • Thank you, Mandy. We have a requirement where CPI consumes a flat file with multiple rows from an S/4 SFTP folder, and each record needs to be transformed into a SOAP/XML web service request to the target. Each response needs to be collected and written to a response file for S/4HANA to consume.

        Hence we are using a splitter and calling the SOAP target using the HTTP adapter inside splitter-gather. However, when the file size is bigger, the processing time is higher than expected. Hence I have enabled session re-use as part of optimization, but it looks like that will not work.

        Do you have any other recommendations?

  • One other question: let's say I give a high split timeout value and one of the splits takes much longer than the others for some reason. Does Gather wait until that one slow split is completed?

    I assume it waits, as it has to gather all the splits, as long as that one slow split responds before the timeout elapses; otherwise Gather might ignore it since the timeout value has elapsed.

    • No, as explained in the blog, the timeout really defines the timeout for the splits. The next step will not wait for all the splits to be completed. You need to make sure to set the timeout high enough.

      Best regards

      Mandy

      • Thanks, Mandy. Is the timeout value per split, or is it the timeout for all splits together? My assumption was that it is per split message.

        Also, let's say my iFlow has to deal with large files once in a while, so based on my testing results I give a timeout of 2 hours or so. Is that okay? Does a higher timeout value have any impact when the iFlow deals with smaller files? Are there any other negative impacts of a high timeout value, such as higher processing time or out-of-memory issues?

        • No, the timeout value has no impact for smaller messages, as the processing ends when the last split is executed. It is only relevant if the processing really takes very long; then the timeout cuts off the processing, which is important to avoid endless processing.

          The timeout is for the complete splitter, not per split.

  • How do the parallel processes affect exchange properties? Let's say we split an XML file, and in the subprocess I read this file and export a specific value to an exchange property “ex_id” so that I can use it in an HTTP call within the subprocess. Does that work, or do the parallel processes share (and therefore overwrite) this exchange property?