Cloud Integration – Using Parallel Processing in General and Iterating Splitter
This blog describes how to use the parallel processing option in a splitter scenario in SAP Cloud Integration. It covers the recommendations for this configuration option and its important constraints.
In many Cloud Integration scenarios, large messages are split into smaller parts using the Splitter pattern. The smaller chunks are then processed separately. The Splitter configuration offers an option to switch on parallel processing for the individual splits. However, this option cannot always be used, should not always be used, and can even lead to unexpected problems. In this blog I describe the option and the important considerations for your integration flow configuration when you use it.
Parallel Processing in Splitter
In the default configuration of the Splitter step, the individual splits are executed sequentially, one after the other. To reduce the overall processing time of a splitter scenario, you can run the splits in parallel instead. To do so, select the Parallel Processing option in the Splitter configuration. When you use Parallel Processing, two additional settings must be configured:
- A Timeout must be defined for the maximum processing time of the splits. It applies to the complete Splitter, not to each individual split.
- For Splitter version 1.5 and higher (available with the January 20, 2019 update), the Number of Concurrent Processes must be configured. The default is 10 threads, which was also the fixed value before the setting became configurable. With this setting you can control the degree of parallelism and therefore the load on the receiver system.
At runtime, parallel processing means that the inbound request is split into multiple new, separate exchanges that are then processed completely independently of each other.
The configured number of threads processes the parallel splits. If there are more splits than available threads, the next split is processed as soon as a thread becomes available. This means that the overall time for the split processing depends on the processing time of the individual splits, on the number of splits to be executed, and on the number of threads.
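To get a first feeling for this dependency, here is a rough back-of-the-envelope model in Python. It assumes (hypothetically) that every split takes the same time and that each thread picks up the next split as soon as it is free, so the splits run in "waves" of at most Number of Concurrent Processes splits. The function name `estimate_split_duration` is made up for illustration; this is not how Cloud Integration schedules its threads internally.

```python
import math


def estimate_split_duration(num_splits: int, concurrent_processes: int,
                            seconds_per_split: float) -> float:
    """Rough estimate of the total split processing time.

    Hypothetical model: all splits take the same time, and each of the
    `concurrent_processes` threads takes the next split as soon as it is
    free, so the splits run in ceil(num_splits / threads) waves.
    """
    if num_splits < 0 or concurrent_processes < 1 or seconds_per_split < 0:
        raise ValueError("invalid input")
    waves = math.ceil(num_splits / concurrent_processes)
    return waves * seconds_per_split


# Sequential processing corresponds to a single thread: 25 waves.
print(estimate_split_duration(25, 1, 2.0))   # 50.0
# Default of 10 concurrent processes: 3 waves of up to 10 splits.
print(estimate_split_duration(25, 10, 2.0))  # 6.0
```

In real scenarios the split durations vary, and the receiver may slow down under parallel load, so treat such an estimate only as a starting point for realistic load tests.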
When you use parallel processing in the Splitter, consider the following important aspects:
Is the Backend System Able to Handle the Load?
If the Splitter sends the individual splits to a backend system, make sure that the backend can handle the expected number of parallel calls. Otherwise, the requests may run into timeouts and the whole scenario may stop working. Configure the Number of Concurrent Processes accordingly.
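The following Python sketch illustrates why the Number of Concurrent Processes caps the load on the receiver: a thread pool with a fixed size never has more calls in flight than it has threads. `FakeBackend` and `send_splits` are hypothetical names; the backend only simulates processing time and records the highest number of simultaneous calls it saw.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeBackend:
    """Hypothetical receiver that records how many calls run at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = 0
        self.max_in_flight = 0

    def call(self, payload: str) -> str:
        with self._lock:
            self._in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self._in_flight)
        try:
            time.sleep(0.01)  # simulated backend processing time
            return f"processed {payload}"
        finally:
            with self._lock:
                self._in_flight -= 1


def send_splits(splits, backend, concurrent_processes):
    # The pool size plays the role of "Number of Concurrent Processes":
    # at most this many splits call the backend at once.
    with ThreadPoolExecutor(max_workers=concurrent_processes) as pool:
        return list(pool.map(backend.call, splits))


backend = FakeBackend()
results = send_splits([f"record-{i}" for i in range(50)], backend,
                      concurrent_processes=5)
print(len(results), backend.max_in_flight)  # 50 and a value of at most 5
```

Lowering the pool size reduces the peak load on the receiver at the cost of a longer overall runtime.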
Resource Consumption in Cloud Integration
If the Splitter processes multiple splits in parallel, all of them consume resources in the Cloud Integration tenant, such as memory, database connections, and temporary storage for stream caching. No general recommendations are possible because the resource consumption depends heavily on the scenario and on the flow steps and features used for the splits. Carefully test your scenario with parallel processing and the expected message sizes and volumes to identify resource consumption issues. Depending on your scenario, activating parallel processing may not deliver the desired performance improvement and can even cause severe issues in the scenario itself and in other scenarios running on the same tenant.
Timeout Ends Without Error
As already stated, if you switch on Parallel Processing in the General or Iterating Splitter, the Timeout field must be configured. This field defines the latest point in time at which the processing of the parallel splits ends and the next processing step, for example the Gather, is executed.
After the configured timeout, the Splitter interrupts the processing of the parallel splits without raising an error and continues with the steps configured after the split. The timeout is therefore a very important setting and must be high enough for all the splits in your scenario to be executed. Otherwise, some splits may not be processed while the overall processing of the scenario continues with the next flow step. Depending on your scenario, this can lead to data inconsistencies because not all splits have been executed completely.
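The silent end of processing can be modeled with Python's `concurrent.futures.wait`, which returns the finished and unfinished futures after a timeout instead of raising an exception. This is only an analogy: `process_split` and `run_parallel_splits` are hypothetical names, and Python cannot stop a thread that is already running, so the sketch simply stops waiting for the slow split. That is enough to show that the following step continues without an error while a split is missing.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def process_split(seconds: float) -> float:
    """Hypothetical split that just takes `seconds` to complete."""
    time.sleep(seconds)
    return seconds


def run_parallel_splits(durations, concurrent_processes, timeout):
    """Run all splits in parallel, then gather whatever finished in time."""
    pool = ThreadPoolExecutor(max_workers=concurrent_processes)
    futures = [pool.submit(process_split, d) for d in durations]
    # One timeout for the whole set of splits, not one per split.
    done, not_done = wait(futures, timeout=timeout)
    # No exception is raised when the timeout expires.
    for future in not_done:
        future.cancel()  # drops splits that have not started yet
    pool.shutdown(wait=False)  # stop waiting for splits still running
    gathered = [f.result() for f in futures if f in done]
    return gathered, len(not_done)


gathered, missing = run_parallel_splits([0.01, 0.01, 0.01, 2.0],
                                        concurrent_processes=4, timeout=0.5)
print(gathered, missing)  # three splits gathered, one missing, no error
```

Note that `wait` returns as soon as all futures are done, so a generous timeout does not delay processing when all splits finish early; it only matters when the processing really takes long.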
We recommend testing the scenario with the largest expected messages under realistic conditions and checking the execution time. Then define a timeout that fits the scenario.
Parallel Splitter Not Supported with Transactional Resources
A Splitter with parallel processing is not allowed in combination with transactional resources, for example Data Store flow steps or the JMS, XI, or AS2 adapter. For details, see: How to configure transaction handling in Integration Flow
For more configuration recommendations regarding the Splitter, also check out the following blogs:
Thanks a lot for the input. Can you shed some light on whether session reuse on message exchange works if an HTTP adapter call is used inside the splitter?
I saw your blog on session reuse, and it says that session reuse on message exchange works only when the initial HTTP call is done before the splitter step, but that defeats our technical requirement.
Please let me know if you need any other details.
As explained in the other blog, you need to get the session with a call before the parallel splitter; otherwise, the session cannot be shared and each call opens a new session.
Either do not use the parallel option in the splitter, make a call to this system before the splitter, or do not use session reuse.
Thank you, Mandy. We have a requirement where CPI consumes a flat file with multiple rows from an S/4 SFTP folder, and each record needs to be "transformed" into a SOAP/XML web service request to the target. Each response needs to be collected and written to a response file for S/4HANA to consume.
Hence, we are using a splitter and calling the SOAP target with the HTTP adapter inside the splitter/gather. However, when the file is bigger, the processing time is higher than expected. I enabled session reuse as an optimization, but it looks like that will not work.
Do you have any other recommendations?
As said, either do not use parallel processing in the splitter or increase the timeout.
Thank you for the quick turnaround.
One other question: let's say I set a high split timeout value and one of the splits takes longer than the others for some reason. Does the gather wait until that slow split is completed?
I assume it waits because it has to gather all the splits, as long as the slow split responds before the timeout; otherwise, the gather might ignore it because the timeout has elapsed.
No, as explained in the blog, the timeout really defines the timeout for the splits. Once the timeout has elapsed, the next step does not wait for all the splits to be completed, so make sure to set the timeout high enough.
Thanks, Mandy. Is the timeout value per split or for all splits together? My assumption was that it applies per split message.
Also, let's say my iFlow has to deal with large files once in a while, so I set the timeout to 2 hours or so based on my testing results. Is that okay? Does a higher timeout value have any impact when the iFlow deals with smaller files? Are there any other negative impacts of a higher timeout value, such as longer processing times or out-of-memory issues?
No, the timeout value has no impact on smaller files, as the processing ends when the last split is executed. It is only relevant if the processing really takes very long. Then the timeout cuts it off, which is important to avoid endless processing.
The timeout is for the complete splitter, not per split.
How do the parallel processes affect exchange properties? Let's say we split an XML file, and in the subprocess I read this file and write a specific value to an exchange property "ex_id" so that I can use it in an HTTP call within the subprocess. Does that work, or do the parallel processes share (and therefore overwrite) this exchange property?
No, the parallel executions do not share exchange properties. They are truly separate exchanges leaving the splitter.
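A plain Python model of this behavior (not Cloud Integration's actual implementation) gives every split its own copy of the properties, so setting "ex_id" in one split cannot overwrite the value in another. The helpers `split_exchange` and `process` and the dictionary layout are hypothetical.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def split_exchange(properties: dict, records: list) -> list:
    # Each split becomes a new exchange with its own copy of the properties.
    return [{"body": r, "properties": copy.deepcopy(properties)}
            for r in records]


def process(exchange: dict) -> str:
    # Hypothetical step: read a value from the body into the property "ex_id".
    exchange["properties"]["ex_id"] = exchange["body"]["id"]
    # ... an HTTP call would use exchange["properties"]["ex_id"] here ...
    return exchange["properties"]["ex_id"]


parent = {"sender": "S4"}
records = [{"id": f"ID-{i}"} for i in range(20)]
exchanges = split_exchange(parent, records)
with ThreadPoolExecutor(max_workers=10) as pool:
    ids = list(pool.map(process, exchanges))

print(ids[:3])           # ['ID-0', 'ID-1', 'ID-2']
print("ex_id" in parent) # False: the original properties are untouched
```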
Can I start a parallel process without waiting for its execution to end, and in this case return a response with code 200 to the sender?
I tried to use "Parallel Multicast", but the sender still waits until all the processes are finished.
I'm not really sure about your use case/scenario, but this is not possible with a parallel multicast or splitter. It sounds like a fire-and-forget scenario.
You can always use an asynchronous decoupling scenario with JMS: store the message in JMS and then send the 200, and in a separate process, read the message from JMS and process it further.
Maybe you can share some more details about your use case/requirement?
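The decoupling idea can be sketched in Python with an in-memory `queue.Queue` standing in for the JMS queue. This is only an illustration of the pattern: unlike JMS, the queue here is not persistent, and `receive`, `worker`, and the `None` shutdown signal are hypothetical. The sender-facing step stores the message and answers 200 without waiting for processing; a separate consumer does the slow work.

```python
import queue
import threading

inbox = queue.Queue()  # stands in for a JMS queue (not persistent here)
processed = []


def receive(message: str) -> int:
    """Sender-facing step: store the message, then answer at once."""
    inbox.put(message)
    return 200


def worker() -> None:
    """Separate process that consumes the queue and does the slow work."""
    while True:
        message = inbox.get()
        try:
            if message is None:  # shutdown signal used only in this sketch
                return
            processed.append(f"processed {message}")
        finally:
            inbox.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

status = receive("order-1")
print(status)     # 200, returned without waiting for the processing
inbox.join()      # demo only: wait until the consumer has handled it
inbox.put(None)
consumer.join()
print(processed)  # ['processed order-1']
```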
Thanks for your communication!
In my scenario, I have to accept a packet from the sender and transfer it to the receiver. Then I have to wait 1 minute for the internal processes in the sender to complete and call the API of the sender system.
I was planning to receive a packet from the sender and reply OK (code 200), call the API a minute later, and at the same time send the packet to the receiver without delay.
Maybe I'm implementing the scenario logic the wrong way.
I still do not completely understand the scenario details.
Just some ideas:
Thanks for the recommendations!
JMS is not available in my tenant, so only the first idea is a suitable solution. But it does not solve the main problem: returning a response (OK code) to the sender so that it does not have to wait 1 minute.
The receiver can wait until the minute has passed and the message is delivered to it at the end of the iFlow.
Without JMS, this gets difficult.
What is your sender adapter? With the SOAP adapter and One-Way processing, you could use the WS Standard option, which immediately returns a 202 after receiving the message instead of waiting until the whole processing ends. Maybe this is an option?
I use HTTPS for Sender (
Wouldn't using a subprocess ("Local Integration Process") help?
No, this does not make a difference.