Improve parallel processing by splitting messages on number of groups

Disclaimer

When working with parallel processing, one must always take constraints into account, as outlined in this informative blog by Mandy Krimmel. A brief summary follows:

Parallel processing data towards external systems

  • Take into account receiver system constraints:
    • Rate limitations on the APIs or services used
    • Maximum allowed connections
    • Processing load forced on the receiver system
  • Take into account calling system requirements:
    • Timeout limits in case of synchronous calls
    • Exception handling and rollback options

Parallel processing data in CPI 

  • Transactional resources support: if this is not properly configured unexpected errors can occur. For an overview read this blog from Mandy Krimmel.
  • Resource consumption:
    • Develop and test your solution without parallel processing to ensure it is properly working
    • Apply bulk test loading to evaluate whether the parallel solution works
    • Make sure that proper exception handling is applied to your solution
    • Apply header and property cleanup to ensure that each parallel (re-)run starts fresh
    • Avoid storing large payloads in headers or properties

Introduction

Recently I encountered a case that required sending data to a system in parallel while optimizing the number of connections used. So rather than splitting the incoming messages into groups of a particular size, we wanted to fix the number of groups, distribute the load evenly and send the groups to the system in parallel. This was a challenge because the General and Iterative Splitter currently only support setting the group size and not the number of groups.

So in order to achieve parallel processing with a fixed number of groups, the input message has to be pre-processed before it reaches the splitter. In this blog I will show a solution that uses a Groovy script to achieve this.

 

Input data

To showcase the solution I generated a simple XSD and WSDL based on a fruit shipment. This data format consists of shipments of trucks, which are loaded with pallets of a type of fruit. An example is shown below; I shared the schemas and some tips & tricks in this blog.

<?xml version="1.0" encoding="UTF-8"?>
<root>
	<shipment id="shipment1"> 
		<truck id="11-aa-11">
			<pallet>
				<content>banana</content>
				<count>22</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
		</truck>
	</shipment>
</root>

 

The splitting script

In order to achieve a fixed number of groups, the input data has to be processed before it is split and sent. This processing is done with the Groovy script below. The script consists of three parts, which are explained in the following chapters:

  • Collect required information
  • Create a list containing for each group the number of records
  • Process the XML to create the new grouping

In this script Groovy XML parsing objects are used which are described here.

Collect required information

In order to create the correct number of groups we need to have the following:

  • The node on which we want to group: this is externalized and set in the property groupingNode
  • The number of groups we want to create: this is externalized and set in the property numberGroups
  • The input data: this is parsed with the XML Parser.

 

The next step is to collect all the targeted nodes. This is done by finding all nodes whose name matches the grouping node; they are stored in the variable targetNodes.
Last, we prepare for the next step by setting the variables that describe the group size. Let's say we have 15 input nodes and we want to create 4 groups: then each of the 4 groups gets 3 nodes and there are 3 nodes left to divide. The groupSize variable holds the result of the integer division, while rest holds the remainder.
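
The arithmetic can be checked in isolation. Below is a minimal sketch using the 15 nodes and 4 groups from the example; nodeCount is an illustrative name that does not appear in the script, where the count comes from targetNodes.size().

```groovy
// Illustrative values from the example: 15 nodes, 4 groups.
int nodeCount = 15
int numberGroups = 4

// Integer division gives the base size of every group.
int groupSize = nodeCount.intdiv(numberGroups)
// The remainder is the number of nodes still to be distributed.
int rest = nodeCount % numberGroups

println "groupSize=${groupSize}, rest=${rest}"   // prints groupSize=3, rest=3
```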

Create the group size list

The next step is to distribute the remaining nodes after the integer division, because our goal is to create parallel processing groups with as equal a number of nodes as possible. This is done by the createGroupList function. It produces a list with one entry per group, holding that group's size, and hands out the remaining nodes one at a time to the first groups.
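
To make the distribution concrete, here is a stand-alone sketch of the same idea. The function name distribute is hypothetical and is not part of the script; it combines the division and the distribution in one call and assumes numberGroups is greater than zero.

```groovy
// Stand-alone sketch of the distribution step (not the CPI script itself).
// Assumes numberGroups > 0; a value of 0 would cause a division by zero.
List<Integer> distribute(int nodeCount, int numberGroups) {
    int groupSize = nodeCount.intdiv(numberGroups)
    int rest = nodeCount % numberGroups
    // The first `rest` groups receive one extra node.
    return (0..<numberGroups).collect { i -> i < rest ? groupSize + 1 : groupSize }
}

println distribute(15, 4)   // prints [4, 4, 4, 3]
println distribute(32, 5)   // prints [7, 7, 6, 6, 6]
```

The group sizes never differ by more than one, and their sum always equals the number of input nodes.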

 

Create the new grouping XML

The last step is to create the new grouping XML. This is done with the function createGroupingXML. This function iterates over the grouping list and copies the required number of nodes to a new XML document. In this document we have a node called group which separates the grouping nodes. The parameter nodeCounter is used to make sure the next node is copied each time.

The new message is then returned to the flow. The XmlUtil class is used to convert the XML object to an XML string.
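
The regrouping can also be tried outside CPI on a miniature input. The sketch below is an illustration under assumptions: the item element name and the hard-coded size list [3, 2] are made up for the example, and the copy is done inline rather than through a helper function.

```groovy
import groovy.xml.*

// Hypothetical miniature input: five <item> nodes to be split into two groups.
def input = new XmlParser().parseText(
    '<root><item>a</item><item>b</item><item>c</item><item>d</item><item>e</item></root>')
def targetNodes = input.'**'.findAll { it instanceof Node && it.name() == 'item' }
def groupSizeList = [3, 2]   // the size list for 5 nodes and 2 groups

def output = new Node(null, 'root')
int nodeCounter = 0
groupSizeList.each { int size ->
    def group = new Node(output, 'group')   // adds a new <group> child under <root>
    size.times {
        def source = targetNodes[nodeCounter++]
        // Append a re-parsed copy so the input tree is left untouched.
        group.append(new XmlParser().parseText(XmlUtil.serialize(source)))
    }
}

println XmlUtil.serialize(output)
```

The printed document has a root element with two group children: the first holds items a, b and c, the second holds d and e.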

 

Xclone function

This function is used to create a new instance of the nodes we want to group. This instance can then be put in the returning XML document. I found this solution here, and I find it the cleanest way of doing this.
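
A short sketch of what the copy gives us: because the node is serialized and parsed again, the result is a separate object, so changing it does not touch the original. The sample pallet is illustrative only.

```groovy
import groovy.xml.*

// Same idea as the xclone helper: serialize the node and parse it again.
Node xclone(Node n) {
    return new XmlParser().parseText(XmlUtil.serialize(n))
}

def original = new XmlParser().parseText(
    '<pallet><content>banana</content><count>22</count></pallet>')
def copy = xclone(original)

// Changing the copy does not affect the original node.
copy.content[0].value = 'pear'

println original.content.text()   // prints banana
println copy.content.text()       // prints pear
```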

import com.sap.gateway.ip.core.customdev.util.Message;
import groovy.xml.*;

def Message processData(Message message) {
    String body = message.getBody(java.lang.String);
    def properties = message.getProperties();

    // Externalized parameters: the node name to group on and the number of groups.
    String nodeName = properties.get("groupingNode");
    int numberGroups = properties.get("numberGroups").toInteger();

    def xml = new XmlParser().parseText(body);

    // Collect all nodes, at any depth, whose name matches the grouping node.
    def targetNodes = xml.'**'.findAll { node -> node.name() == nodeName };

    // Base size per group and the number of nodes left over.
    int groupSize = targetNodes.size().intdiv(numberGroups);
    int rest = targetNodes.size() % numberGroups;

    def groupSizeList = createGroupList(numberGroups, groupSize, rest);

    def outputXML = createGroupingXML(groupSizeList, targetNodes);
    def serialized = XmlUtil.serialize(outputXML);

    message.setBody(serialized);
    return message;
}

// Builds a list with one entry per group, holding that group's size.
// The first `rest` groups get one extra node, so sizes differ by at most one.
def createGroupList(int numberGroups, int groupSize, int rest) {
    def groupSizeList = [];

    for (int i = 0; i < numberGroups; i++) {
        int size = groupSize;
        if (rest > 0) {
            size++;
            rest--;
        }
        // add() always appends; push() prepends from Groovy 2.5 onwards.
        groupSizeList.add(size);
    }
    return groupSizeList;
}

// Copies the target nodes, in order, into <group> elements under a new <root>.
def createGroupingXML(def groupSizeList, def targetNodes) {
    def parser = new XmlParser();
    def outputXML = parser.parseText("<root></root>");
    def nodeCounter = 0;
    groupSizeList.each { group ->
        int copySize = group.toInteger();
        def splitNode = parser.parseText("<group></group>");
        for (int i = 0; i < copySize; i++) {
            // nodeCounter keeps track of the next node to copy across groups.
            def copyNode = xclone(targetNodes[nodeCounter]);
            splitNode.append(copyNode);
            nodeCounter++;
        }
        outputXML.append(splitNode);
    }
    return outputXML;
}

// Creates an independent copy of a node by serializing and re-parsing it.
def xclone(Node n) {
    return new XmlParser().parseText(XmlUtil.serialize(n));
}

 

Creating the flow

The image above outlines the flow used to test this solution. The SOAP adapter is used to receive the fruit input data. Then, in the content modifier, the grouping node and the number of groups are defined, which are used by the script. The parameters are externalized to keep the configuration flexible.

Then the script creates the desired grouping, which is further processed by the General Splitter. In its configuration we set the new group node as the node to split on, with a group size of 1, so that the number of split messages equals the number of groups. Additionally, we enable parallel processing on the General Splitter and define the number of concurrent processes. I simply set it to the maximum of 50. The reason is that we now control the number of processes with the numberGroups parameter; as long as it does not exceed 50, all groups are processed in parallel.

 

The content modifier is only used to capture the payload and the number of runs in each splitter processing run. In an actual development, this step would be replaced with the required request/response processing and an external call step, such as Request Reply, to send the data.

 

Testing the flow

In order to test the flow, the input data was set up as one shipment with 4 trucks and a total of 32 pallets.
The grouping node is set to pallet and the number of groups to 5, as shown in the previous chapter.
This results in two groups of 7 pallets and three groups of 6 pallets being processed. Below are the request data, the intermediate grouping data and the screenshot showing the number of times the splitter has run.
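
One way to check such a result without counting by hand is to count the child nodes of each group. The helper groupSizes below is hypothetical and not part of the flow; applied to the intermediate grouping data shown further down, it should return [7, 7, 6, 6, 6].

```groovy
import groovy.xml.*

// Hypothetical helper: number of child nodes per <group> in the grouped message.
List<Integer> groupSizes(String groupedXml) {
    def root = new XmlParser().parseText(groupedXml)
    return root.group.collect { it.children().size() }
}

// Small illustrative sample, not the test data of this blog.
def sample = '<root><group><pallet/><pallet/></group><group><pallet/></group></root>'
println groupSizes(sample)   // prints [2, 1]
```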

 

Request data

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:fru="http://www.example.org/fruitshipment/">
   <soapenv:Header/>
   <soapenv:Body>
      <fru:processFruitShipment>
         <shipment id="shipment1"> 
		<truck id="11-aa-11">
			<pallet>
				<content>banana</content>
				<count>22</count>
			</pallet>
			<pallet>
				<content>banana</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>banana</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>banana</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>banana</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>banana</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>banana</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>banana</content>
				<count>220</count>
			</pallet>
		</truck>
		<truck id="22-aa-22">
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>pear</content>
				<count>220</count>
			</pallet>
		</truck>
		<truck id="33-aa-33">
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>orange</content>
				<count>220</count>
			</pallet>
		</truck>
		<truck id="44-aa-44">
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
			<pallet>
				<content>apple</content>
				<count>220</count>
			</pallet>
		</truck>
	</shipment>
      </fru:processFruitShipment>
   </soapenv:Body>
</soapenv:Envelope>

 

Intermediate grouping data

<?xml version="1.0" encoding="UTF-8"?><root>
  <group>
    <pallet>
      <content>banana</content>
      <count>22</count>
    </pallet>
    <pallet>
      <content>banana</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>banana</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>banana</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>banana</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>banana</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>banana</content>
      <count>220</count>
    </pallet>
  </group>
  <group>
    <pallet>
      <content>banana</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
  </group>
  <group>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>pear</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
  </group>
  <group>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>orange</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
  </group>
  <group>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
    <pallet>
      <content>apple</content>
      <count>220</count>
    </pallet>
  </group>
</root>

 

Content modifier step processing

This shows the number of runs the splitter has made and the data used in each run.

 

 

 

 

2 Comments
  • Hi Bram,

    Thanks for the very informative blog.

    Just wondering, how to improve performance if there is any Local Integration Process used after the Splitter step?

    How can we improve performance when messages have to be split based on each record?

    Regards,

    Pavan G

     

    • Hi Pavan G,

      If you need to split on a single message you still can process in parallel provided that the number of threads is set higher than 1. So if you set 10 threads for example and split on a single message, you will still notice that 10 messages were started at the same time.

       

      Best,

       

      Bram