As enterprise businesses grow, the overall data volume and data traffic between the various systems in (or outside) the enterprise landscape grow with them. As we move toward distributed yet integrated ERP architectures, integrations are becoming more data-heavy; hence there is an increasing need for robust, scalable integration so that the various systems and operations work together seamlessly. At the same time, we need integration architectures in which the control mechanisms of the various distributed systems are coupled with proper logical integrity, with clear error handling, recovery points, and monitoring facilities.
Our client is no exception. They have a vast distributed landscape containing various components of the SAP NetWeaver suite, namely:
- Enterprise Core Component (ECC)
- Customer Relationship Management (CRM)
- Advanced Planning and Optimization (APO)
- Global Trade Services (GTS)
- Supplier Network Collaboration (SNC)
- Business Intelligence (BI)
- Supplier Relationship Management (SRM) etc.
Besides these, it has mainframe-based legacy systems such as:
- Passport – Order Management system
- SWIMS – Supply Chain Management system
- RSSS – Software Management tool
- IFS – another ERP tool for a separate business wing
- GPMS – Global Product Marking and Labeling system, etc.
It also has various integrations with external parties, e.g., banks, distributors, and suppliers, via EDI and non-EDI B2B communications.
SAP Process Integration (PI) 7.0, formerly known as Exchange Infrastructure (XI), is used as the middleware tool to integrate the diverse systems mentioned above.
When this project started to evolve our client's enterprise business systems in 2005, these systems were not data-heavy, so the integrations did not deal with such high volumes.
In recent years, however, as the client's business grew and more and more operations moved onto SAP systems (migrating from the legacy systems), the data flow between the various systems has grown exponentially. Special care has therefore been taken to put a more robust design in place, so that we can handle high-volume interfaces with higher performance without losing a tightly coupled end-to-end control mechanism.
SAP XI/PI has a dual-stack architecture: an ABAP-based backbone engine, with a Java runtime on top of it where the middleware mapping is executed. Diagram 1 shows the standard SAP XI architecture with its various components. The mapping programs are built in the Integration Builder: Repository, also called the Enterprise Services Repository (ESR), and the various end-to-end scenarios are configured in the Integration Builder: Directory, also called the Integration Directory. At runtime the Adapter Engine, which interacts directly with external systems and converts data into XI messages, pushes the data to the ABAP-based Integration Engine (IE). The Integration Engine then places the messages in internal pipeline service queues to dispatch them to the Java runtime. The Java runtime hosts the mapping program, which maps the source structures to the target structures. The target messages are then dispatched back to the IE for delivery to the target communication channel.
So in the end-to-end integration process, a message flows through the ABAP pipeline service queues as well as the Java runtime. For large-volume persistent data flows, file-based integration is a very common scenario. The message flow in a typical FTP-XI-File scenario can be depicted as below:
In the scenario above, files are pushed/pulled from an FTP server to various SAP systems such as ECC and CRM via a Network File System (NFS). In a typical XI architecture the files are read via FTP/File adapters and pulled into the Integration Engine of XI. The payload is then pushed to the Java runtime for mapping. In most file-to-file scenarios there is no transformation of the payload, so these become pass-through interfaces. Next, the outgoing pipeline service (out queue) dispatches the message payload to the inbound File adapter, which writes the file to the NFS. The reverse-direction data flow is similar: data is pushed from the NFS to the external FTP server via XI.
1. In a pass-through scenario where no payload transformation is needed, the mapping step is redundant. Sending the payload from the ABAP stack to the Java runtime and back adds unnecessary performance overhead.
2. Even if we bypass the mapping step by putting a dummy entry in the Interface Determination, a major performance issue remains. Because the File adapter reads the entire file and transforms it into an XI message payload, the pipeline services must carry these heavy messages whether or not mapping is used. The major issues with this architecture are:
- a) In very high-volume integrations such as this project, where multi-GB files must be moved back and forth, the File adapters take too long to read a multi-GB file and convert it into an XI message.
- b) Even once the huge files are converted into payloads, they create performance overhead on the pipeline service, clogging the queues. This degrades the overall performance of the middleware messaging system and affects other integration scenarios as well.
3. On the other hand, after the files are delivered to the NFS, there is little control over the programs in the SAP systems that consume them. Because the File adapter cannot control or trigger the target program directly, we lack the ability to trigger the end process from an integration standpoint.
To meet this typical customer need, where the data volume cannot be reduced and the middleware is used mostly as a file transporter rather than a file transformer, we proposed a new design.
The major challenges are:
- Bypass the points where message clogging can potentially happen.
- Drastically reduce the payload volume carried by the middleware, so that these scenarios do not affect its overall performance.
- Retain end-to-end control over the scenario, so that we can monitor, reprocess, and recover the end-to-end process.
- Position XI as the control agent in the enterprise landscape, so that all end-to-end processes are connected and have monitoring, reprocessing, and recovery points.
The new architecture for high volume file-to-file integration is as follows:
Inbound to SAP
- In this design, we use a metadata file as a trigger. The metadata file contains the details of the actual payload data file (e.g., file name, source path, and other attributes). For a non-SAP source system, for example a source FTP server, the XI adapters are configured to poll the FTP server for the metadata file.
- The metadata file kicks off the scenario. Compared to the multi-GB payload, the metadata is very lightweight (usually a few KB), so the message size in XI is very small, avoiding pipeline clogging or any performance overhead.
- Next, in the message mapping we map to a custom IDoc or proxy structure that carries the payload file information. From within a Java user-defined function (UDF), we also make standard Java FTP calls to bring the actual payload file from the source FTP server to the Network File System (NFS).
- On the target side, the inbound trigger IDoc (which we set up in ECC) or inbound proxy (a framework we set up in CRM) carries the NFS path and file name information to the target system.
- In the inbound IDoc processing function module or the inbound proxy method, we write the actual logic to process the file from the NFS and report a status back to the IDoc or proxy. If the file processing fails, the IDoc status is set to 51 (error); in the case of a proxy, the proxy status is set to a reprocessable error status.
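The inbound steps above can be sketched roughly as follows. This is a minimal illustration, not the actual project code: the `key=value` metadata format and the class/method names are assumptions, and the FTP-to-NFS transfer is reduced to a generic stream copy (in the real UDF the input stream would come from an FTP connection, e.g. via a `java.net.URL` with an `ftp://` URL, and the output would be a file on the NFS).

```java
import java.io.*;
import java.util.*;

/** Sketch of the lightweight-trigger idea: parse the small metadata file,
 *  then stream the large payload to the NFS location it names. */
public class MetadataTrigger {

    /** Parse metadata such as "fileName=orders.dat\nsourcePath=/out/orders.dat"
     *  into attribute/value pairs (format assumed for illustration). */
    public static Map<String, String> parseMetadata(String metadata) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String line : metadata.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                attrs.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return attrs;
    }

    /** Core of the UDF's FTP-to-NFS move: a plain buffered stream copy.
     *  Returns the number of bytes transferred. */
    public static long copyStream(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```

Note that only the few-KB metadata string ever becomes an XI message payload; the multi-GB file moves directly between the FTP server and the NFS as a byte stream.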
Outbound from SAP
- For the outbound scenario, too, instead of the conventional approach of writing the file to the NFS and waiting for the middleware (XI) to pick it up, logic is written in the framework to kick off the XI process from within ECC (or any SAP ABAP system) once the file write is complete. Specifically, we call an outbound proxy to XI carrying the metadata (file path, name, and other details) of the actual payload.
- This kicks off the integration scenario in XI with a relatively lightweight message. Then, as in the inbound process, Java FTP calls are made from within the message mapping to put the actual file on the FTP server.
- Depending on the requirement, the scenario can also be configured to send a success/failure message back to the source system, or to send a trigger to the target system to start working on the file once the file write is complete.
- In both directions, the Java mapping step guarantees that the trigger IDoc/proxy/file is kicked off only after the file transfer is complete.
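The outbound trigger can be pictured as building a small metadata message rather than shipping the file itself. The sketch below is only an illustration of that idea; the field names and XML shape are assumptions, not the project's actual proxy structure:

```java
/** Illustrative outbound trigger message: only metadata about the payload
 *  file travels through XI; the multi-GB file itself never enters the
 *  middleware pipeline. */
public class FileTriggerMessage {
    private final String fileName;
    private final String nfsPath;
    private final long sizeBytes;

    public FileTriggerMessage(String fileName, String nfsPath, long sizeBytes) {
        this.fileName = fileName;
        this.nfsPath = nfsPath;
        this.sizeBytes = sizeBytes;
    }

    /** Render the trigger as a tiny XML document (shape assumed for illustration). */
    public String toXml() {
        return "<FileTrigger>"
             + "<FileName>" + fileName + "</FileName>"
             + "<NfsPath>" + nfsPath + "</NfsPath>"
             + "<SizeBytes>" + sizeBytes + "</SizeBytes>"
             + "</FileTrigger>";
    }
}
```

Even for a 2 GB payload, the message that actually passes through the XI queues is this XML of a few hundred bytes.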
3.1. High Throughput:
- Because this design bypasses the main Integration Engine's pipeline service for the actual high-volume payload, message processing is very fast. On our client's landscape we measured the throughput as follows:
- For a 1 GB file, the processing time was about 10 seconds, giving a transfer rate of roughly 100 MB/s.
Since the actual payload does not pass through XI's message queuing, the scenario is effectively flexible for any payload type or size. The actual throughput depends on the network bandwidth, not on the middleware system. At our client, many scenarios that move multi-GB files between systems use this architecture.
Since the architecture is independent of the payload size, there is virtually no impact on the overall performance of the middleware, even when multiple multi-GB scenarios run on the enterprise landscape.
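As a quick sanity check on the quoted figure, the rate is simply size over time; the trivial sketch below reproduces the arithmetic (treating 1 GB as 1000 MB, as the rounded figure does):

```java
/** Back-of-the-envelope throughput check for the measured transfer. */
public class ThroughputCheck {
    /** Transfer rate in MB/s for sizeMb megabytes moved in the given seconds. */
    public static double rateMbPerSec(double sizeMb, double seconds) {
        return sizeMb / seconds;
    }
}
```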
This architecture not only acts as a file transporter but also integrates the file processing end to end, from the source system to the target system. Because the integration scenario deals with the metadata, that additional communication channel lets us put a better control mechanism into the target system, beyond the middleware's normal reach. The trigger IDoc/inbound proxy not only kicks off the subsequent file processing but also provides valuable error diagnostics in the IDoc/proxy message. This allows better reprocessing mechanisms and recovery points.
In this project there are a number of legacy and distributed ERP systems (as explained in section 1). We obtained the following business benefits from this design:
1. We used to have performance issues with high-volume interfaces: in the conventional design the middleware sometimes took hours to process very large files, so many of the critical, very high-volume interfaces were run manually rather than through an integrated, automated process. We now process such scenarios in a few seconds.
2. Previously, these high-volume interfaces impacted the overall middleware queue processing, causing performance issues in other integration scenarios. With the new design, overall middleware performance has improved.
3. Previously, the client used many discrete Control-M scheduler jobs to schedule file processing, with little control over end-to-end monitoring and reprocessing. The new XI design with the trigger IDoc/proxy gives seamless control over monitoring and reprocessing.
The entire design is based on a reusable framework. The inbound trigger IDoc/proxy framework can accommodate various types of file processing modules as plug-in function modules, without modifying the file transmission mechanism. The code for error handling, reprocessing, and basic file validation is part of the reusable framework; only the application-specific core file logic needs to be plugged in.
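The plug-in idea can be sketched as follows. The real framework is implemented as ABAP function modules; the Java sketch below merely illustrates the same pattern, and all names in it are assumptions (the status codes mirror IDoc conventions, where 51 is an error and 53 is a successfully posted document):

```java
import java.util.*;

/** Illustration of the reusable-framework pattern: the framework owns
 *  validation, dispatch, and error handling; applications plug in only
 *  their core file-processing logic. */
public class FileFramework {

    /** Application-specific plug-in, analogous to a plug-in function module. */
    public interface FileProcessor {
        void process(String nfsPath) throws Exception;
    }

    private final Map<String, FileProcessor> plugins = new HashMap<>();

    public void register(String interfaceId, FileProcessor p) {
        plugins.put(interfaceId, p);
    }

    /** Framework entry point: validate the trigger, dispatch to the plug-in,
     *  and translate any failure into a reprocessable error status instead
     *  of crashing the inbound processing. */
    public String handle(String interfaceId, String nfsPath) {
        FileProcessor p = plugins.get(interfaceId);
        if (p == null || nfsPath == null || nfsPath.isEmpty()) {
            return "51"; // error: unknown interface or invalid file reference
        }
        try {
            p.process(nfsPath);
            return "53"; // success: file processed
        } catch (Exception e) {
            return "51"; // error, eligible for reprocessing
        }
    }
}
```

An application team would then only register its processor for its interface ID; the transmission, validation, and error-status handling stay in the shared framework.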
The entire solution can also be replicated quickly for other clients, because the file transmission/trigger framework does not depend on client-specific application logic. The common framework defines the file transmission and the error handling/reprocessing mechanisms, which can be shared across clients.