Improving performance of high volume data loads for PDI custom objects
You have implemented a custom object (CBO) via PDI and generated a web service based on it. This web service can handle only one instance per call, which severely impacts performance, especially in high data volume scenarios. To upload 1,000 CBO records into the system, you need to perform 1,000 web service calls, including all the overhead involved, such as WSDL consistency checks, user login, and authorization checks.
Technical Background: The PDI web service generator still uses an outdated web service implementation framework (CSG, Compound Service Generator), which is not mass-capable.
To overcome this limitation, you can apply the following trick: since CSG-based web services can handle multiple non-root segment instances, we define a “bracket” object around the CBO, a so-called “Replication Request BO” (RRBO). The RRBO has an admin root node and captures a list of CBO instances in its CBO segment. The idea is to generate a web service based on the RRBO that imports one RRBO instance, and hence a whole bundle of CBO instances, with a single web service call.
The RRBO itself is fully fault-tolerant: it always imports the RRBO instance with its list of CBO instances, regardless of whether they are correct from a data consistency point of view. This is possible because the RRBO only acts as a staging area for unprocessed CBO instances. The actual processing of the individual CBO instances within the RRBO happens in a later step, during a batch run executed locally in C4C. In that run, all consistency checks defined for the CBO are executed and the CBO instances are written to the database. Any errors raised by the CBO validations during processing are written to an application log attached to the specific CBO instance within the RRBO, which can be accessed via its own work center view.
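The staging idea can be illustrated with a minimal Python sketch. It is not PDI code: the class names, the field names, the "Not Started" status value and the bundle size of 200 are assumptions made for illustration only. It shows that a bundle is accepted without any validation and that the number of web service calls drops from one per record to one per bundle.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
from uuid import uuid4


@dataclass
class CBORequest:
    """One unprocessed CBO image held in the staging area."""
    payload: dict[str, Any]                 # raw CBO data, stored unvalidated
    processing_status: str = "Not Started"  # hypothetical initial status


@dataclass
class ReplicationRequest:
    """Bracket object (RRBO): admin data on the root plus a list of CBO requests."""
    uuid: str = field(default_factory=lambda: str(uuid4()))
    creation_date_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    cbo_requests: list[CBORequest] = field(default_factory=list)


def stage_bundle(records: list[dict[str, Any]]) -> ReplicationRequest:
    """Accept a whole bundle in one call; nothing is validated at this point."""
    return ReplicationRequest(cbo_requests=[CBORequest(payload=r) for r in records])


def split_into_bundles(
    records: list[dict[str, Any]], bundle_size: int
) -> list[list[dict[str, Any]]]:
    """Cut the full data set into bundles, one web service call per bundle."""
    return [records[i:i + bundle_size] for i in range(0, len(records), bundle_size)]


records = [{"ID": str(i)} for i in range(1000)]
bundles = split_into_bundles(records, bundle_size=200)  # bundle size is an assumption
staged = [stage_bundle(b) for b in bundles]
print(len(staged))  # 5 calls instead of 1,000
```

A suitable bundle size depends on the payload size and system limits, which the sketch does not model.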
Importing CBO instances via a staging area (RRBO) not only improves the performance of the data load. Another big advantage is that any ID mapping required for master data references within the CBO can be performed locally while the CBO is processed. The consumer (normally some kind of middleware calling C4C) no longer needs to take care of it, e.g. by calling an ID mapping query web service. In addition, the consumer does not need to query the existing database to derive the correct web service operation (create vs. update). This is now the task of the RRBO processing routine, so no orchestration of different web service calls is needed.
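The two tasks that move from the consumer into the processing routine, ID mapping and the create-vs-update decision, could look roughly like the following Python sketch. The in-memory dictionaries stand in for data that lives inside C4C, and the field names `externalAccountID` and `accountID` are hypothetical.

```python
from typing import Any, Optional

# Hypothetical in-memory stand-ins for data that lives inside C4C.
ID_MAPPING: dict[str, str] = {"EXT-ACC-1": "1000001"}  # external ID -> internal ID
CBO_DATABASE: dict[str, dict[str, Any]] = {}           # CBO ID -> stored instance


def map_external_id(external_id: str) -> Optional[str]:
    """Local ID mapping; replaces a separate ID mapping query call by the consumer."""
    return ID_MAPPING.get(external_id)


def upsert_cbo(request: dict[str, Any]) -> str:
    """Import one staged record and return "created" or "updated".

    Raises ValueError if the master data reference cannot be mapped.
    """
    internal_account_id = map_external_id(request["externalAccountID"])
    if internal_account_id is None:
        raise ValueError(f"No ID mapping for {request['externalAccountID']}")

    data = {"ID": request["ID"], "accountID": internal_account_id}
    # The routine, not the consumer, decides between create and update.
    if request["ID"] in CBO_DATABASE:
        CBO_DATABASE[request["ID"]].update(data)
        return "updated"
    CBO_DATABASE[request["ID"]] = data
    return "created"
```

The consumer sends the same payload shape in both cases; sending the record twice results in one create followed by one update.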
The following UML diagram depicts the RRBO object structure:
- The RRBO Root node consists of admin data only:
- The action Cleanup() takes care of physically deleting RRBO instances once they have been successfully processed
- The CBO node resembles the structure of the original CBO root node (i.e. header data [CBOHeaderData] as well as existing subnodes “CBO Segments”). In addition to that, it contains:
- status fields, e.g. processingStatus, relevanceStatus
- applicationLogItemUUID which acts as link to an application log for persisting potential error messages raised by CBO
- action MarkAs(Ir)Relevant: to mark a CBO instance as irrelevant (either manually via UI or automatically via sequencing mechanism, cf. below); irrelevant CBO instances will not be processed
- action Process: to process CBO instance
- If relevanceStatus = “Check Pending” -> check if CBO instance is relevant (cf. “Sequencing” below)
- Import CBO instance (unless relevanceStatus = “Irrelevant”)
- Potential errors are written into an application log linked to the CBO instance in the request
- Set processingStatus to “Successful” or “Failed”
- Set replicationDateTime on CBO
- Sequencing: import CBO request r only if no newer CBO image has been imported in the meantime
- Retrieve CBO with ID provided in request
- If no CBO instance could be found -> set relevanceStatus = “Relevant”
- If CBO instance c could be found, check if c.replicationDateTime >= r.creationDateTime; if yes, set relevanceStatus to “Irrelevant” and write meaningful message into application log
- Optional: the very same mechanism could be used to also accommodate manual changes via the UI; in this case the standard “LastChangeDateTime” field could be used instead
- Mass Data Run Object (MDRO): batch job to process in background; recommendation is to have 2 MDROs
- MDRO1 to process the data in the staging area (i.e. to call action “Process”)
- MDRO2 to clean up successfully processed RRBO instances (i.e. to call action “Cleanup”)
- User Interface:
- The RRBO can be accessed via its own work center view and OWL. The recommendation is not to expose the RRBO itself on the UI, but to show the underlying CBO requests instead
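The Process action, the sequencing check and the two mass data runs described in the list above can be sketched in Python as follows. This is an illustration under assumptions, not the PDI implementation: the class names, the "Not Started" status and the `validate` callback are hypothetical, the cleanup works on single requests rather than whole RRBO instances, and replicationDateTime is assumed to take the creation time of the imported request so that a later request is not blocked by the processing time of an earlier one.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class StoredCBO:
    cbo_id: str
    replication_date_time: datetime


@dataclass
class CBORequest:
    cbo_id: str
    creation_date_time: datetime                  # creation time of request r
    relevance_status: str = "Check Pending"
    processing_status: str = "Not Started"        # hypothetical initial status
    log: list[str] = field(default_factory=list)  # stands in for the application log


def check_relevance(r: CBORequest, db: dict[str, StoredCBO]) -> None:
    """Sequencing: mark r irrelevant if a newer image was already imported."""
    c = db.get(r.cbo_id)
    if c is not None and c.replication_date_time >= r.creation_date_time:
        r.relevance_status = "Irrelevant"
        r.log.append(f"Skipped: a newer image of {r.cbo_id} was already imported")
    else:
        # Not found, or found but older (the second case is an assumption).
        r.relevance_status = "Relevant"


def process(r: CBORequest, db: dict[str, StoredCBO],
            validate: Callable[[CBORequest], list[str]]) -> None:
    """Action Process for a single CBO request."""
    if r.relevance_status == "Check Pending":
        check_relevance(r, db)
    if r.relevance_status == "Irrelevant":
        return
    errors = validate(r)  # stands in for the CBO consistency checks
    if errors:
        r.log.extend(errors)
        r.processing_status = "Failed"
        return
    # Assumption: replicationDateTime is the creation time of the imported request.
    db[r.cbo_id] = StoredCBO(r.cbo_id, replication_date_time=r.creation_date_time)
    r.processing_status = "Successful"


def run_processing(requests: list[CBORequest], db: dict[str, StoredCBO],
                   validate: Callable[[CBORequest], list[str]]) -> None:
    """MDRO1: process all requests that have not been processed yet."""
    for r in requests:
        if r.processing_status == "Not Started":
            process(r, db, validate)


def run_cleanup(requests: list[CBORequest]) -> list[CBORequest]:
    """MDRO2: drop successfully processed requests, keep everything else."""
    return [r for r in requests if r.processing_status != "Successful"]
```

Keeping failed and irrelevant requests in the staging area after cleanup is a design choice of this sketch; it leaves their log entries available for inspection on the UI.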