Performance Tuning for Data Loading in Process Chains
Summary
The objective of this document is to give information on performance tuning for process chains.
Most of us start our careers in a support project. In the early stages, the information below will help us identify performance issues in process chains.
Introduction
A process chain is a scheduled sequence of processes that are linked together and executed in a predefined order. It is the mechanism that controls the execution of extraction, transformation, and loading (ETL) activities at a scheduled time and in a defined manner.
If you use process chains, you can
- automate complex schedules in BW with the help of event-controlled processing,
- visualize the schedule by using network applications, and
- centrally control and monitor the processes.
Each step is made up of a process type, which generally corresponds to a BW activity.
The process type is the kind of process that is being executed. Each process has a type of activity associated with it, such as starting an InfoPackage to load data or activating DSOs. A process chain can also be included in another process chain; the enclosing chain is called a meta chain.
Below is a sample process chain that runs the whole ETL process in the correct sequence. The sequence of steps can have a significant impact on load performance.
Commonly Faced Issues in Process Chains (Based on Runtimes)
Many challenges influence the successful execution and completion of process chains in a support project. Here are the major factors:
- The system executes several process chains at once, which slows down system performance.
- If too few background processes are allotted, a process chain can fail, which indirectly affects the runtime of other process chains.
- The data being extracted from the source systems, for example its volume, affects the runtime.
- Individual processes can be delayed for various reasons, which increases the time taken to complete each one.
- Because of complex routines, DTPs can take a long time to update data to the InfoProviders.
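The effect of the number of background processes on overall runtime can be illustrated with a small simulation. This is only a sketch: the step runtimes are hypothetical, the function name is made up for illustration, and the scheduling rule (each step starts on whichever process becomes free first) is a simplifying assumption, not how BW actually dispatches jobs.

```python
import heapq


def simulate_finish_times(step_minutes, background_processes):
    """Return the finish time (in minutes) of each step when the steps are
    started in list order on a fixed number of background processes."""
    if background_processes < 1:
        raise ValueError("need at least one background process")
    # Min-heap holding the time at which each background process becomes free.
    free_at = [0] * background_processes
    heapq.heapify(free_at)
    finish_times = []
    for minutes in step_minutes:
        start = heapq.heappop(free_at)  # earliest free background process
        end = start + minutes
        finish_times.append(end)
        heapq.heappush(free_at, end)
    return finish_times


# Hypothetical runtimes (minutes) of six steps from chains scheduled at the same time.
steps = [30, 30, 20, 20, 10, 10]
for slots in (2, 3, 6):
    last = max(simulate_finish_times(steps, slots))
    print(f"{slots} background processes: last step ends at minute {last}")
```

With these made-up numbers, the same six steps end at minute 60 with two processes, minute 40 with three, and minute 30 with six, which is why chains that compete for the same background processes stretch each other's runtimes.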
Proposed Solutions
Most of the above challenges are common, and we may have to face them whenever we use process chains. In a few cases, we can avoid the issues when designing the process chain itself. The following steps can be used to improve the performance of process chains.
- The master data load creates all SIDs and populates the master data tables (attributes and/or texts). If the SIDs do not exist when transaction data is loaded, these tables have to be populated during the transaction data load, which slows down the overall process. Hence, load master data before transaction data.
- Use delta update mode rather than full update mode for data loads wherever applicable, so that the runtime decreases significantly.
- Delete the PSA requests daily when loading in full update mode to avoid long runtimes. The following example illustrates this.
The InfoCube gets a full load from the source on a daily basis. This is a business requirement, so we cannot change the load to delta update mode. If you compare the number of transferred records with the number of added records in the screenshot below, there is a huge difference. The load time has also increased day by day.
The problem arose because the PSA table holds data from day 1 onwards, so the number of transferred records grows gradually, while the added records come only from the current day. To overcome this issue, we should add a step that deletes the PSA table contents before the InfoPackage picks up data from the source.
Once the PSA deletion step has been added, the transferred and added record counts will be almost the same, and the DTP runtime will be reduced considerably.
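The growth described in this example can be reproduced with a few lines of arithmetic. The sketch below is a simplified model, not BW code: it assumes a constant, hypothetical 10,000 records per daily full load, and the function name and parameters are made up for illustration.

```python
def daily_full_load(days, records_per_day, delete_psa_first):
    """Yield (day, transferred, added) for a daily full load.

    transferred: records the DTP reads from the PSA on that day
    added:       records that belong to the current day's request
    """
    psa_records = 0
    for day in range(1, days + 1):
        if delete_psa_first:
            psa_records = 0  # PSA deletion step runs before the InfoPackage
        psa_records += records_per_day  # InfoPackage writes today's full load to the PSA
        yield day, psa_records, records_per_day


# Hypothetical volume: 10,000 records per daily full load, observed over 5 days.
for delete_first in (False, True):
    print("PSA deleted before load:", delete_first)
    for day, transferred, added in daily_full_load(5, 10_000, delete_first):
        print(f"  day {day}: transferred={transferred}, added={added}")
```

Without the deletion step, the modelled transferred count on day 5 is five times the added count; with it, the two counts stay equal every day.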
- Create a separate process chain that deletes the indices of all data targets used across all process chains. This decreases the overall runtime of the individual process chains. Make sure this chain finishes before the other process chains start, so that deadlock situations are avoided.
- We cannot expect data from the source every day. On some days no documents are posted or no invoices are created, which results in zero records. For those situations, make sure that the InfoPackage in question has the following setting.
Choose Display Variant on the InfoPackage process type. The Maintain InfoPackage screen appears.
Click Scheduler -> Traffic Light Color for Empty Requests, as shown below.
The screen shown below appears. By default the setting is Yellow, which means that if there is no data in the source system, the process does not finish. Change the status to Green and save the InfoPackage. This avoids long runtimes and confusion.
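Several of the recommendations above are ordering rules: master data before transaction data, PSA deletion before the InfoPackage, and index deletion before the load into the data target. One way to reason about such rules is as a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+); all step names are hypothetical, and this only illustrates the ordering idea, it does not call any BW interface.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names. Each key maps to the set of steps that must finish first.
CHAIN_DEPENDENCIES = {
    "delete_cube_indices": set(),
    "load_customer_attributes": set(),
    "load_customer_texts": set(),
    "delete_psa_requests": set(),
    "run_sales_infopackage": {"delete_psa_requests"},
    "run_sales_dtp": {
        "delete_cube_indices",
        "load_customer_attributes",
        "load_customer_texts",
        "run_sales_infopackage",
    },
}


def execution_order(dependencies):
    """Return one valid order in which every step runs after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(CHAIN_DEPENDENCIES), start=1):
        print(position, step)
```

Any order returned by `execution_order` places the master data loads and the PSA deletion ahead of the transaction data DTP. If two rules contradict each other, `TopologicalSorter` raises a `CycleError`, which is a quick way to spot a chain design that cannot be scheduled.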
Important T-Codes
RSPC – Process Chain Maintenance
RSPCM – Monitor Daily Process Chains
RSPC1 – Display log of one chain at a time
RSM37 – Display Jobs with Program Parameters
ST13 – Runtime Analysis of process chains (to get the details of process chains/process types, use BW-TOOLS and execute)
Hi Karthik,
Good job, and the way you've articulated it is really splendid.
But please check the point below: if we use full update in the flow, it consumes the most time, so it is best practice to use delta wherever necessary.
Correct me if I am wrong!
Cheers,
Harish
Thank you, Harish, for your comments and for spotting the mistake. You are right, it's the other way around 🙂. The document is updated now.
Regards
Karthik
Agree with Harish. It is not true that full update mode offers "significantly" better runtime than delta. Most often it is the other way round, because a delta pulls less data than a full load.
Thank you Suhas
Hi Karthik,
Keep posting good documents like this.
Regards,
Arun
Thanks Arun 🙂