I recently started working as an SAP analyst. My first project was in SAP BW, where I was given the chance to optimize a company's existing process chains.
At the time we had 4 daily chains running (one in the morning, one at lunch, one in the evening and one at midnight). Most of the chains were divided into 4 layers: one that loads the master data, one for loading the InfoPackages, one for the DSOs and one for the cubes. So it was a pretty organized structure to start in and get to know everything.
When I started the project I had almost no knowledge about process chains or about the company's data and structure, so for optimizing the process chains I had to rely on Google to find solutions. Luckily there is a solution on Google for almost everything; unfortunately, everything I wanted to know about optimizing process chains was scattered across the web, and I had to piece it together from 100+ different pages. That is why I decided to start my first blog: a blog about all the information I gathered during the months I have been optimizing the process chains.
Note that I’m still working on optimizing the process chains, so if I come across new things I will add them here as soon as possible.
So let’s start off.
(I’m going to assume that, since you want to optimize process chains, you are familiar with transaction ‘RSPC’ and know how to get the chain ID and other common things.)
The first thing I did when starting (and for me personally the most important step) was documentation. I started by creating a OneNote document in which I mapped the current structure of all chains. Remember, you are going to make changes in a system that someone else created and that is (hopefully) functioning correctly at the moment. If you just change things without documenting them, you will eventually make a mistake and have no idea how it was before. It is important to keep track of everything you do, as well as of how it was before. Creating good documentation has another positive side effect: while mapping the process chains you will also look into the monitoring of DSOs, cubes, and so on, so even before checking the runtime of the chains you can spot some abnormalities here. For example, I spotted a few DSOs that loaded daily but only added records monthly, so I could already mark those loads to be moved to the monthly chain. Yes, I know mapping out all the chains and documenting them takes a long time and becomes boring. But trust me, it is worth it.
Now for the real work. Once you have good documentation of how everything currently works, you can have a look at the runtimes of the chains. I personally like to look this up with transaction ‘ST13’. As the tool name, type in ‘BW-TOOLS’ and press execute. Next, select the radio button for Process Chain Analysis and press execute again. Click on the ‘Process Chain’ button and type in the process chain ID (we have one chain that stands above all the others; I like to use its ID because it will also display all the chains underneath it) and press execute.
If you don’t use daily process chains, you will have to adjust the start and end date according to when the chain last ran. You can also select a broader range of dates so that you can compare the runtimes of the chain.
Here you can already see the runtime of the chain. However, we want to see in detail what exactly is consuming a lot of time, so simply click on the underlined text in the ‘Chain’ column. (A lot of info in the images will be blurred out; I’m not certain which info I can show and which I can’t, so better safe than sorry 🙂 )
In this detail view we can open each chain individually and take a closer look at what has a long runtime. You can also see the number of records loaded, which is a very helpful indicator of whether a load is taking an abnormally long time to run or not.
In this detail view you can already get a better look at loads that take a long time to process. Simply double-click the load in the ‘Process Chain Hierarchy’ column; this will take you to a detail view of the load. Example of a load from an InfoPackage:
So now you know how to spot the problematic chains. With this information we can start the real work. Let us begin with the loads from InfoPackages.
For InfoPackages I have only encountered two things that can slow the process down. When you look into the detail view of the InfoPackage in ‘ST13’, either it simply loads a lot of data, or (as marked above) the ‘1st package arrived’ step takes a long time.
For the first case there is not much you can do: it is a lot of data and it will take time. As for the second case: go to ‘RSA1’ and search for the InfoPackage. Double-click the InfoPackage, then double-click the DataSource. Copy the DataSource name and open the system from which you are requesting data, for example ECC (if you have authorization, that is). Go to ‘RSA2’, paste the DataSource name and go to the Extraction tab. Here you can find out how the data is requested. Unfortunately, for this part I only have a solution for one specific case: when you see in the Extraction tab that a custom function module loads the data. You will then have to debug the function module and see where you can improve it. If it is a function module delivered by SAP, it is best to leave it as it is.
This is all the info I can provide at the moment on optimizing InfoPackages, because I’m currently not authorized to look into loads that come from other source systems or databases.
Now let’s have a look at DTPs. For DTPs there are two things that most commonly slow down the process chain. The first is transformations. I’m not going to go into too much detail here, but you should always investigate the transformation thoroughly. Always look into the start, end and rule-type routines; they can have a massive impact on the loading time. Avoid having a lot of loops in your code and only use routines if they are really necessary for changing data in the backend. The best advice I can give here is: DEBUG. Debug the transformation and see what is taking a long time and what can be changed for the better. Every routine is different and there is no single general rule to make routine code faster. Put in some break-points where you think the code can be optimized (DON’T FORGET TO DELETE THE BREAK-POINTS AFTERWARDS!). Small tip: in the InfoProvider you can already check whether it’s the start, end or rule-type routine that is taking a long time.
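Since every routine is different I can’t give one recipe, but the pattern I run into most often is a lookup loop inside a loop. In ABAP terms that is a linear READ TABLE inside a LOOP versus a read from a hashed internal table. Here is the same idea sketched in Python; the field and table names are invented purely for illustration:

```python
# Hypothetical example: a transformation routine enriching each record
# of a data package with a material-to-group mapping.
# Names are illustrative, not real SAP tables or fields.

def enrich_slow(package, material_groups):
    """O(n*m): scans the whole lookup list for every record.
    Mirrors a LOOP with a linear READ TABLE in an ABAP routine."""
    for record in package:
        for material, group in material_groups:
            if material == record["material"]:
                record["group"] = group
                break
    return package

def enrich_fast(package, material_groups):
    """O(n+m): build a hash lookup once, then probe it per record.
    Mirrors reading into a hashed internal table before the loop."""
    lookup = dict(material_groups)
    for record in package:
        record["group"] = lookup.get(record["material"])
    return package

package = [{"material": "M1"}, {"material": "M2"}, {"material": "M1"}]
groups = [("M1", "RAW"), ("M2", "FIN")]
print(enrich_fast(package, groups))
```

On a 50,000-record package with a large lookup table, the difference between these two shapes is exactly the kind of thing you will see when you debug a slow routine.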
If you are sure that the DTP doesn’t have routines that slow it down, the second most common cause concerns whether the DTP is a delta or a full load. If you use daily chains and just want to add new data every day, it is best to use a delta load; this will save you a lot of time. Be aware that you will sometimes notice that even delta loads with no routines are slow. Take a look at the target to see how many records are transferred and how many are actually added. I, for example, had a DTP which loaded 20 million records each day and added around 500. This was because we didn’t clean up our PSA in the process chain. A delta load will pick up all the data that is in the PSA, so if you don’t clean up your PSA requests it will be almost the same as a full load, except that you add fewer records. So always try to delete your old PSA requests; this way your load will be a lot faster.
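To get a feeling for why this matters, here is a toy Python model of the situation described above. The request sizes are invented, and it only models the “delta reads everything still sitting in the PSA” behaviour I observed, not the actual BW internals:

```python
# Toy model of a PSA: a list of record counts, one per PSA request.
# Assumption for illustration: the delta DTP has to scan every request
# still in the PSA, even ones whose records were loaded long ago.

def delta_scan_cost(psa_requests, already_loaded):
    """Return (records read, records actually added) for a delta load."""
    read = sum(psa_requests)
    added = sum(n for i, n in enumerate(psa_requests)
                if i not in already_loaded)
    return read, added

# 30 old daily requests of 650,000 records each, plus today's 500 new ones
psa = [650_000] * 30 + [500]
read, added = delta_scan_cost(psa, already_loaded=set(range(30)))
print(read, added)   # 19,500,500 records read for only 500 added

# After a "Delete PSA request" step in the chain, only today's request
# remains and the delta touches just those 500 records.
read2, added2 = delta_scan_cost(psa[-1:], already_loaded=set())
print(read2, added2)
```

The 20-million-read / 500-added DTP I mentioned looked exactly like the first case.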
Next up: DSOs. The only thing I found about DSOs that has an impact on performance is the ‘SID Generation’ setting. If you aren’t going to use the DSO for reporting, you should set it to never create SIDs.
For cubes, the most important thing that slows down process chains is the dimensions. Dimensions influence the fact table, and we want to keep the number of entries in a dimension as small as possible compared to the fact table. If you load data through a 1:1 transformation and the load is still slow, it will almost certainly have to do with this. You can easily check this by going to transaction ‘SE38’ and executing the program ‘SAP_INFOCUBE_DESIGNS’. Once it has run, search for the cube you want to optimize.
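To make the ratio concrete, here is a small Python sketch of the number SAP_INFOCUBE_DESIGNS reports per dimension: the dimension table’s row count as a percentage of the fact table’s row count. The row counts and dimension names below are invented:

```python
# Ratio as reported per dimension: dimension-table rows as a
# percentage of fact-table rows. All numbers here are made up.

def dimension_ratio(dim_rows, fact_rows):
    return 100.0 * dim_rows / fact_rows

fact_rows = 2_000_000
dims = {"customer": 1_900_000, "material": 4_000, "plant": 25}

for name, rows in dims.items():
    ratio = dimension_ratio(rows, fact_rows)
    # 20% is a commonly quoted rule of thumb, not an official limit
    flag = "  <- candidate for a line item dimension" if ratio > 20 else ""
    print(f"{name}: {ratio:.2f}%{flag}")
```

A dimension like “customer” above, with almost as many entries as the fact table itself, is exactly the degenerated case you want to turn into a line item dimension.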
Example of a cube that can be optimized (the names of the dimensions and the fact table normally appear on the left, but I cropped them out).
You want to keep the ratios of the dimensions as close to 0% as possible. I did this the following way. First, take a look at the cube and its dimensions. Count how many characteristics you have in total, excluding those of the ‘Data Package’, ‘Time’ and ‘Unit’ dimensions. If you have 13 or fewer characteristics, create as many dimensions as you have characteristics and put one characteristic in each new dimension. Next, go to the properties of the dimensions and make them line item dimensions. In order to do this you will have to delete the data of the cube. Also, when you make your dimensions line items, remember to go to transaction ‘SE38’ and execute ‘RSDG_TRFN_ACTIVATE’ to reactivate all the transformations that load data into the cube; otherwise you’ll get a lot of errors on the next load.
If you have more than 13 characteristics in total, check the dimension that has a high ratio in the cube. You have to see a dimension as a kind of grouping of characteristics that are related to each other. Try to find the characteristic that is out of place and seems very different from all the others. Extract it into its own dimension the same way as before and make that a line item dimension as well. This will reduce the loading time of the cube.
You can only make a dimension a line item dimension if it contains exactly one characteristic. I highly recommend always doing this when possible, because it has performance advantages:
- When loading transaction data, no IDs are generated for the entries in the dimension table. This number range operation can compromise performance precisely in the case where a degenerated dimension is involved.
- A table having a very large cardinality is removed from the star schema. As a result, the SQL-based queries are simpler. In many cases, the database optimizer can choose better execution plans.
The only disadvantage is that, as mentioned, you cannot have more than one characteristic in such a dimension. This is why I said that if you have 13 or fewer characteristics you should just split them all up: the advantages are simply worth it.
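A toy model may help to see where the second advantage comes from. With a normal dimension, each fact row stores a DIMID that has to be resolved through the dimension table; a line item dimension writes the SID straight into the fact table. The IDs below are invented:

```python
# Toy star-schema lookup showing why a line item dimension helps:
# a normal dimension adds one indirection (fact row -> DIMID -> SID),
# a line item dimension stores the SID directly in the fact row.

# Normal dimension: fact table stores DIMIDs, dimension table maps to SIDs.
dim_table = {1: 501, 2: 502}            # DIMID -> SID
fact_normal = [(1, 10.0), (2, 20.0)]    # (DIMID, key figure)

def read_normal(fact, dim):
    """Every read (and every query join) goes through the dimension table."""
    return [(dim[dimid], kf) for dimid, kf in fact]

# Line item dimension: no dimension table, the SID sits in the fact row.
fact_line_item = [(501, 10.0), (502, 20.0)]

def read_line_item(fact):
    """No join and, at load time, no DIMID number range to draw from."""
    return list(fact)

assert read_normal(fact_normal, dim_table) == read_line_item(fact_line_item)
```

Same data either way, but the line item variant skips both the join at query time and the DIMID generation at load time.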
Here’s an extra trick I learned by playing around with DTPs. By default your DTP will probably have a package size of 50,000 and 3 parallel processes. Take a look at the InfoProvider where the data is loaded and check how many records are being processed. For example, I had a DSO which loaded 3,000 records, which means it loads one data package of 3,000 records. Try changing the package size to 1,000: now there will be three data packages running and being processed at the same time thanks to parallel processing.
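Here is a back-of-the-envelope model of this trick, assuming packages are fully independent and processing time scales linearly with package size (real loads only approximate this, since splitting also adds some overhead per package):

```python
import math

# Toy model: packages run in "waves" of `parallel` at a time, and each
# record costs a fixed amount of processing (1 ms here, purely invented).

def wall_time_ms(total_records, package_size, parallel, ms_per_record=1):
    packages = math.ceil(total_records / package_size)
    waves = math.ceil(packages / parallel)
    per_package = min(package_size, total_records) * ms_per_record
    return waves * per_package

# 3,000 records with the default package size of 50,000 and 3 parallel
# processes: a single package, so the parallelism is never used.
print(wall_time_ms(3_000, 50_000, 3))   # 3000 ms in this toy model

# Package size 1,000: three packages fit in one parallel wave.
print(wall_time_ms(3_000, 1_000, 3))    # 1000 ms
```

So in the ideal case the load finishes in roughly a third of the time; in practice the gain is smaller because each extra package carries some fixed overhead.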
This concludes my blog about how I’m currently optimizing my process chains (flows). As of today I’m still working on these kinds of things, so if I come across something new that might help performance I will include it later on.
Also, if anyone has other or better ways of improving performance, please let me know.