Performance tuning is always a consideration when building data warehouses. A DW database naturally grows as data is added on a daily basis, and as the volume increases, performance decreases: reports end up querying millions of records across multiple years.

The purpose of this document is to focus on a few backend performance best practices and tricks that I’ve learned over the years. They will save time during ETL and, as a result, make for an overall more efficient DW to serve the clients’ reporting interests.

Many of these topics have been covered here on SCN, some not so much. This is by no means a comprehensive list and is a consolidation of what I’ve found most useful in my projects.

For full disclosure, I’m not an ABAPer and my ABAP knowledge is limited, so my recommendations are based purely on performance comparisons before and after the ABAP changes referenced in this post. I’m sure you can find many blogs and forums around each of the topics I’ll be mentioning.

Logical Partitioning

As your data warehouse matures, and your data is now many years old, you might find yourself stuck with a cube that contains multiple years of data. If best practices were followed, you should have a multiprovider on top of your cube for the reporting layer:


There are a few disadvantages to having a single cube storing multiple years of data:

  • Increased loading time
  • Increased reload time if changes are required, such as a new field being added (all history needs to be reloaded)
  • Decreased query performance, as the select from BEx will sift through all the records in the cube

Logically partitioned cubes are exactly what the name suggests: the data in the cube is partitioned (split) according to a logical criterion. The example below is an easy one to understand, but the partitioning can really be done on any field that doesn’t have too many distinct values.

What I’ve done in the past was to logically partition cubes based on the Fiscal Year/Calendar Year. In this scenario, you’d have multiple infocubes logically partitioned by year (2010, 2011, …, 2017, 2018, etc.), all linked together through a single multiprovider:


This option does require a little setup, such as creating transformations and DTPs for each cube, and ensuring the correct data is loaded into each cube, either through a simple start routine that deletes from SOURCE_PACKAGE every record whose CALYEAR is not the cube’s year, or by having filters in the DTPs that only load the appropriate year into each cube.
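As an illustration, the start routine for the 2017 cube could be as simple as the following (a minimal sketch, assuming the year field in the source structure is named CALYEAR, as above):

```abap
* Start routine of the transformation feeding the 2017 cube:
* drop every record that does not belong to this cube's year.
DELETE SOURCE_PACKAGE WHERE calyear <> '2017'.
```

The DTP-filter alternative achieves the same result without code, which makes it easier to maintain; the start routine is handy when the partitioning criterion is more complex than a single equality.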

But once the setup is done, if data reloads are required for specific years, you only need to reload that particular cube, without disrupting reporting on all the other data.

When running a report with data for a single year, the multiprovider will be smart enough to direct the select statement to the appropriate cube, thus eliminating data from the other years that will not be queried, and improving report run times.

This practice is so strongly recommended that SAP provided standard functionality to accomplish it, which is the topic of the next segment:

Semantic Partitioned Objects

SPOs are built on the exact same concept as logical partitioning: SAP-provided functionality that enables partitioning in BW in a more streamlined and automated way.

The link below provides a great how-to guide on using SPOs by Rakesh Dilip Kalyankar:


Batch Jobs for DTP Executions

As part of ETL, BW developers will schedule DTPs to be executed. Depending on the size and complexity of the process chains, as well as the number of available background jobs at any given time, we can take advantage of running multiple parallel background jobs per DTP.

There isn’t a straightforward number of parallel jobs that should be set, since each BW system is unique, in terms of:

  • Number of infoproviders that need to be loaded
  • Availability and number of background jobs
  • Timing in which loads need to be completed
  • Number of DTPs scheduled to run in parallel via process chains

One main rule, though: the number of jobs for transactional data has to be a multiple of 3, and master data loads can only use 1 job.

To change the background job settings, in the menu, click on Goto -> Settings for Batch Manager …


Change the Number of Processes value to increase or decrease the number of background jobs


Here are a few tips on how to determine the number of background jobs to set for a particular DTP:

1. Assess the volume and criticality of the load:

     a. Higher-volume and critical loads (loads that must complete before 8 AM, or by whenever the business needs the data) should get more parallel background jobs. A ballpark number I’ve used is 9-12 background jobs.

     b. If, for example, a DTP load takes 4 hours on a daily basis and is holding the entire process chain down, it is a candidate for more background jobs.

2. Determine the number of available background jobs by going to SM51, double-clicking on each of the available servers, and viewing the total background jobs


With that number, you have an estimate of how many background jobs you can parallelize.

*Important: if you have 20 total background jobs, you should NOT try to max out and consume all 20. Stay 2-3 jobs below that threshold so you don’t run into job queuing issues, which can lead to a loss of performance

3. Assess the process chains in your system to identify where loads can be parallelized and where it makes sense to increase the number of background jobs in the DTPs

DTP Data Package Size

Often overlooked, the package size can provide moderate to significant performance improvements if tweaked correctly.

Typically a larger package size is preferable: more records are bundled into each package, so the overhead of initializing each data package is incurred fewer times.

For example, if we’re loading 50k records from a PSA to a DSO:

1. Package size of 1,000: the DTP loads 50 packages of 1,000 records each, incurring the per-package overhead 50 times;

2. Package size of 50,000: the overhead is incurred only once, since all the records are grouped into a single 50k package.

Option 2 will provide better performance


However, there are some considerations when modifying the package size:

1. If there’s custom logic in the transformation that does large-volume lookups from other DSOs or master data objects, it might be necessary to reduce the package size, since there’s a limit to how much memory internal tables can use. If the complexity or data volume in the custom code is great enough, reducing the package size might be the only way to ensure the loads complete successfully, and you will only find out through trial and error;

2. If there’s no custom code or logic in the transformation, the package size can often be expanded to 5-10x the default setting of 50k. Again, trial and error will help determine the sweet spot.

Secondary Indices on DSOs

Another common issue is poor performance in transformations when selecting from DSOs on non-key fields. Fortunately, SAP provides a very simple fix for that: the ability to create secondary indices on DSOs.

It’s a very simple process, and it will improve the performance of those selects tremendously.

In the DSO modelling screen (double clicking on the DSO)


At the bottom, under Indexes, right click and select Create New Indexes


Leave the Unique checkbox unchecked. If it were checked, the values of the secondary index would have to be unique in the DSO, which might not be the case: if, for example, the index is created on the GL Account field, there could be multiple records in the DSO with the same GL Account.



Once created, simply drag a field from the Data Fields into the new index, and activate the DSO


Master Data Lookup in Transformation

SAP introduced a nice functionality that allows us, without writing any code, to read master data attributes directly in the rule details of transformations:


This simplifies the build process significantly, as no code is required.

However, we’ve noticed that this actually decreases the loading performance compared to doing a select statement in the start routine and then reading the resulting internal table in the field routines. A sketch of that approach follows (the table and field names are illustrative):

Start routine:

SELECT field1 field2
  FROM /BI0/PFIELD1
  INTO TABLE itab1
  FOR ALL ENTRIES IN SOURCE_PACKAGE
  WHERE field1 = SOURCE_PACKAGE-field1
    AND objvers = 'A'.

Field routine for FIELD1:

READ TABLE itab1 WITH KEY field1 = SOURCE_FIELDS-field1 ASSIGNING <f_s_itab1>.
IF <f_s_itab1> IS ASSIGNED.
  RESULT = <f_s_itab1>-field2.
ENDIF.
UNASSIGN <f_s_itab1>.

The reason for that is actually quite simple. If for example, we’re loading 50k records per data package, the single select statement in the start routine will do one select for 50k records and store that in our internal table.

If we use the field routine standard logic, as it is a FIELD routine, it will end up doing 50k selects, which is significantly more costly.

One consideration to keep in mind is time-dependent master data. Given the complexities of figuring out the correct time period to select, I have used the standard SAP functionality of reading the master data in the field routines of the transformations to select time-dependent attributes.

Hashed Tables

Hashed tables are internal tables with a unique key that is accessed via a hash algorithm, as opposed to a standard table, where you need to perform a SORT in order to do an efficient READ with BINARY SEARCH.

So when declaring the internal table, the hashed type and its key are stated explicitly:

TYPES: BEGIN OF t_itab1,
        field1 TYPE /BI0/OIFIELD1,
        field2 TYPE /BI0/OIFIELD2,
END OF t_itab1.

DATA: itab1 TYPE HASHED TABLE OF t_itab1 WITH UNIQUE KEY field1.

FIELD-SYMBOLS: <f_s_itab1> TYPE t_itab1.

For this particular internal table itab1, we know that field1 is unique and therefore we can declare it that way.

When the data is selected, the system will index it according to the specified key; SORT and BINARY SEARCH are not required in this case.

Below is an example of a read statement on the hashed table. One thing to keep in mind is to use WITH TABLE KEY when reading hashed tables, as opposed to WITH KEY for standard tables:

READ TABLE itab1 WITH TABLE KEY field1 = RESULT_FIELDS-field1 ASSIGNING <f_s_itab1>.

Field Symbols Instead of Work Areas

A field symbol acts as a pointer to a record in an internal table, whereas a work area actually holds the value from an internal table.

So if we loop over an internal table with 100 records, the field symbol points at each record in turn and allows us to modify the internal table directly, whereas the work area stores a copy of the record being looped.

What we’ve noticed is that field symbols provide better performance when having to loop or read through internal tables.
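To illustrate the difference, here is a short sketch (itab1, its fields, and the work area ls_itab1 are illustrative names): assigning through a field symbol changes the table row in place, while a work area holds a copy and needs an explicit MODIFY to write the change back.

```abap
* With a field symbol: the assignment writes straight into the table row.
LOOP AT itab1 ASSIGNING <f_s_itab1>.
  <f_s_itab1>-field2 = 'X'.
ENDLOOP.

* With a work area: the change lands in the table only after MODIFY,
* which copies the record twice per iteration.
LOOP AT itab1 INTO ls_itab1.
  ls_itab1-field2 = 'X'.
  MODIFY itab1 FROM ls_itab1.
ENDLOOP.
```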

To declare a field symbol, you first need to have a type or structure declared or available:

TYPES: BEGIN OF t_itab1,
        field2 TYPE /BI0/OIFIELD2,
END OF t_itab1.

FIELD-SYMBOLS: <f_s_itab1> type t_itab1.

When performing a read you will use the ASSIGNING command:

READ TABLE itab1 WITH KEY field1 = RESULT_FIELDS-field1 ASSIGNING <f_s_itab1>.

For validating if the field symbol is assigned:

IF <f_s_itab1> IS ASSIGNED.

Write your logic

Don’t forget to unassign the field symbol after your logic is complete:

             UNASSIGN <f_s_itab1>.

And close your IF statement:

ENDIF.
Looping is similar to a read, as you also use the ASSIGNING addition:

LOOP AT itab1 ASSIGNING <f_s_itab1>.
The main difference is that you do NOT need to unassign the field symbol. At each iteration of the loop, it will unassign and reassign to the next record in the internal table.

However, if you do wish to reutilize the field symbol after your ENDLOOP, you should immediately unassign it to prevent it from still pointing at the last record:

LOOP AT itab1 ASSIGNING <f_s_itab1>.
*  write your logic
ENDLOOP.

UNASSIGN <f_s_itab1>.

Parallel Cursor

Inevitably when writing ABAP code, we will stumble across a scenario where we need to write a loop within a loop. That is a big no-no in terms of ABAP programming best practices. There’s a nifty little trick for it called the parallel cursor.

Here’s how it works:

1. If you’re not using hashed tables, make sure to sort itab1 and itab2;

2. Start the first loop over itab1, assigning field symbol <fs1>;

3. Within the loop on itab1, first do a READ on itab2, assigning field symbol <fs2>, to determine the exact location of the required record;

4. If a record is found and <fs2> is assigned, save the position of the record (sy-tabix) in the variable lv_index;

5. Unassign <fs2> so the parallel-cursor loop can start;

6. Start the loop over itab2 from that start position lv_index;

7. After assigning <fs2>, check whether field1 from the second loop still matches the record in <fs1>:

     a. If it does, carry on with the logic;

     b. If it doesn’t, exit, and move on to the next record in itab1.

Below is an example of the parallel cursor:

LOOP AT itab1 ASSIGNING <fs1>.

  READ TABLE itab2 WITH KEY field1 = <fs1>-field1
       ASSIGNING <fs2> BINARY SEARCH.

  IF <fs2> IS ASSIGNED.

    lv_index = sy-tabix.

    UNASSIGN <fs2>.

    LOOP AT itab2 ASSIGNING <fs2> FROM lv_index.

      IF <fs1>-field1 <> <fs2>-field1.
        EXIT.
      ENDIF.

*     write your code

    ENDLOOP.

  ENDIF.

ENDLOOP.

And with comments:

LOOP AT itab1 ASSIGNING <fs1>. " our first loop

  READ TABLE itab2 WITH KEY field1 = <fs1>-field1
       ASSIGNING <fs2> BINARY SEARCH. " read the 2nd internal table to determine the start position of the second loop

  IF <fs2> IS ASSIGNED. " if a value is found, it will be assigned and pass this check

    lv_index = sy-tabix. " store the position of the found record in itab2
    UNASSIGN <fs2>. " clear the field symbol

    LOOP AT itab2 ASSIGNING <fs2> FROM lv_index. " start the loop at the position found above on itab2

      IF <fs1>-field1 <> <fs2>-field1. " itab2 no longer matches the record in <fs1>, so move to the next record in itab1
        EXIT. " exits the loop on itab2
      ENDIF.

*     write your logic for when <fs1>-field1 = <fs2>-field1

    ENDLOOP. " endloop for itab2

  ENDIF.

ENDLOOP. " endloop for itab1


Hopefully these tips can help you build a more efficient and better performing backend SAP BW data warehouse. Your feedback and suggestions are always welcome, and if you have better or different ways of doing the same thing, I’d definitely be interested in learning them.

Best of luck on your developments!


  1. Ethan Jewett

    Great list! I heartily agree with most of it.

    The only major issue is that the section on “Master Data Lookup in Transformation” is not correct. It is a common misconception, but unless you are on BW 3.x or a very early version of 7.0, the master data lookup for the standard rule occurs into an internal table for the full data package. It is the same as the example in this document, except that the standard code will usually be faster because it uses a sorted table and a read with binary search.

    Hashed tables also have some significant drawbacks over standard/sorted tables, IIRC. Specifically, if you have 2 records with the same key you get silent, undefined behavior (in a standard table it will work fine and in a sorted table you’ll get an exception).



    1. Marcelo Berger Post author

      Hi Ethan,

      Thanks for your comment and for clarifying my misconception.

      Even though technically SAP is doing what I described (good to know they’re following that general process), we found that the performance was still better when manually doing the select in the start routine, instead of using the standard SAP functionality. We were on BW 7.3 on HANA. And that was truly just through trial and error with both methods.

      So it seems that there’s still something out there in the standard code that’s causing performance not to be ideal for these scenarios.

      Perhaps in BW 7.4 (which I don’t have much experience on yet) where plenty of the functionality is being pushed down to the HANA engine, this difference in performance might be corrected, and/or even negligible since the improvement of using the HANA engine will be significant.

      Regarding hashed tables, I completely agree with your comments. Thanks for providing additional context around them.

      Best regards and thanks,


      1. Ethan Jewett

        Hi Marcelo,

        I’m surprised that you’re getting better performance with the custom routine, as I’d expect you to get slightly worse performance. But regardless, the assertion in the blog that the standard routine is doing 50K selects for a master data lookup on a 50K-record package is not correct. If your system is configured correctly it will just do one select for the package. It’s certainly possible that it’s not configured correctly, in which case it might be worth your time to read the discussion and notes referenced in the linked discussion, as there are some configuration parameters that may need to be adjusted: Performance: reading huge amount of master data in end routine

        A note on my context: The reason I harp on this particular issue so often is that using a custom routine rather than standard functionality in a transformation significantly increases maintenance costs for customers and obscures data-flow semantics, making it so that a customer can’t take advantage of future improvements. The push-down in BW 7.4 on HANA is a perfect example. A transformation with standard master data lookup will get pushed down to HANA, but a transformation with *any* ABAP routine in it will not get pushed down. Another example is customers that might want to use Information Steward or another tool to track data lineage.



  2. Michael Cockbill

    Nice article thanks Marcelo, very clear and easy to read. 

    Another couple of useful hints are :

    1) Adding secondary indexes to Master Data objects (this is new in 7.3) – useful for improving lookup performance on large MD objects, whichever method you choose (routine, MD attribute lookup, or start/end routine).

    Secondary and Extension Index on InfoObject Master Data Tables in BI 7.3 – Netweaver Technology – SCN Wiki

    2) Building on your suggestion to increase the amount of background processes, rather than just simply using SM51 to see the total amount of processes available on the system – you can use program /SSA/BWC (via SE38) to get a really informative view of what’s been happening in your system like Idle Time%, Num. Background jobs running at any given time, how many process are free at any given time and more.  This can be really helpful in improving utilisation and tuning jobs with regard to timing and how many processes to allocate.  Sample of the output included here : (idle time listed across the top, background processes down left hand side, number of jobs running along the bottom)




