Skip to Content

Buffering internal tables in BW Transformation

Applies to:

SAP BW/BI 7.x. For More Information visit the Business Intelligence homepage.


Summary

The document outlined below will describe how we can achieve data load performance improvement in SAP BW transformations which involves database lookups in start routine or end routine.   Basically I will try and explain the transformation class structure, how we can make use of this structure to reduce the number of database hits during lookups in start and end routines while processing numerous data packages.

Author: Vijay Kumar Nagaraj

Company: Deloitte Consulting

Created On: 9th Dec 2014

Author Bio

Vijay Kumar Nagaraj is a Certified SAP BW consultant with 6+ years of experience in SAP

BW/BPC and SAP BI ABAP.




Overview

The transformation process allows you to consolidate, cleanse, and integrate data.   When you load data from one BI object into a further BI object, the data is passed through a transformation. A transformation converts the fields of the source into the format of the target. 

Transformation allows you to perform lookups on master data or transaction data to fill in additional information while transforming the data into the target.  During lookups we generally use Select command which will access the database server to fetch the records into the Application Server.  This Select command can be performance intensive if the volume of records to be fetched from database server is huge and there are numerous data packages waiting to be loaded into the target. Hence, 

Number of database hits using select = Number of data packages loaded into the target

Just imagine the number of times we are hitting database server if we have thousands of data packages to be loaded to the target.



Buffering image 1.png



Our main focus in this document is to show you how we can reduce the number of data base hits, by buffering/caching the data records fetched from the first database fetch and store them in an global internal table and use the same internal table for rest of the data packages thereby achieving significant performance gains in terms of system resource usage and data load time.  Let’s get started.




Class Structure in Transformation

Focus  on the yellow higlighted lines.

CLASS lcl_transform DEFINITION.

PRIVATE section

*$*$ begin of global – insert your declaration only below this line*-*
    CLASS-DATA: it_GLOBAL1 TYPE STANDARD TABLE OF SFLIGHT.
*$*$ end of global – insert your declaration only before this line*-* METHODS
start_routine
IMPORTING
request
type rsrequest
datapackid
type rsdatapid
CHANGING
SOURCE_PACKAGE
type _ty_t_SC_1

  1. ENDCLASS.                    “Routine DEFINITION

*$*$ begin of 2nd part global – insert your code only below this line  *
DATA: it_GLOBAL2 TYPE STANDARD TABLE OF SFLIGHT.
*$*$ end of 2nd part global – insert your code only before this line   *

CLASS lcl_transform IMPLEMENTATION.

METHOD start_routine.
*=== Segments ===

FIELD-SYMBOLS:
<SOURCE_FIELDS>   
TYPE _ty_s_SC_1.

DATA:
MONITOR_REC    
TYPE rstmonitor.

Your start routine code goes here.

*$*$ end of routine – insert your code only before this line         *-*
ENDMETHOD.                    “start_routine

  1. ENDCLASS.                   


The above class structure is shown for a start routine, similar structure applies to end routine and expert routine as well except for the method name and parameters will differ.

As you can observe there are two global declaration sections, first global section is present within the class definition and the 2nd global section is present outside the definition of the class.  Both the sections can be used to declare global variables/internal tables which needs to be shared across start routine/end routine/rule routine for all the records within the same data package.  The only difference comes when you want to make the global variable shared across multiple data packages, this can only be achieved by declaring the variable as static variable within the first global section present within the Class. So in our example above it_GLOBAL1 internal table will be shared across multiple data packages of the transformation since it is declared as static by using “CLASS-“ prefix. 

Static variables will be assigned memory space only once and this memory space will remain in scope until the end of DTP load, thereby allowing the static variable to be shared across multiple data packages, in contrast, the instance variables (without CLASS- prefix) will be assigned memory only when the object of the class is instantiated thereby limiting its scope only to that object, this allows the instance variable to be shared only within the Object.

Deep Dive in Data Package flow and buffering

Buffering image 1.png




I am taking an example of how we can buffer an internal table to be used across multiple data packages of a single dtp request.  Let’s assume we have declared a static internal table lt_global within the class and this internal table is filled with all the entries from SFLIGHT table in the start routine and also notice that we are checking if the internal table if it’s already filled before doing the select from the database.  This condition will avoid reading database if the internal table is already filled.

Next step we need to trigger a dtp, and we are setting default number of background processes as 3. Once we trigger the DTP three background processes will be created and each background process will have single object instantiated of type Class lcl_transform.

Each datapackage will be assigned to background process sequentially and the datapackage will be passed to the start routine method as a parameter of the corresponding Object Instance/memory.

Now, lt_GLOBAL internal table will be fill by Select statement only when its empty, it could be filled by any object and this internal table will be retained across all the datapackage processing since this internal table is declared as static and its memory is independent of Object memory.

Now, How can we take advantage of this process? The answer is, whenever we need to look up on the database table in the start routine and very importantly the selection condition shouldn’t be dependent on source package contents like below:

Select * from SFLIGHT

Into lt_GLOBAL

for all entries in SOURCE_PACKAGE[]

where destination = SOURCE_PACKAGE-DESTINATION.

We can perform select only once and the same data can be retained across all the data packages. This will save data load time and system resources by avoiding database operation.

If we have 1000 data packages we need not select database 1000 times, we only have to select once.

References

http://help.sap.com/saphelp_nw70/helpdata/en/e3/732c42be6fde2ce10000000a1550b0/content.htm

  For more information, visit the Business Intelligence homepage.






























To report this post you need to login first.

7 Comments

You must be Logged on to comment or reply to a post.

  1. Raf Boudewijns

    Nice article.

    Just a question: how do you prevent the internal table being filled 3 times in your above example? Assuming the 3 first data packages start in parallel, then the lt_global table is empty for all 3, right? So will it be filled 3 times or not?

    (0) 
    1. vijay nagaraj Post author

      Hey Raf,

      Thanks, The number of times the internal table will be filled is equal to the number of background processes which the DTP will trigger.  I dont think there is an easy way to prevent filling it 3 times.  Let me know if you can find some way to do that.

      Vijay

      (0) 
      1. Alexander Kutz

        It should be possible via memory: Data Clusters in ABAP Memory – BC – ABAP Programming – SAP Library

        … but I’ve never done that in this context. – Question is whether it is really necessary. The database should have the table in memory and it should not take too long to load it. The “is empty” with the global table is really easy to use / a quick goal – and I think it is best practice to use it this way. The data clusters are – I think – too exotic. If your lookup tables are much bigger than the transactional data there must be something wrong.

        (0) 
    2. Christian Hofmann

      Wouldn’t a small “if – then – else” statement solve this issue?

           IF itab is initial.

                     …

                     select … into table itab

                     …

           else.

                     * nothing to load

           endif.

      I don’t see a reason for using a global internal table here.

      (0) 
      1. Raf Boudewijns

        the IF statement is already there… (see last screen shot in the article)

        fact is, if the first 3 (assuming there are only 3) processes start in parallel, the IF statement will give the same result for all 3 (i.e. the internal table will be empty)

        (0) 
      2. vijay nagaraj Post author

        If Statement is already present and it makes sense if we declare the internal table in the global declaration area.  else the internal table is always empty for every single data package.

        (0) 

Leave a Reply