Buffering Internal Tables in BW Transformation Routines
Applies to: SAP BW/BI 7.x. For more information, visit the Business Intelligence homepage.
This document describes how to improve data load performance in SAP BW transformations that perform database lookups in the start or end routine. It explains the structure of the generated transformation class and shows how to use that structure to reduce the number of database hits during lookups in start and end routines when processing numerous data packages.
Author: Vijay Kumar Nagaraj
Company: Deloitte Consulting
Created On: 9th Dec 2014
Vijay Kumar Nagaraj is a Certified SAP BW consultant with 6+ years of experience in SAP
BW/BPC and SAP BI ABAP.
The transformation process allows you to consolidate, cleanse, and integrate data. When you load data from one BI object into another BI object, the data is passed through a transformation. A transformation converts the fields of the source into the format of the target.
A transformation also allows you to perform lookups on master data or transaction data to fill in additional information while transforming the data into the target. For such lookups we generally use a SELECT statement, which accesses the database server to fetch records into the application server. This SELECT can be performance intensive if the volume of records to be fetched from the database server is large and numerous data packages are waiting to be loaded into the target, because the lookup is repeated for every package:
Number of database hits for the lookup SELECT = number of data packages loaded into the target
Just imagine how many times we hit the database server if thousands of data packages have to be loaded into the target.
The main focus of this document is to show how we can reduce the number of database hits: the records fetched by the first database access are buffered (cached) in a global internal table, and that same internal table is reused for the remaining data packages, achieving significant gains in system resource usage and data load time. Let's get started.
Class Structure in Transformation
Focus on the global declaration sections marked in the listing below.
CLASS lcl_transform DEFINITION.
  PUBLIC SECTION.
*$*$ begin of global - insert your declaration only below this line  *-*
    CLASS-DATA: it_GLOBAL1 TYPE STANDARD TABLE OF sflight.
*$*$ end of global - insert your declaration only before this line   *-*
    METHODS start_routine
      IMPORTING
        request        TYPE rsrequest
        datapackid     TYPE rsdatapid
      EXPORTING
        monitor        TYPE rstr_ty_t_monitors
      CHANGING
        SOURCE_PACKAGE TYPE _ty_t_SC_1
      RAISING
        cx_rsrout_abort.
ENDCLASS.                    "routine DEFINITION

*$*$ begin of 2nd part global - insert your code only below this line *
DATA: it_GLOBAL2 TYPE STANDARD TABLE OF sflight.
*$*$ end of 2nd part global - insert your code only before this line *

CLASS lcl_transform IMPLEMENTATION.
  METHOD start_routine.
*=== Segments ===
    FIELD-SYMBOLS:
      <SOURCE_FIELDS> TYPE _ty_s_SC_1.
    DATA:
      MONITOR_REC TYPE rstmonitor.
*$*$ begin of routine - insert your code only below this line        *-*
*   Your start routine code goes here.
*$*$ end of routine - insert your code only before this line         *-*
  ENDMETHOD.                    "start_routine
ENDCLASS.                    "routine IMPLEMENTATION
The class structure above is shown for a start routine; a similar structure applies to the end routine and expert routine as well, except that the method name and parameters differ.
As you can observe, there are two global declaration sections: the first is inside the class definition, and the second is outside it. Both sections can be used to declare global variables and internal tables that need to be shared across the start routine, end routine, and rule routines for all records within the same data package. The difference appears when you want a global variable to be shared across multiple data packages: this can only be achieved by declaring it as a static variable in the first global section, inside the class. In our example above, the internal table it_GLOBAL1 will be shared across all data packages of the transformation because it is declared as static using the "CLASS-" prefix.
Static variables are assigned memory space only once, and this memory remains in scope until the end of the DTP load, allowing a static variable to be shared across multiple data packages. In contrast, instance variables (declared without the CLASS- prefix) are assigned memory only when an object of the class is instantiated, which limits their scope to that object; an instance variable is therefore shared only within a single object.
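The contrast can be sketched with a minimal, hypothetical class (lcl_demo and its attribute names are illustrative only and are not part of the generated transformation code):

```abap
CLASS lcl_demo DEFINITION.
  PUBLIC SECTION.
    " Static: allocated once, kept until the work process (and thus the
    " DTP load) ends - one copy shared by all objects of the class.
    CLASS-DATA: gt_shared TYPE STANDARD TABLE OF sflight.
    " Instance: allocated per object on CREATE OBJECT and released with
    " the object - visible only inside that single object.
    DATA:       mt_local  TYPE STANDARD TABLE OF sflight.
ENDCLASS.
```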
Let us take an example of buffering an internal table so that it can be used across multiple data packages of a single DTP request. Assume we have declared a static internal table lt_global within the class, and in the start routine this internal table is filled with all the entries from table SFLIGHT. Notice that we check whether the internal table is already filled before selecting from the database; this condition avoids reading the database again once the internal table has been populated.
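A minimal sketch of this start routine logic (assuming lt_global is declared with CLASS-DATA in the first global section, as described above):

```abap
METHOD start_routine.
*   Fill the static buffer only if it is still empty; after the first
*   data package, every later call reuses the buffered contents.
    IF lt_global IS INITIAL.
      SELECT * FROM sflight INTO TABLE lt_global.
    ENDIF.
*   ... lookup logic using lt_global goes here ...
ENDMETHOD.
```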
As the next step we trigger a DTP, with the default number of background processes set to 3. Once the DTP is triggered, three background processes are created, and each background process instantiates a single object of class lcl_transform.
Each data package is assigned to a background process sequentially, and the data package is passed to the start routine method as a parameter of the corresponding object instance.
Now, the lt_global internal table is filled by the SELECT statement only when it is empty; it can be filled by any of the objects, and it is retained across the processing of all data packages, because the internal table is declared as static and its memory is independent of the object memory.
So how can we take advantage of this behavior? The answer: buffer a lookup whenever it is needed in the start routine and, very importantly, its selection condition does not depend on the contents of the source package, unlike the following SELECT:
SELECT * FROM sflight
  INTO TABLE lt_sflight          " lt_sflight: a local target table
  FOR ALL ENTRIES IN SOURCE_PACKAGE
  WHERE destination = SOURCE_PACKAGE-destination.
When the selection is independent of the source package, we can perform the SELECT only once and retain the same data across all data packages. This saves data load time and system resources by avoiding repeated database operations. If we have 1,000 data packages, we do not need to query the database 1,000 times; a single SELECT is enough.
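As a sketch of how the buffered table can then be used inside the routine (the key fields carrid/connid and the enrichment step are hypothetical; adapt them to your own source structure):

```abap
* Hedged sketch: lt_global is assumed to be a static (CLASS-DATA)
* internal table of line type SFLIGHT, filled once as shown above.
DATA: ls_flight TYPE sflight.
FIELD-SYMBOLS: <source_fields> TYPE _ty_s_SC_1.

LOOP AT SOURCE_PACKAGE ASSIGNING <source_fields>.
  " Read from the in-memory buffer instead of selecting per package.
  READ TABLE lt_global INTO ls_flight
       WITH KEY carrid = <source_fields>-carrid    " hypothetical keys
                connid = <source_fields>-connid.
  IF sy-subrc = 0.
    " Enrich the source record from ls_flight here.
  ENDIF.
ENDLOOP.
```

For large buffers, declaring the static table as a SORTED or HASHED table keyed on the lookup fields keeps the READ TABLE access fast.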