BENEFIT OF “DB-JOIN” IN PLACE OF “FOR ALL ENTRIES IN” IN TRANSFORMATION
Start routine/end routine/expert routine
Data flow feeding daily reports are a typical scenario in a SAP BW world. Based on the reporting requirements, business logic is identified by all stake holders which in turn are configured in the BW data flows to process transactional data. Business Intelligence (BI) has the core capabilities to extract, model and load huge volumes of transaction data and make it available via reports.
In SAP Business Warehouse (SAP BW), we use Transformations to configure business logic across the various layers of the LSA and DTPs to schedule the loads from one data provider to another.
Ø Scenario: Transactional data or Sales data is being loaded (containing large volume of records). During the data load the requirement is to perform a look-up on a different Info Provider (Info Object/DSO/transparent table) to pull-up all the records based on certain keys from the SOURCE_PACKAGE. Such lookups involving large number of records is bound to demonstrate slow performance.
Typical non-optimized approach for modeling this scenario is to code the lookup within a start routine/end routine/expert routine. In the look up, we would use “FOR ALL ENTRIES IN” in the select statement to compare the records from the source of the transformation vs. the lookup Info provider. The statement processes all the records in the SOURCE_PACKAGE and exerts load on the application server. So with the number of fetched records growing, the execution time would also increase proportionally.
SELECT USING ‘FOR ALL ENTRIES IN’:
The major bump in runtime is because of two factors
1. Contribution from I/O latency because of large volume of records transiting across the app server and the DB server
2. Processing time within the app server while reading from the internal table and performing the comparison
To overcome this challenge the approach we take is to push the comparison down to the database and avoid the unnecessary I/O costs. The solution can be summarized under the following steps.
1. Create a Z-table which will hold data only during the lifetime of the transformation.
2. At the beginning of the transformation, load the Z-table with lookup keys from SOURCE_PACKAGE
3. Perform a DB join between the Z-table and the info provider and return the results
Couple of precautions to be taken while taking this approach
· The data package ID should be included as part of the key to avoid data duplication or deadlock scenarios when the transformation executes in parallel mode
· The table data should be cleared during every instance of the transformation execution
1. Create the Z-Table. In our example ZMATERIA for looking-up material master data
2. Use Delivery class as C and make it maintenance allowed with restriction.
3. Select all fields as keys. The fields are nothing but the fields used to perform the lookup. In our example, as we are looking up on material master data, the first field denotes the material ID which we would use in the WHERE clause of the SELECT statement. The second one is the DATAPACKID which will represent the data package number in the DTP load. Save and activate the table
4. Once the table is created, we could use it in the JOIN statement within the transformation start routine/expert routine/end routine.
The above scenario is just a demo to represent the JOIN statement instead of “FOR ALL ENTRIES IN”.
In the real time environment we can see impressive improvement in the performance. We can see this improvement in the above trend analysis in a sample environment.
Thanks for sharing this wonderful technique on improving the lookup performance.
We shall definitely give a try to this approach.
Nice presentation as well.
nice piece of work
Nice document, Surely implement in my Transformation. 🙂
Thank you. Keep posting.