Implementing Join Ranks in Data Services
This article is a continuation of my previous article which can be accessed here.
Below is a real case scenario where I implemented the Join rank recently for one of my data migration project. Using join rank had significantly improved the performance, 5 times faster.
I have loaded Good Receipt data in SAP and wanted to compare the loaded data with the transformed data what was attempted to load. This is to track what has been loaded successfully in SAP and what failed to load. In order to do that I need to join the SAP table with my transformed table and extract the matching records. So, for this scenario, I will be using Join Rank and explaining below step by step process.
Steps to implement Join Rank in SAP Data Services.
Create a data flow using following transforms which uses two tables as a source, one SAP table and another transformed table. Used inner join with key fields required. Target table will hold to matching rows from both tables.
Open MSEG Table (double click the table) and change the Join Rank to 30 (By default join rank is 0). Highlighted in yellow. This table has approx. 3.9 million rows.
Open INR_GR_ENR Table (Transformed table) and changed the Joni Rank to 10. This table has 53,500 rows.
Table: MSEG rows count from SAP
You can read in detail how Join Ranks work in Data Services in my previous article here.
MSEG is one of the biggest SAP table with 3.9 million rows. My transformed table (i.e. INT_GR_ENR) has 53,500 rows to load in SAP. Let’s assume that 53,000 rows were loaded successfully where 500 Good Receipts failed to load. So, when I join these two tables it will return me matching 53,000 rows.
What happened when Join Rank not used?
If I simply use inner join on these two tables and run the Job without using any join ranks it takes 10 minutes to produce the result.
Job Execution Time: 10 minutes
What happened when Join Rank is used?
So, since full push down is not possible in this case, I used a join rank of 30 on my biggest table i.e. MSEG and join rank of 10 on the second table which is smaller in size. Since MSEG has higher join rank, it will be a driving table. Now the result has significantly improved and I got the Job completed within 2 minutes which is 5 times faster than the previous run.
Job Execution Time: 2 minutes
It is that simple to implement. Let me know if that helps.