Multi-Join Technique
Dear Friends,
This blog post is about a multi-join technique in HANA calculation views. As a rule, the more joins a calculation view contains, the more expensive it is.
Today we will look at how multiple joins can actually be used to improve performance in real-world scenarios.
Requirement:
In a calculation view we have a requirement to join a very large table. Since the view itself is already big, this join is expensive and causes performance problems.
SCENARIO A:
We have a Sales table as the source, with multiple joins already present in the existing view, and we need to fetch the WBS cost from the Project table. For example, the existing data set has 80 million records and the Project data set has 40 million records.
In general, we would create the join directly, like below:
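In SQL terms, such a direct join might look roughly like the following sketch (the table and column names SALES, PROJECT, SALES_DOC, WBS_ELEMENT and WBS_COST are illustrative assumptions, not taken from the original view):

-- Every one of the ~80 million SALES rows probes the
-- ~40 million-row PROJECT table, even though only about
-- 5 million distinct WBS elements occur on the left side.
SELECT s.SALES_DOC,
       s.WBS_ELEMENT,
       p.WBS_COST
FROM   SALES AS s
LEFT   OUTER JOIN PROJECT AS p
       ON p.WBS_ELEMENT = s.WBS_ELEMENT;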
The above join takes more time than expected, since it executes a lookup for each of the 80 million records, even though we have only 5 million unique WBS elements on the left side.
What can be done?
This performance problem can be resolved with a simple technique: after the sales data, create an aggregation node that keeps only the unique WBS elements, and join that node with the required right table. Once you have the desired output, join it back to the existing view. That way we minimize the data volume while joining to the huge table.
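In SQL terms, the pattern might look roughly like this (same illustrative names as above; the GROUP BY stands in for the aggregation node):

-- Step 1: reduce the left side to the ~5 million unique WBS
-- elements (this mirrors the aggregation node in the view).
-- Step 2: join that small set to the 40 million-row PROJECT table.
-- Step 3: join the result back to the full SALES set.
WITH UNIQUE_WBS AS (
    SELECT WBS_ELEMENT
    FROM   SALES
    GROUP  BY WBS_ELEMENT
),
WBS_COSTS AS (
    SELECT u.WBS_ELEMENT,
           p.WBS_COST
    FROM   UNIQUE_WBS AS u
    INNER  JOIN PROJECT AS p
           ON p.WBS_ELEMENT = u.WBS_ELEMENT
)
SELECT s.SALES_DOC,
       s.WBS_ELEMENT,
       c.WBS_COST
FROM   SALES AS s
LEFT   OUTER JOIN WBS_COSTS AS c
       ON c.WBS_ELEMENT = s.WBS_ELEMENT;

Depending on how the aggregation is defined, the result set can differ from the plain join, a point raised in the comment below.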
SCENARIO B:
We have a GLPCA data set with other joins, and finally a join with MKPF on the material document number.
In this scenario the unique material document count is about the same as the source data set count, so deduplicating the records will not help.
What can be done?
Partition the source data into multiple data sets, perform the join multiple times instead of once, and use a union to get the final data set. This enables parallel processing, so performance improves.
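In SQL terms, the idea might look roughly like this (the split value and the column names DOCNR, AMOUNT, BUDAT and MBLNR are illustrative assumptions; MBLNR stands in for the material document number):

-- Split GLPCA into two slices by a range on the material
-- document number, join each slice to MKPF separately, and
-- union the slices back together; HANA can then process the
-- two branches in parallel.
SELECT g.DOCNR, g.AMOUNT, m.BUDAT
FROM   GLPCA AS g
INNER  JOIN MKPF AS m ON m.MBLNR = g.MBLNR
WHERE  g.MBLNR <  '5000000000'
UNION ALL
SELECT g.DOCNR, g.AMOUNT, m.BUDAT
FROM   GLPCA AS g
INNER  JOIN MKPF AS m ON m.MBLNR = g.MBLNR
WHERE  g.MBLNR >= '5000000000';

In a graphical calculation view the same shape can be built with projection nodes carrying the range filters, separate join nodes, and a union node on top.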
To summarize, performance can be improved by using multiple joins instead of a single join. Depending on the data and the business requirement, we can use either of the above techniques.
Happy Reading!
Always great to see posts about HANA, performance, and ways to make things faster.
Thanks for that.
I would like it much better, though, if there were some more details on the different alternative modeling approaches.
For example, the first "trick" of aggregating the data to make it unique does not deliver the same result set as the original one. While that might be wanted, it's important to be clear about what changes and why.
Here, as well as for the second "trick", I'm also missing numbers that show what ran quicker, by how much, and why.
You claim that manually partitioning the data before the join and a union afterwards will be faster than a plain join. But under which circumstances is that the case? Depending on the engine used, HANA performs parallel block-wise joins anyhow.
How many records need to be in the table for that to work well? And what happens if one of the tables is partitioned? How does this manual partitioning affect memory usage?
All modeling alternatives are trade-offs, and we need to see the numbers (PlanViz, expensive statements trace, ...) to understand what those trade-offs are.
Maybe you can include those in your next post.
Thanks for the valuable suggestions.