SAP HANA: Achieving Set operations using Joins
Hi Guys,
Jody Hesch has written an excellent writeup Thinking in HANA – Part 1: Set Operators on how to achieve SET Operations using Calculation Views with UNION node. In this document i intended to show the set operations using JOINS. Thanks Jody for a wonderful document, will be waiting for your Part -2
Now we can have multiple discussions. As we know that achieving with UNION is better than JOINS. But again it depends on the volumes of the data and the place at which you are applying it.
As per our experiences, we also know that if we can achieve with the help of Attribute /Analytic Views it is better than doing the same with Calculation View Graphical / SQL. In this document i have tried to achieve with Attribute views as much as possible and while doing this i came to know about the Full Outer Join option through graphical models is not possible anymore.
These are 7 questions on which Jody has discussed,
Here are our Tables:
Table 1:
Table 2:
Revision: 68
1) What are all the elements in both data sets?
SQL Solution: (Similar to UNION)
SELECT
COALESCE(T1."ID",T2."ID") AS "ID",
COALESCE(T1."NAME",T2."NAME") AS "NAME",
COALESCE(T1."LAST_NAME",T2."LAST_NAME") AS "LAST_NAME",
COALESCE(T1."ROLE_ID",T2."ROLE_ID")AS "ROLE_ID"
FROM "EMPLOYEE" T1 FULL OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID ;
Result:
Note: Full outer join option not coming for any of the Graphical HANA models.
2) Which elements are exclusive to SET 1?
SQL Solution: (MINUS/EXCEPT)
SELECT
T1."ID",
T1."NAME",
T1."LAST_NAME",
T1."ROLE_ID"
from "EMPLOYEE" T1 LEFT OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID
WHERE T2.ID IS NULL;
HANA Models:
1) Create a Attribute view with Left outer join on both the tables
2) Consume the Attribute view in the Graphical calculation view to apply filter “T2.ID” is NULL after the joining takes place. ( You cannot apply filter after joining using attribute View).
Result:
3) Which elements are exclusive to SET 2?
Similar to above mentioned solution, you need to select the fields from Table 2 and filter on T1.ID IS NULL.
4) Which elements are in both sets?
SQL Solution: (INTERSECTION)
SELECT
T1."ID",
T1."NAME",
T1."LAST_NAME",
T1."ROLE_ID"
FROM "EMPLOYEE" T1 INNER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID;
Attribute View Solution:
Result:
5) Which elements are in SET 1 (non-exclusive)?
SQL Solution:
SELECT
T1."ID",
T1."NAME",
T1."LAST_NAME",
T1."ROLE_ID"
FROM "EMPLOYEE" T1 LEFT OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID;
Attribute View Solution:
Result:
6) Which elements are in SET 2 (non-exclusive)?
SQL Solution:
SELECT
T2."ID",
T2."NAME",
T2."LAST_NAME",
T2."ROLE_ID"
FROM "EMPLOYEE" T1 RIGHT OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID;
Attribute View Solution:
Result:
7) Which elements are in SET 1 or SET 2?
SQL Solution: (Similar to UNION Without duplicates)
SELECT
COALESCE(T1."ID",T2."ID") AS "ID",
COALESCE(T1."NAME",T2."NAME") AS "NAME",
COALESCE(T1."LAST_NAME",T2."LAST_NAME") AS "LAST_NAME",
COALESCE(T1."ROLE_ID",T2."ROLE_ID")AS "ROLE_ID"
FROM "EMPLOYEE" T1 FULL OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID
WHERE T1.ID IS NULL OR T2.ID IS NULL ;
Result:
Note: Full outer join option not coming for any of the Graphical HANA models.
As i have just shown the alternate ways to achieve it, i’d assume there must be other ways too and decision is left to you on when to choose which kind of approach based on performance.
Hoping you guys help in colloborating with this document to understand the best approach w.r.t performance. And correct me in tuning this approach.
Yours
Krishna Tangudu
Excellent blog Krishna
Thank you very much mahesh 🙂
Krishna, interesting take on the same questions from Jody.
To come to some definitive conclusions to the performance aspects, it might be helpful to run some detailed testing and publish the results back out.
Take a similar approach with the scenarios 1- 7, but explode out the volumes in the underlying tables. Then measure runtime for each of the following
1. SQL approach using join (as you show)
2. SQL approach using set operators (as Jody Hesch provided)
3. Modeling approach with join (attribute view as you show)
4. Modeling approach with set operators (CV as Jody has shown)
I would find this an interesting study into the performance aspects of both techniques.
Regards,
Justin
Thanks justin for your inputs.
Yes i will see if i can get some data simulated or i will check if i can do this in the ECC tables itself and will dfinately publish the results once i have with the approach i followed here.
Regards,
Krishna Tangudu
Hi Krishna,
Thanks for collaborating the document in detail. this document is very useful for SAP HANA users. Keep updating SAP HANA details. All the best.
Regards,
Hari Suseelan
Thank you Hari for your feedback
Regards,
Krishna Tangudu
Krishna,
Good effort in taking set operators forward. I completely agree with Justin. Do the comparison for both the approaches for the final conclusion of better performance.
Keep up the good work.
Regards
Raj
Thank you Raj for your kind words 🙂 . Yes i agree along with you to understand and agree to an approach w.r.t performance.
Regards,
Krishna Tangudu
Thanks Krishna, for sharing
Welcome KD 🙂
Wonderful document Krishna. And thank for pointing us to Jody Hesch 's document. That was good read too. I am still working on the dynamic joins from your other document "Using Dynamic Joins in Clac Views". Once I get a hang on that, which is going to take a long time 🙂 , I will try this one out. I am kinda slow with understanding these sql, joins, cardinality concepts.
Thanks,
Benedict
Welcome Benedict 🙂 Nice to know that you liked this document and that other one too.
Regards,
Krishna Tangudu
Good Approach Krishna thanks for sharing such a valuable knowledge.
Thanks Hari 🙂
Hi Krishna,
Thanks to contribution to community. Let me complement with something I visualize here.
When we are thinking in sets is normally a matter of take from T1 and/or T2 playing with exists/not exists from one or other side.
On option 1 and 7, using the COASLECE which get the first not null, may produce a "new row" that doesn't exist on T1 or T2 if the column is null.
Example: Imagine if the on option 1 if ROLE_ID of ID 1 is null on T1, they will take ID, NAME and LAST_NAME from T1 and ROLE_ID from T2.
Obs 1: It should not happens for SAP replicated databases or SAP on HANA as there's no nulls on our dataset by default.
Obs 2: There are use cases that what is needed is just this "new row"
Regards, Fernando Da Rós
Good work Krishna,
Q: For 5 & 6, why use left or right join at all? What is the benefit?
Also, for Full-out-join, if one wants to do it using Graphical interface, I wonder if the following would work ..
Step 1: Left-Outer-join
Step 2: Right-Outer-join
Step 3: Union the results of these two.
Best wishes,
Venu
Hi Pan,
Yes for full outer join it would work. Jody might have tried it in the blog I linked in this document at the starting.
You need not do any left outer or right outer join before union. Using UNION node in graphical should work.
For 5 & 6 , I m just comparing against the approach used by jody , no special benefits 🙂
Regards,
Krishna Tangudu
Hi Krishna,
Regarding point2 "Which elements are exclusive to SET 1?"
why to create joins in attribute views and consume in calculation views? instead we can join tables in calculation view itself and apply filters in projection node right?
Will there be any difference? or is it a best strategy,
Can you explain?
Regards,
mk21612