Hi Guys,

Jody Hesch has written an excellent writeup Thinking in HANA – Part 1: Set Operators on how to achieve SET Operations using Calculation Views with UNION node. In this document i intended to show the set operations using JOINS. Thanks Jody for a wonderful document, will be waiting for your Part -2

Now we can have multiple discussions. As we know that achieving with UNION is better than JOINS. But again it depends on the volumes of the data and the place at which you are applying it.

As per our experiences, we also know that if we can achieve with the help of Attribute /Analytic Views it is better than doing the same with Calculation View Graphical / SQL. In this document i have tried to achieve with Attribute views as much as possible and while doing this i came to know about the Full Outer Join option through graphical models is not possible anymore.

These are 7 questions on which Jody has discussed,

Here are our Tables:

Table 1:

Screen Shot 2014-03-03 at 4.07.00 PM.png

Table 2:

Screen Shot 2014-03-03 at 4.08.21 PM.png

Revision: 68

1) What are all the elements in both data sets?

SQL Solution: (Similar to UNION)


SELECT
COALESCE(T1."ID",T2."ID") AS "ID",
COALESCE(T1."NAME",T2."NAME") AS "NAME",
COALESCE(T1."LAST_NAME",T2."LAST_NAME") AS "LAST_NAME",
COALESCE(T1."ROLE_ID",T2."ROLE_ID")AS "ROLE_ID"
FROM "EMPLOYEE" T1 FULL OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID ;



Result:

Screen Shot 2014-03-03 at 4.15.36 PM.png

Note: Full outer join option not coming for any of the Graphical HANA models.

2) Which elements are exclusive to SET 1?

SQL Solution: (MINUS/EXCEPT)


SELECT
T1."ID",
T1."NAME",
T1."LAST_NAME",
T1."ROLE_ID"
from "EMPLOYEE" T1 LEFT OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID
WHERE T2.ID IS NULL;



HANA Models:


1) Create a Attribute view with Left outer join on both the tables

2) Consume the Attribute view in the Graphical calculation view to apply filter “T2.ID” is NULL after the joining takes place. ( You cannot apply filter after joining using attribute View).


Screen Shot 2014-03-03 at 5.41.22 PM.png

Result:


Screen Shot 2014-03-03 at 4.20.36 PM.png




3) Which elements are exclusive to SET 2?

Similar to above mentioned solution, you need to select the fields from Table 2 and filter on T1.ID IS NULL.


4) Which elements are in both sets?

SQL Solution: (INTERSECTION)


SELECT
T1."ID",
T1."NAME",
T1."LAST_NAME",
T1."ROLE_ID"
FROM "EMPLOYEE" T1 INNER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID;



Attribute View Solution:


Screen Shot 2014-03-03 at 5.24.42 PM.png

Result:

Screen Shot 2014-03-03 at 4.21.39 PM.png

5) Which elements are in SET 1 (non-exclusive)?

SQL Solution:


SELECT
T1."ID",
T1."NAME",
T1."LAST_NAME",
T1."ROLE_ID"
FROM "EMPLOYEE" T1 LEFT OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID;



Attribute View Solution:


Screen Shot 2014-03-03 at 5.21.22 PM.png


Result:


Screen Shot 2014-03-03 at 4.23.13 PM.png

6) Which elements are in SET 2 (non-exclusive)?

SQL Solution:


SELECT
T2."ID",
T2."NAME",
T2."LAST_NAME",
T2."ROLE_ID"
FROM "EMPLOYEE" T1 RIGHT OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID;




Attribute View Solution:


Screen Shot 2014-03-03 at 5.25.31 PM.png


Result:


Screen Shot 2014-03-03 at 4.25.46 PM.png

7) Which elements are in SET 1 or SET 2?

SQL Solution: (Similar to UNION Without duplicates)


SELECT
COALESCE(T1."ID",T2."ID") AS "ID",
COALESCE(T1."NAME",T2."NAME") AS "NAME",
COALESCE(T1."LAST_NAME",T2."LAST_NAME") AS "LAST_NAME",
COALESCE(T1."ROLE_ID",T2."ROLE_ID")AS "ROLE_ID"
FROM "EMPLOYEE" T1 FULL OUTER JOIN "NEW_EMPLOYEE" T2 ON T1.ID = T2.ID
WHERE T1.ID IS NULL OR T2.ID IS NULL ;



Result:


Screen Shot 2014-03-03 at 4.45.19 PM.png


Note: Full outer join option not coming for any of the Graphical HANA models.

As i have just shown the alternate ways to achieve it, i’d assume there must be other ways too and decision is left to you on when to choose which kind of approach based on performance.

Hoping you guys help in colloborating with this document to understand the best approach w.r.t performance. And correct me in tuning this approach.

Yours

Krishna Tangudu


To report this post you need to login first.

18 Comments

You must be Logged on to comment or reply to a post.

  1. Justin Molenaur

    Krishna, interesting take on the same questions from Jody.

    To come to some definitive conclusions to the performance aspects, it might be helpful to run some detailed testing and publish the results back out.

    Take a similar approach with the scenarios 1- 7, but explode out the volumes in the underlying tables. Then measure runtime for each of the following

    1. SQL approach using join (as you show)

    2. SQL approach using set operators (as Jody Hesch provided)

    3. Modeling approach with join (attribute view as you show)

    4. Modeling approach with set operators (CV as Jody has shown)

    I would find this an interesting study into the performance aspects of both techniques.

    Regards,

    Justin

    (0) 
    1. Krishna Tangudu Post author

      Thanks justin for your inputs.

      Yes i will see if i can get some data simulated or i will check if i can do this in the ECC tables itself and will dfinately publish the results once i have with the approach i followed here.

      Regards,

      Krishna Tangudu

      (0) 
  2. Suseelan Hari

    Hi Krishna,

    Thanks for collaborating the document in detail. this document is very useful for SAP HANA users. Keep updating SAP HANA details. All the best.

    Regards,

    Hari Suseelan

    (0) 
  3. Raj Kumar Salla

    Krishna,

    Good effort in taking set operators forward. I completely agree with Justin. Do the comparison for both the approaches for the final conclusion of better performance.

    Keep up the good work.

    Regards

    Raj

    (0) 
    1. Krishna Tangudu Post author

      Thank you Raj for your kind words 🙂 . Yes i agree along with you to understand and agree to an approach w.r.t performance.

      Regards,

      Krishna Tangudu

      (0) 
  4. Benedict Venmani Felix

    Wonderful document Krishna. And thank for pointing us to Jody Hesch ‘s document. That was good read too. I am still working on the dynamic joins from your other document “Using Dynamic Joins in Clac Views”. Once I get a hang on that, which is going to take a long time 🙂 , I will try this one out. I am kinda slow with understanding these sql, joins, cardinality concepts.

    Thanks,

    Benedict

    (0) 
  5. Fernando Da Ros

    Hi Krishna,

    Thanks to contribution to community. Let me complement with something I visualize here.

    When we are thinking in sets is normally a matter of  take from T1 and/or T2 playing with exists/not exists from one or other side.

    On option 1 and 7, using the COASLECE which get the first not null, may produce a “new row” that doesn’t exist on T1 or T2 if the column is null.

    Example: Imagine if the on option 1 if ROLE_ID of ID 1 is null on T1, they will take ID, NAME and LAST_NAME from T1 and ROLE_ID from T2.

    Obs 1: It should not happens for SAP replicated databases or SAP on HANA as there’s no nulls on our dataset by default.

    Obs 2: There are use cases that what is needed is just this “new row”

    Regards, Fernando Da Rós

    (0) 
  6. Pan HP

    Good work Krishna,

    Q: For 5 & 6, why use left or right join at all? What is the benefit?

    Also, for Full-out-join, if one wants to do it using Graphical interface, I wonder if the following would work ..

    Step 1: Left-Outer-join

    Step 2: Right-Outer-join

    Step 3: Union the results of these two.

    Best wishes,

    Venu

    (0) 
    1. Krishna Tangudu Post author

      Hi Pan,

      Yes for full outer join it would work. Jody might have tried it in the blog I linked in this document at the starting.

      You need not do any left outer or right outer join before union. Using UNION node in graphical should work.

      For 5  & 6 , I m just comparing against the approach used by jody , no special benefits 🙂

      Regards,

      Krishna Tangudu

      (0) 
  7. Madhavi kaza

    Hi Krishna,

    Regarding point2 “Which elements are exclusive to SET 1?”

    why to create joins in attribute views and consume in calculation views? instead we can join tables in calculation view itself and apply filters in projection node right?

    Will there be any difference? or is it a best strategy,

    Can you explain?

    Regards,

    mk21612

    (0) 

Leave a Reply