SAP HANA: Implementing RANK using Graphical calculation View
Hi Folks,
As mentioned in this blog:
As requested in the above blog, collaborating with this document on the steps i have used to create the graphical view . Please do suggest your opinions to make it better.
Will explain the each node here on how we can build the graphical calculation view and get rank in it.
Projection 1 & 2 : We select the required columns i.e Employee Name,Region and Salary and create a “dummy” column with constant value 1 as shown below:
Join 1: ( Projection 1 & 2 joined on dummy column )
Aggregation 2 : ( Aggregate salary as “Minumum” and add “dummy” column to apply the group by )
Join 2 : ( Joining Aggregation 2 and Projection 1 on dummy column )
Projection 3 : ( Apply filter SALARY_1 < SALARY )
Aggregation 1: ( Add EMPLOYEE_NAME_1 as “Aggregated column” and apply Count on it at EMPLOYEE NAME,REGION,SALARY level)
Projection 4: ( Add a calculated column Rank with formula RANK1 + 1 — To get the ranks from 2)
Projection 6: ( Apply filter SALARY = MINUMUM and add it to Rank = 1 as calculated column )
Projection 7 & 5 : Choose the required columns i.e EMPLOYEE_NAME,REGION,SALARY and RANK
UNION_1 ( Map both Projection 7 & 5 in the union )
Now you can see in the data preview we can see the RANK column showing us the desired output.
Hope this navigations is clear.
Your’s
Krishna Tangudu
Krishna,
The logic is too cool 🙂 . how about the performance aspect (join2 & aggregation 1)
Thanks for the share.
Sree
Hi Sreehari,
Yep will have to check the performance here. Once I load considerable amount of data then will check and post the results here
Regards,
Krishna Tangudu
Hi Krishna,
Thanks for sharing.
Thanks Sathish.
Regards,
Krishna Tangudu
This looks like a complicated solution. If the goal is just to get the rank as output, I would suggest using window functions in SQL. You could embed this in a graphical calc view just the same and achieve the same result with 2 lines of code.
SELECT "EMPLOYEE_NAME", "REGION", ROW_NUMBER() OVER (PARTITION BY "EMPLOYEE_NAME" ORDER BY "SALARY") AS "RANK", "SALARY"
FROM EMPLOYEE
Just satisfied a much more complex ranking requirement and landed on using this technique, much simpler and clear.
If you need to pre-aggregate and then rank, you just perform a subquery and wrap this partition function around that. This would be the case if you had an employee across two regions and you wanted to rank wholistically.
Regards,
Justin
Yes Justin, Completely agree with you. I mentioned the same in the previous blog as well
SAP HANA: Dynamic Ranking using Script based (SQL Script) Vs Graphical Calculation view Vs Script based (CE Functions)
As people were asking for detailed screens on the graphical, so updated in this blog.
As window functions work on row , was thinking of multiple ways to do the same and see if we can get any performance benefit if we are using graphical.
I will have to still validate with more volumes of data. But my gut feeling says that I eventually have to agree with you.
Regards,
Krishna Tangudu
Hi Krishna,
Any update on the performance for this approach?
We have the same scenario and thinking if the graphical version will be faster over the SQL one.
Thanks,
Remya
In this case I would expect SQL to be faster because it creates a single SQL execution plan in the column store.
Some empirical evidence either way would be interesting.
Hi John,
Even I had the same thought, but I got this note from SAP:
Window functions (RANK()) in HANA at the moment are not nice, since they have to be calculated in the row engine and are not natively supported in the column store. Try to rebuild the ranking with features which are running in column store.
@Krishna,
Please share some test results on these scenarios if you have.
Thanks,
Remya
Hi Krishna,
Thanks to your effort on this! This is really helpful to me!
BR,
Hyukjoo
Thank you for sharing. Very helpful!
Thanks for sharing.
Same scenario I have implemented using script where 5000 records is taking 30 sec.
Hi,
Nice solution. Just small rectification in Aggregation 2 section, Aggregation Type should be MAX for MINIMUM column. Otherwise, you will not get data for RANK 1 and you will skip it.
Regards,
Vikram