Assigning same Rank to similar values of a column
Hello SAPians,
From past many days I did not logged in to check my sap blogs account and today when I logged in, there was a comment with a question on my below post: https://blogs.sap.com/2017/08/16/functioning-of-gen_row_num_by_group-function-in-bods/#
“How can we get same ranking to similar values of a column?”
Immediately after reading the question I designed the data accordingly and started my experimentation. Thanks to Sobhan Annepu for this question. 🙂
Now let’s start with a scenario, gen_row_num_by_group function does lets you to get different ranking to similar values but what if client asks you to get same rank to similar values?
Let me clear this with an example. Consider the below data:
Col2 with similar values should have same rank i.e., for ‘RAJAN’ it should be 1, for ‘RAJANEY’ it should be 2, for ‘RAJ’ it should be 3 and so on. This is similar to dense_rank function in Oracle.
Now the design goes like this:
The script contains initialization of variables:
Dataflow design is:
Mapping of Query transform is:
Variable declaration:
Now comes the code of function that does this process:
This custom function designed below assigns the ranking to the similar values:
## $p_prev_string (i/o parameter) will contain the previous value ###
## $p_curr_string (output parameter) will contain the current value entering function###
## $p_result (i/o parameter) gives the rank to similar values ###
print( 'String entering the function is: '|| $p_curr_string );
print( 'Previous String entering the function is: '|| $p_prev_string );
print( 'Start Value of rank is: '|| $p_result );
if ( $p_prev_string is null or $p_prev_string = $p_curr_string )
begin
### This is executed if current and previous values are same ###
$p_prev_string = $p_curr_string;
print( 'Rank is: '|| $p_result );
end
else
begin
$p_prev_string = $p_curr_string;
print( 'Value of previous string becomes: '|| $p_prev_string );
$p_result = $p_result + 1;
print( 'Rank is: '|| $p_result );
### This is executed if current and previous values are different ###
## This part will be executed only once ####
end
Return $p_result;
Let me try to explain the working of this function:
Firstly it’ll take the first value of column (Col1) in $p_curr_string, the $p_prev_string will be assigned null at the script and the $p_result has value 1. When dataflow gets processed then value ‘RAJAN’ goes to $p_curr_string through parameter $p_curr_string, this parameter will be input parameter i.e., it’ll only take the values of column. $p_result will have value ‘1’ and this result will then be passed to the Result column.
When software encounters second row having Col1 value ‘RAJAN’ then this value gets assigned to $p_curr_string and the previous value that was present inside this variable $p_curr_string gets assigned to $p_prev_string, this $p_prev_string is input/output parameter as this value needs to be updated after every row. $p_result will now have value ‘1’ and this result will then be passed to the Result column (for second row).
When $p_curr_string gets value ‘RAJANEY’ then it goes to else part and result value, $p_result, gets updated and it becomes ‘2’. Also the value of $p_curr_string gets assigned to $p_prev_string
“Else part gets executed only when the values of Col1 change”
Explaining working of custom function is quiet difficult but still I’ve tried explaining it in the best way.
Hence output is:
Hope it helps!
Please correct if I’ve missed anything.
Thanks 🙂
Nice blog! It has great examples of how to do something. It also gives credit to the person who answered your question and links to it. I love it!
Michelle
Thanks Michelle. 🙂 I was thinking to write something but wasn't finding any topic but when I saw that question I decided to make a scenario out of it and explain it.
I agree. Even though it's outside of my area of expertise, this kind of content is exactly what we need more of on SCN instead of generic articles about "business trends".
Well done, Rajan Burad
Thank you, Jelena. 🙂 I feel it is the best platform to share your knowledge with others and even if you've written something wrong there are a number of experts to correct you. 🙂 Thanks Again!
I hate to be the party pooper here. But there's a much more elegant solution to this problem, just using default DS functionality, without the need for a custom function, variables or parameters.
Build a data flow like this:
When dealing with large data sets, it's always better to load file input into a staging area. The distinct operation (involves a sort) will be pushed to the underlying database. Adding a Data_Transfer transform will allow to push the join, too. Try both solutions with a 1M records data set. You'll be surprised by the performance gains:
Thanks Dirk for enhancing the knowledge. Yes you're right, if we've data in millions then it'll hit the performance.
I tried with your solution as well and for data in millions it works faster.
It's nice to see that you are able to make some time out, with such detailed explanation.
Good going..keep it up!!!
Thanks Shashank for mentoring me 🙂
Hi Rajan and Dirk,
Really it was good brainstorming discussion and thanks for your effort on this.
Ravi.P
Welcome Ravi. 🙂
Indeed, Thanks Rajan and Dirk for being able to make some time out, with such detailed explanation. This and other articles has helped me multiple times to come-up with a good ETL solutions. Thanks and keep up the good work.