Functioning of gen_row_num_by_group function in BODS
I was using gen_row_num_by_group function provided by Data Services in my project and after experimenting on it I recorded my observations and thought of sharing with you all.
Suppose you get a requirement to find the rank. Then BODS provides a function called gen_row_num_by_group which allots ranking to incoming rows.
But be wary of one thing when you use this function and that is, in layman language, all your incoming records should be properly arranged and in technical terms the column for which you’ll apply ranking should be ordered by first.
Let’s understand it by an example.
Consider my below source and target,
Mapping of Source to target is as follows:
First time when job is executed then records of target table are:
Now before jumping to gen_row_num_by_group let’s have a look at gen_row_num function.
GEN_ROW_NUM: – As the name suggests Generate Row Number so it assigns number to every incoming row each time when you execute your dataflow residing inside your job. By default it starts to increment the incoming records by 1 but if you map it like,
Gen_row_num() + 1 then for every incoming row it’ll add this ‘1’.
In general, it can be
Let’s understand with an example,
I’ve added Result column and mapped it with gen_row_num() function as shown below:
So after executing the job my output looks like:
See as I mentioned it added a sequence number for every incoming record.
Now let’s understand the functionality of gen_row_num_by_group.
Gen_Row_Num_By_Group: – This means make a group of similar records and then apply gen_row_num functionality. ( Ha ha this definition is created by myself. :))
Now as I said it’ll make a group for a specified column and then will assign row number to them.
I’ve mapped my result column with gen_row_num_by_group as shown:
After executing below is my target table: –
So logically what it did?
For a group of similar incoming records it assigned row numbers.
Like for incoming records from 101 – 104 the column C2 had values ‘A’ so it assigned 1,2,3,4 to it.
Then from 105 the value of column C2 changed, i.e., it became AB, so it assigned it 1, record 106 also had the same value for C2 so it assigned it to number 2.
In a similar fashion it did for all incoming rows.
So ideally our function has assigned rank to the similar group but wait, check value of 109, it is having value ‘1’ but it should’ve value ‘5’ right as per the functionality of gen_row_num_by_group.
So it means I need to arrange my records first orderly.
So I apply order by as shown for the column which I want to assign the rank:
Again I execute my job and the output looks like:
But all in vain, again for C1 = 109 I’ve rank ‘1’ against it.
So this function,
Firstly applied gen_row_num_by_group functionality to all the incoming source records then applied order by and arranged the records in an order fashion.
See C2 which has value ‘A’ forms a group, ‘AB’ forms a group.
Similarly all similar records in C2 form a group.
By group I mean they are arranged orderly.
So to resolve this issue and to apply proper ranking to my incoming source I first need to arrange the incoming records in an ordered fashion and then apply the gen_row_num_by_group functionality.
So I modify my mapping as shown:
First order by,
Then gen_row_num_by_group functionality,
Finally after executing the job my target table looks like:
So you can see that all similar records of C2 are assigned a proper rank.
How it processed and assigned rank to the incoming data?
Firstly it applied order by functionality to all the incoming records and arranged them in a similar group and then it applied ranking to it.
Hope I’m able to explain the functionality of gen_row_num_by_group and the way it processes the incoming data.
Please let me know if I’ve missed anything.