I have been working with databases for ages, and always thought they had little limitations, except for the possibilities to calculate across rows. In basic SQL it’s not possible to refer to values in other rows. This make some calculations very hard or even impossible.
Working now a lot with the SAP HANA database, I learned about window functions which really opened a lot of new possibilities.
You can regard a window function as an in-line aggregation. You will get the results of the aggregation function on each line. Some simple examples based on the table below show the idea and the syntax of a window function:
Let’s use a window function now to sum the total revenue of each customer. Here we use the well-known SUM() function and specify the aggregation level with the ‘over(partition by …)’ extension:
select "Customer", "Period", "Revenue", sum("Revenue") over (partition by "Customer") as "TotalCustomerRevenue" from "NSLTECH"."CustomerPeriodRevenue"
Which results in the following:
This would be possible without window functions by running a subquery which does the aggregation on customer level and join it to the original table.
If we add an ‘order by’ clause, we will actually get a running sum over the periods
select "Customer", "Period", "Revenue", sum("Revenue") over (partition by "Customer" order by "Period") as "TotalCustomerRevenue" from "NSLTECH"."CustomerPeriodRevenue"
For more specifics about the syntax and all the possibilities, check the documentation.
Calculating Delivery Block Duration
A common question from business is to analyze the time a delivery block (or any other) has been active. This is a nice example which we can solve with the window functions LAG. The LAG function returns the value of a specific field of the previous row in the partition.
Let’s look at some example change documents regarding delivery blocks in the CDPOS/CDHDR table of SAP:
Here you see that one document has been blocked and unblocked twice with the same code (07). The records where VALUE_OLD has a value and VALUE_NEW is empty are the moments the blocks are removed. If we take these records as the basis we would like to join the corresponding records at which the block was set.
However, this is not easily done with a subquery as you can’t just look at similar keys and block values because in this case the document has been blocked twice. You actually need to find the closest to the unset. This is where the window function LAG comes in.
First we add a couple of helper columns to the raw data:
- ChangeDate: to_seconddate(concat(UDATE, UTIME), ‘YYYYMMDDHH24MISS’)
- BlockCode: case VALUE_OLD when ” then p.VALUE_NEW else p.VALUE_OLD end
- BlockChange: case VALUE_OLD when ” then ‘Block’ else ‘Unblock’ end
Based on this input we calculate the previous ChangeDate for all records using the LAG function:
LAG(“ChangeDate”) over (partition by TABKEY, TABNAME, FNAME, “BlockCode” order by “ChangeDate”) As “PreviousDate”
The complete query:
select TABKEY, "BlockCode", "BlockChange", "ChangeDate", VALUE_OLD, VALUE_NEW, LAG("ChangeDate") over (partition by TABKEY, TABNAME, FNAME, "BlockCode" order by "ChangeDate") As "PreviousDate" from ( select p.MANDANT, p.CHANGENR, p.TABKEY, p.TABNAME, p.FNAME, h.UDATE, h.UTIME, p.VALUE_OLD, p.VALUE_NEW, case p.VALUE_OLD when '' then 'Block' else 'Unblock' end As "BlockChange", case p.VALUE_OLD when '' then p.VALUE_NEW else p.VALUE_OLD end As "BlockCode", to_seconddate(concat(UDATE, UTIME), 'YYYYMMDDHH24MISS') As "ChangeDate" from SAP.CDPOS p inner join SAP.CDHDR h ON p.MANDANT = h.MANDANT and p.OBJECTCLAS = h.OBJECTCLAS and p.OBJECTID = h.OBJECTID and p.CHANGENR = h.CHANGENR where fname= 'LIFSK' )
Which now results in
If you select only the ‘Block’ records from this results and calculate the difference between the ChangeDate and the PreviousDate you will get the duration of the block.