Technical Articles
SQL Windowing functions
Introduction
New functions have been introduced in HANA for some years ago, but haven’t received a lot of attention. The two functions we will cover here are LAG() and LEAD().
The functions provides access from the current row to the previous and next row in the dataset.
Function Details
Using the LAG() function it is possible to access fields from a previous row from the SELECT statement of the current row. The function must be used in conjunction with OVER() which provides a «window» the table content defined by the «PARTITION BY» clause.
The same are valid for the LEAD() function, which provides access to the next row.
Example
In this example we calculate the number of free seats on the previous and next flight out of Frankfurt.
SELECT
sflight~carrid,
sflight~connid,
sflight~fldate,
sflight~seatsmax,
sflight~seatsocc,
( sflight~seatsmax - sflight~seatsocc ) AS seatsfree,
( LAG( sflight~seatsmax )
OVER( PARTITION BY sflight~carrid ORDER BY fldate )
- LAG( sflight~seatsocc )
OVER( PARTITION BY sflight~carrid ORDER BY fldate ) ) AS seatsfree_previous,
( LEAD( sflight~seatsmax )
OVER( PARTITION BY sflight~carrid ORDER BY fldate )
- LEAD( sflight~seatsocc )
OVER( PARTITION BY sflight~carrid ORDER BY fldate ) ) AS seatsfree_next
FROM spfli
INNER JOIN sflight ON spfli~carrid = sflight~carrid AND spfli~connid = sflight~connid
WHERE spfli~cityfrom = 'FRANKFURT'
ORDER BY sflight~carrid, sflight~fldate
INTO TABLE @DATA(lt_sflights).
This gives the following result in the internal table lt_sflights:
Conclusion
The real value will appear when we select just one row from the database, and are provided with results from three rows. We could select a flight and get information of the previous and next flight with just one access to the database.
References
LAG function: https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/2.0.03/en-US/e7ef7cc478f14a408e1af27fc1a78624.html
LEAD function: https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/2.0.03/en-US/5932eebb6208406590071eb65c6caa83.html
Now I am very interested in getting weather information for the various Castles in Transylvania to aid in Monster Making. I have to specify a date and time from the recent past and find the database record that is nearest to that point in time.
The trouble is that the web service that gets the weather information is very sporadic so I cannot guarantee a record exists for a given point in time. But I always need a result. This table has been building for years - it is huge, so I always have at least one record for any given Castle, usually from one hour ago, but the latest one could be one second ago, or two days ago, or two years ago.
Now if I desire to know the weather yesterday at 12:00 midday, and I just say give me a record greater or equal to that timestamp I might get no result. If I said give me the latest record between two days ago and the desired time I might not get a result.
The table is so big and gets bigger every hour that getting every record for a given Castle and looping through them is not a good solution.
So the proposed rule is as follows:-
First find the "last record" for that Castle i.e. the record with the highest timestamp that is less than or equal to the desired time. If there are no records because the very first record was one second ago today then the "last record" will presumably have a time of 01/01/1900.
Then get all the records which have a timestamp greater or equal to the "last record", and use the first one. That way you pretty much always get a result, if the nearest record was a year ago, or a day after the desired time.
One way to do this in ABAP SQL is as follows:-
That works, but the SQL trace reveals the performance is terrible. Moreover the further back in time you go the more records you are transferring from the database to the application server when you only want one, a breach of the "Golden Rules".
There are probably loads of other ways to do this, probably a really obvious one I have not thought of.
Now with the LAG/LEAD functions above they sound like that they might be useful here but the obvious problem is that you need a record to start off with before you can find the next record and the previous one.
So you would have to do the first query anyway ("last record") but once you had that you could get the previous record and the next record, exactly what I want, and then a loop through three records to see which one is closest to the desired time is no problem, or as described above try to bring back all three records at once as one record in a big long structure, and analyze that.
What does the panel think?
Is this a good use case of LEAD/LAG?
Is there another HANA function which is even better suited to this problem?
Is there a really simple way to do this in "normal" ABAP SQL?
Cheersy Cheers
Paul
My view is LEAD/LAG will not help here much as you are merely interested in one nearest record for one castle at any given time and also you still need to select all those records holding the timestamps for the given castle for your WINDOW partition to operate upon based on LEAD/LAG/whatever window function that you choose to use.Unless MIN and MAX are inherently not efficient than a (SELECT on all records relating to the castle + window function operating on them to get the nearest record).
Just on your point of selecting all records and then picking the first one..can't perhaps use MIN function if the first record that you select were to be one with the lowest timestamp?
I hope understood your story correctly..it would be good if you can throw in a few sample records and the record that you arrive at finally based on your selection logic.It will be of help.
I drew up a little spreadsheet of how it works now with the above ABAP SQL query which as you will see is not optimal:-
In scenario 3 I really want row 4 as that is the closest to the desired time. So in that case I am stuffed.
Cheersy Cheers
Paul
So maybe the best ting to do would be two reads - the record with the lowest timestamp after the desired time, and the record with the highest timestamp before the desired time, and then see which of the two results is closest.
Of course in these days of code push down if I could do that on the database level and then only return one row to the application server that would be faster I presume?
Hello Paul Hardy ,
Could it perhaps be done by calculating the absolute difference between 05/05/2020 12:00 and each date/time and then selecting just the one row with the smallest difference.
I’m just thinking out loud with this SQL statement
WITH
+monster_weather AS ( SELECT ABS(utc_timestamp – id_timestamp) AS difference, castle, utc_timestamp WHERE castle = id_castle)
SELECT MIN(difference), castle, utc_timestamp FROM monster_weather;
Regards,
Jan