New Window Functions in ABAP SQL
Function that is specified in front of the definition of a window in an ABAP SQL window expression and evaluates the rows of the window, is known as window function. Such a function performs analytic operations over a set of input table rows that are somehow related to the current row and help us to solve complex query challenges in easy ways. Window functions are aimed for writing a single query which shows grouped information AND individual information in a query. You might say, it is also possible to use the GROUP BY clause to show grouped information. But the GROUP BY query has its limitation, in that it won’t show you individual data as well.
The addition OVER designates a function as a window function. If you’d like to narrow the window from the entire dataset to individual groups within the dataset, you can use PARTITION BY to do so. You can use the ORDER BY clause as well in defining a window function. It simply orders by the designated column(s) the same way the ORDER BY clause would, except that it treats every partition as separate. The ORDER and PARTITION define what is referred to as the “window”, the ordered subset of data over which calculations are made. Detailed explanation on the windowing syntax and related additions can be found in this blog about Window Expressions in ABAP SQL.
Window functions can be divided in three main groups:
- Aggregate functions like AVG( … ), MAX( sql_exp ), MIN( sql_exp ), SUM( sql_exp ) …, COUNT(*) that can also be used in aggregate expressions.
- Ranking functions like ROW_NUMBER( ), RANK( ), DENSE_RANK( ), and NTILE (n) that can only be used in window expressions.
- Value functions LEAD( … ), LAG( … ), FIRST_VALUE ( … ), LAST_VALUE ( … ) that can be used only in window expressions.
Many of these functions are already known to the ABAP community. From release 7.54 ABAP SQL supports the following new window functions in window expressions as well:
LEAD and LAG
The window functions LEAD or LAG are suitable for calculations, such as determining the difference between values in the current row and values of the preceding or following rows.
These two functions can only be specified together with ORDER BY after OVER with the following syntax
LEAD|LAG( sql_exp1[, diff[, sql_exp2]] )
The result of the functions is the value of the SQL expression sql_exp1 for the row of the current window defined by the addition diff or the box defined by the addition ORDER BY after OVER.
- For the function LEAD, diff determines the row positioned the corresponding distance after the current row.
- For the function LAG, diff determines the row positioned the corresponding distance in front of the current row.
If diff is not specified, the value 1 is used implicitly. If the row determined by diff is not in the current window, the result is the null value by default. If the optional SQL expression sql_exp2 is specified, it is evaluated and returned for the current row in cases where the row does not exist.
SELECT statement with the window functions LEAD and LAG as operands of an arithmetic expression. The addition PARTITION is not specified, which means there is only one window with all rows of the result set. Both LEAD and LAG have only one argument each, which means that the difference between the values of the column NUM1 is calculated using the directly following or preceding row, and any nonexistent rows produce null values. The latter are defined using a null indicator. The program DEMO_SELECT_OVER_LEAD_LAG_DIFF uses this SELECT statement :
SELECT num1 AS number, num1 - LEAD( num1 ) OVER( ORDER BY id ) AS diff_lead, num1 - LAG( num1 ) OVER( ORDER BY id ) AS diff_lag FROM demo_expressions ORDER BY id INTO TABLE @DATA(lead_lag_diffs) INDICATORS NULL STRUCTURE null_ind.
and, when executed, the program displays the result, as partly shown below.
FIRST_VALUE and LAST_VALUE
The FIRST_VALUE function returns the first value of a sorted set of values, the LAST_VALUE function returns the last value of a sorted set of values. OVER and ORDER BY are mandatory. PARTITION BY is optional. If a window is divided into partitions, the FIRST_VALUE/LAST_VALUE function returns a result for each partition (see example). If there’s no PARTITON BY clause, the functions work on the entire window.
With the LAST_VALUE function, framing is an important aspect to consider. the LAST_VALUE function always returns the value from the current row. To find the last value for a partition or a window, the correct frame must be specified explicitly using ROWS BETWEEN addition.
The program DEMO_SELECT_FIRST_LAST divides the rows from the DEMO_UPDATE table into three partitions, depending on their value in col1. Within the partitions, the rows are ordered by their value in col3.
SELECT id, col1, col2, col3, FIRST_VALUE( col2 ) OVER( PARTITION BY col1 ORDER BY col3 ) AS first_value, LAST_VALUE( col2 ) OVER( PARTITION BY col1 ORDER BY col3 ) AS last_value, LAST_VALUE( col2 ) OVER( PARTITION BY col1 ORDER BY col3 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_value_correct FROM demo_update INTO TABLE @DATA(result). cl_demo_output=>display( result ).
The column FIRST_VALUE returns the first value of col2 for each partition.The column LAST_VALUE does not return the last value. As described above, the default frame is from the first row to the current row. If col3 contains duplicate values, the rows are considered equal and the last value from the group of equals is returned. To get the last value of col2 of the partition, the frame size must be specified explicitly, as demonstrated in LAST_VALUE_CORRECT.
In this example, col3 has multiple duplicate values. The key field – here the field ID – is used to sort rows with the same value.
This window function divides the rows of a window into n buckets. The goal is to fill all buckets with the same number of rows by following the rule specified after ORDER BY. If the number of rows of the window m cannot be distributed equally between the number of buckets n, the remainder r is distributed in such a way that the first (m MOD n) buckets each contain one element more. The buckets are numbered starting with the start value 1 and the result of the NTILE function is the number of the bucket a particular row belongs to. The syntax is
… NTILE( n ) OVER( [PARTITION BY sql_exp1]
ORDER BY col [ASCENDING|DESCENDING]) …
n must be host variable, a host expression, or a literal of type b, s,i, or int8 which represents a positive integer. The OVER-clause including ORDER BY is mandatory.
If n is negative, for literals and host constants, a syntax error occurs. If n is a variable or an expression, instead of a syntax error, a database error and its respective exception CX_SY_OPEN_SQL_DB can occur. The result of the NTILE function is always of type INT8.
Since the maximum number of rows in a bucket can vary by 1, rows with the same value can also be in different buckets.
Sorting of all employees listed in table DEMO_EMPLOYEES by their salary and distributes them into five salary groups.
SELECT name, salary, NTILE( 5 ) OVER( ORDER BY salary ) AS ntile FROM demo_employees INTO TABLE @DATA(result).
Group 1 has one entry more, as the number of employees (11) cannot be distributed into five groups of equal size.
You can check out a complete list of ABAP window functions in ABAP Keyword Documentation.
very nice blog and very nice new feature for abap 🙂
Is the sql windows expression available in ABAP CDS on S/4 HANA 2021 too?
Thanks and Regards
Thanks Christian for your interest. as far as i know, SQL functions like LEAD, LAG, NTILE and other are still not supported in CDS views. But maybe this blog could help 🙂
nice blog and very good explanation, many thanks!
Thank you Markus 🙂
Hello Safa, thanks for bringing another very useful blog.
you're welcome 🙂 nice that you follow the blog chain 🙂
Thanks for sharing!
1. What exactly is "window function"? This blog says window function is used in window expression and the linked blog for window expressions just says they use window function. 🙂 I had to google it and found an explanation in this Wikipedia article, which leads me to the next question.
2. Wikipedia article says that window function has OVER clause and aggregate functions (like MAX or SUM) are not window functions. But this blog mentions aggregate functions as a sub-type of window function. Is this accurate?
3. The blog (linked in the comments above) is dated 2017 and it already mentions LEAD, LAG, NTILE. So, which of the functions are actually new? It's totally fine to mention functions that existed before (I'm sure many readers never heard of them anyway), but when the blog title says "new", I feel the availability needs to be noted better.
And I understand this is more of a philosophical question but I still want to ask:
4. Why can't these just be supported by the CDS views? These are SQL functions, why do we have to go through a table function to get this functionality? Aggregate functions work with CDS, why not these?
Thanks for your interest. As it said in the very first sentences here:
Function that is specified in front of the definition of a window in an ABAP SQL window expression and evaluates the rows of the window, is known as window function..... Also the meaning of 'window is explained in the linked blog as well as here in the text: the ordered subset of data over which calculations are made. So I believe it must be clear with the meaning of a window function! Though it doesn't harm to search more also some times would be good to read twice 😀
As it is ALSO said in the text, this aggregation functions can be used in a window expression and in ABAP we consider them as one of the three types of window function.
The functions lead, lag, ntitle and... already existed as a SQL function, what is new is their usage in ABAP SQL which is just possible from release 7.54 & 7.55 (2019-2020). The blog you shared says : 'Currently SQL functions like LEAD, LAG, NTILE and other are not supported in CDS views.' and try to define a way to use those functionalities in other ways...