New Window Functions in ABAP SQL

former_member763514 · ‎11-30-2021

Function that is specified in front of the definition of a window in an ABAP SQL window expression and evaluates the rows of the window, is known as window function. Such a function performs analytic operations over a set of input table rows that are somehow related to the current row and help us to solve complex query challenges in easy ways. Window functions are aimed for writing a single query which shows grouped information AND individual information in a query. You might say, it is also possible to use the GROUP BY clause to show grouped information. But the GROUP BY query has its limitation, in that it won’t show you individual data as well.

The addition OVER designates a function as a window function. If you'd like to narrow the window from the entire dataset to individual groups within the dataset, you can use PARTITION BY to do so. You can use the ORDER BY clause as well in defining a window function. It simply orders by the designated column(s) the same way the ORDER BY clause would, except that it treats every partition as separate. The ORDER and PARTITION define what is referred to as the "window", the ordered subset of data over which calculations are made. Detailed explanation on the windowing syntax and related additions can be found in this blog about Window Expressions in ABAP SQL.

Window functions can be divided in three main groups:

Aggregate functions like AVG( ... ), MAX( sql_exp ), MIN( sql_exp ), SUM( sql_exp ) ..., COUNT(*) that can also be used in aggregate expressions.

Ranking functions like ROW_NUMBER( ), RANK( ), DENSE_RANK( ), and NTILE (n) that can only be used in window expressions.

Value functions LEAD( ... ), LAG( ... ), FIRST_VALUE ( ... ), LAST_VALUE ( ... ) that can be used only in window expressions.

Many of these functions are already known to the ABAP community. From release 7.54 ABAP SQL supports the following new window functions in window expressions as well:

LEAD and LAG

The window functions LEAD or LAG are suitable for calculations, such as determining the difference between values in the current row and values of the preceding or following rows.

These two functions can only be specified together with ORDER BY after OVER with the following syntax

LEAD|LAG( sql_exp1[, diff[, sql_exp2]] )

The result of the functions is the value of the SQL expression sql_exp1 for the row of the current window defined by the addition diff or the box defined by the addition ORDER BY after OVER.

For the function LEAD, diff determines the row positioned the corresponding distance after the current row.

For the function LAG, diff determines the row positioned the corresponding distance in front of the current row.

If diff is not specified, the value 1 is used implicitly. If the row determined by diff is not in the current window, the result is the null value by default. If the optional SQL expression sql_exp2 is specified, it is evaluated and returned for the current row in cases where the row does not exist.

Example

SELECT statement with the window functions LEAD and LAG as operands of an arithmetic expression. The addition PARTITION is not specified, which means there is only one window with all rows of the result set. Both LEAD and LAG have only one argument each, which means that the difference between the values of the column NUM1 is calculated using the directly following or preceding row, and any nonexistent rows produce null values. The latter are defined using a null indicator. The program DEMO_SELECT_OVER_LEAD_LAG_DIFF uses this SELECT statement :

SELECT num1 AS number,

       num1 - LEAD( num1 ) OVER( ORDER BY id ) AS diff_lead,

       num1 - LAG(  num1 ) OVER( ORDER BY id ) AS diff_lag

       FROM demo_expressions

       ORDER BY id

       INTO TABLE @DATA(lead_lag_diffs)

                  INDICATORS NULL STRUCTURE null_ind.

and, when executed, the program displays the result, as partly shown below.

FIRST_VALUE and LAST_VALUE

The FIRST_VALUE function returns the first value of a sorted set of values, the LAST_VALUE function returns the last value of a sorted set of values. OVER and ORDER BY are mandatory. PARTITION BY is optional. If a window is divided into partitions, the FIRST_VALUE/LAST_VALUE function returns a result for each partition (see example). If there's no PARTITON BY clause, the functions work on the entire window.

With the LAST_VALUE function, framing is an important aspect to consider. the LAST_VALUE function always returns the value from the current row. To find the last value for a partition or a window, the correct frame must be specified explicitly using ROWS BETWEEN addition.

Example

The program DEMO_SELECT_FIRST_LAST divides the rows from the DEMO_UPDATE table into three partitions, depending on their value in col1. Within the partitions, the rows are ordered by their value in col3.

SELECT

      id,

      col1,

      col2,

      col3,

      FIRST_VALUE( col2 ) OVER( PARTITION BY col1 ORDER BY col3 )

                  AS first_value,

      LAST_VALUE( col2 ) OVER( PARTITION BY col1 ORDER BY col3 )

                  AS last_value,

      LAST_VALUE( col2 ) OVER( PARTITION BY col1 ORDER BY col3

                  ROWS BETWEEN UNBOUNDED PRECEDING

                  AND UNBOUNDED FOLLOWING )

                  AS last_value_correct

      FROM demo_update

      INTO TABLE @DATA(result).



 cl_demo_output=>display( result ).

The column FIRST_VALUE returns the first value of col2 for each partition.The column LAST_VALUE does not return the last value. As described above, the default frame is from the first row to the current row. If col3 contains duplicate values, the rows are considered equal and the last value from the group of equals is returned. To get the last value of col2 of the partition, the frame size must be specified explicitly, as demonstrated in LAST_VALUE_CORRECT.

In this example, col3 has multiple duplicate values. The key field - here the field ID - is used to sort rows with the same value.

NTILE

This window function divides the rows of a window into n buckets. The goal is to fill all buckets with the same number of rows by following the rule specified after ORDER BY. If the number of rows of the window m cannot be distributed equally between the number of buckets n, the remainder r is distributed in such a way that the first (m MOD n) buckets each contain one element more. The buckets are numbered starting with the start value 1 and the result of the NTILE function is the number of the bucket a particular row belongs to. The syntax is

... NTILE( n ) OVER( [PARTITION BY sql_exp1]
ORDER BY col [ASCENDING|DESCENDING]) ...

n must be host variable, a host expression, or a literal of type b, s,i, or int8 which represents a positive integer. The OVER-clause including ORDER BY is mandatory.

If n is negative, for literals and host constants, a syntax error occurs. If n is a variable or an expression, instead of a syntax error, a database error and its respective exception CX_SY_OPEN_SQL_DB can occur. The result of the NTILE function is always of type INT8.

Since the maximum number of rows in a bucket can vary by 1, rows with the same value can also be in different buckets.

Example

Sorting of all employees listed in table DEMO_EMPLOYEES by their salary and distributes them into five salary groups.

SELECT name,

       salary,

       NTILE( 5 ) OVER( ORDER BY salary ) AS ntile

       FROM demo_employees

       INTO TABLE @DATA(result).

Group 1 has one entry more, as the number of employees (11) cannot be distributed into five groups of equal size.

You can check out a complete list of ABAP window functions in ABAP Keyword Documentation.