Non-deterministic behavior of Window function
If you have reached to this blog post, it may be because you have questioned why query sometimes prints different query results time to time when Window function is used in the query execution plan. Today, I am here to help you out to understand why is that so. Before we dive info further, let’s briefly check on the definition of Window function and its characteristics.
OVER ( [ <window_partition_by_clause> ] [ <window_order_by_clause> ] [<window_frame_clause>])
** More details regarding its uses
If you need more details about window function types and how to use them, please refer the link below.
After reading the definition of Window function, now you know that window function has a non-deterministic function and it deviates from a regular aggregation function, but maybe the definition is not enough to give you an intuitive impression on how exactly Window function behaves underneath. Let me explain some details with an example.
create column table myTable (CLASS CHAR(10), AGE INT, GENDER CHAR(10), MEMBER_ID INT);
INSERT INTO myTable VALUES(‘A’, 10, ‘Female’, 12114);
INSERT INTO myTable VALUES(‘A’, 34, ‘Female’, 35583);
INSERT INTO myTable VALUES(‘A’, 51, ‘Male’,55864);
INSERT INTO myTable VALUES(‘A’, 51, ‘Male’,54312);
INSERT INTO myTable VALUES(‘A’, 51, ‘Male’,58512);
INSERT INTO myTable VALUES(‘A’, 14, ‘Female’, 48766);
INSERT INTO myTable VALUES(‘B’, 12, ‘Male’, 32587);
INSERT INTO myTable VALUES(‘B’, 12,’Female’, 96861);
INSERT INTO myTable VALUES(‘B’, 27, ‘Male’,11541);
As you can see from the link above, there are several types of Window functions. Among them, I choose to use ROW_NUMBER() as an example.
Once we have decided which one to use, then let’s create a select statement using Window function syntax. We partitioned the table into firstly with CLASS and secondly with GENDER. In query, it can be expressed as
SELECT CLASS, GENDER, AGE, MEMBER_ID,
ROW_NUMBER() OVER (PARTITION BY CLASS, GENDER ORDER BY AGE) ROW_NUM
“ROW_NUMBER OVER (PARITION BY CLASS, GENDER)”
FIGURE 1. FIRST PARTITIONING BY CLASS (Notice the red boxes)
FIGURE 2. SECOND PARTITIONING BY GENDER (Notice the blue boxes)
And then, order them by age
“ROW_NUMBER OVER (PARTITION BY CLASS, GENDER ORDER BY AGE)”
FIGURE 3. ORDER BY AGE (Marked with black arrows)
If you look at the second blue rectangle from the top, these three members (55864, 54312, and 58512) have the same class and gender values. In this case, even if you sort them by their age, they could not be distinguished from one another because they have been partitioned only by CLASS and AGE, but not by any unique value.
If you face this situation with either small amount of data or no materialization (materialization happens switching among different engines), the query result could manage to stay the same no matter how often you execute the query. Yet, if your situation doesn’t belong to those cases mentioned above, please be informed that you may experience non-deterministic query result.
- Order each partition with unique value
- If your business need allows, try to partition the table where it doesn’t have common query result
Before you design your business scenario using Window functions, please be aware of its characteristics and limitations