A dumb mistake ... again and again ...

lbreddemann · ‎10-22-2007

Not to long ago I was analyzing a performance issue on a customers system.
In the course of this I needed to see some aggregated data, so I opened a transaction where I can put in custom sql and ran it without removing the number of lines to be returned limitation.

The result was astonishing to me in the first place.

The statement looked something similar to this:

SELECT col1, count(*) as CNT
FROM t1
WHERE rownum <2
GROUP BY col1
ORDER BY count(*);

Obviously what I wanted was the col1-Value that appeared at least often in the table.
What I got as a result was this:

COL1   CNT
------- -----
val1     1

Since I knew that this specific value appeared several hundred times in the table this result was wrong. Or better it was the right answer to the wrong question.

The keypoint here is, when the database performs the restriction for the ROWNUM <1 predicate. And that is when it gathers the rows that would fit my other conditions. In my example there where no other conditions - so the database took rows as long as ROWNUM <1 was fullfilled. After it got one row this was the case so that one row was further processed with the GROUP BY and then with the ORDER BY clause.
Basically what I told the database was:
Perform that my query but touch only one row at all. And that the database did.

The Oracle explain plan clearly shows this, too:

create table t1 as select mod(rownum, 3) as col1, 'x' as col2 from dba_objects;
select col1, count(*) from t1 group by col1 order by count(*);

      COL1   COUNT(*)
---------- ----------
         0      15934
         1      15935
         2      15935

Now the query with rownum in place:

      COL1   COUNT(*)
---------- ----------
         1          1

Execution Plan
----------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (ORDER BY)
   2    1     SORT (GROUP BY)
   3    2       COUNT (STOPKEY)
   4    3         TABLE ACCESS (FULL) OF 'T1'

As it's easy to see, the COUNT (STOPKEY) operation is the very first after accessing the table. So before anything else is done the database tries to obey the "touch just one row"-limitation.

In order to get the row 0 (that is the least often occuring value of col1) with the correct count(*) value we've to reformulate the statement like this:

select * from (
            select col1, count(*)
            from t1 group by col1
            order by count(*)
            ) where rownum <2;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   COUNT (STOPKEY)
   2    1     VIEW
   3    2       SORT (ORDER BY STOPKEY)
   4    3         SORT (GROUP BY)
   5    4           TABLE ACCESS (FULL) OF 'T1'

The execution plan now shows that the row-number restriction is applied after all other predicates have been evaluated and thus giving us back just the first row that we wanted:

      COL1   COUNT(*)
---------- ----------
         0      15934

As this mistake happened to me (although I do SQL programming for years now) by accident, it might (and does) happen to others as well. So better double check if your SQL makes sense to you and the database the same way - otherwise the results will be dubious.

By the way: the very same behaviour can be found in MaxDB databases - actually my mistake happened to me on a MaxDB instance. I just took the Oracle explain plan here because it shows that nice "(STOPKEY)" entry to mark up the operation.

Best regards,

Lars

A dumb mistake ... again and again ...

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win