Skip to Content

Introduction

In the second part part of this series Volker mentioned, that the approach of extending execution plans provides the same statistics as “SET AUTOTRACE ON”. He was right with my previous simple example of course, but unfortunately SQLs are not that easy in the real world. In this blog we will investigate a simple table join and how the approach of extending execution plans can help us to understand the amount of work for each execution plan step.

The test data and environment

The demos are run on an Oracle 11.2.0.3 database with specific environment settings. The special environment settings are needed to get a simple execution plan with a NESTED LOOP join. Usually Oracle chooses a different execution plan for this particular SQL, but we want to focus on the methodology here. I also use literals only to simplify the example.

The tables

Bildschirmfoto 2012-08-09 um 10.28.15.png Bildschirmfoto 2012-08-09 um 10.28.33.png

There is an index on the column SLOC for table PRLOC and an index on the columns ORDERNR and PRDNR for table PREORDER. The column PRDNR is unique and for each value of column PRDNR in table PRLOC is a corresponding value in table PREORDER.

The optimizer and database settings

SQL> alter session set "_optimizer_sortmerge_join_enabled"=FALSE;
SQL> alter session set "_hash_join_enabled"=FALSE;
SQL> opt_param('_nlj_batching_enabled', 0)

The SQL

SQL> select * from PRLOC, PREORDER
           where PRLOC.PRDNR = PREORDER.PRDNR and PRLOC.SLOC = 30
           and PREORDER.ORDERNR between 1 and 1000;

Let’s understand the SQL first, before we start with the execution plan itself. Basically we want to know, which products are stored in location 30 for the order numbers between 1 and 1000. So our filters are the location on table PRLOC and the order number on table PREORDER.

Analysis

The “AUTOTRACE ON” approach

SQL> set autotrace on
SQL> select * from PRLOC, PREORDER
            where PRLOC.PRDNR = PREORDER.PRDNR and PRLOC.SLOC = 30
            and PREORDER.ORDERNR between 1 and 1000;

Bildschirmfoto 2012-08-09 um 10.06.45.png

After the SQL is executed successfully we get the overall statistics like 22.091 logical I/Os, 33 physical I/Os and 10 rows processed. But we don’t know the initiator. Which step has caused the 22.091 logical I/Os? Was it caused by the INDEX RANGE SCAN or the FULL TABLE SCAN? What do we need to focus on to get the SQL faster or doing less work? We get no answers to these questions with “AUTOTRACE ON”.

Extending the execution plan

SQL> select /*+ gather_plan_statistics */ * from PRLOC, PREORDER
            where PRLOC.PRDNR = PREORDER.PRDNR and PRLOC.SLOC = 30
            and PREORDER.ORDERNR between 1 and 1000;
SQL> select * from table (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALL ALLSTATS OUTLINE PEEKED_BINDS LAST'));

Bildschirmfoto 2012-08-09 um 10.08.29.png

We can drill down the logical or physical I/Os to each step now. The amount of logical and physical I/Os differs from the AUTOTRACE output, but let’s ignore this right now and focus on the values itself. The optimizer does a pretty good job by calculating the amount of rows for table PREORDER (or its index right here), but the calculation for table PRLOC is way off (1 * 1000 to 10). This execution plan step is responsible for 22001 logical I/Os (which is nearly 100% of the work) as you can see. We need to reduce the amount of work for this particular step, if we want to tune this SQL.

We don’t know anything about the data in the tables until now, so let’s take a closer look into table PRLOC.Bildschirmfoto 2012-08-09 um 10.10.42.png

Hmm .. pretty interesting. The amount of data for the location is totally undistributed (by the way this is a pretty common case by using several data warehouse places in production for example). So let’s check what the optimizer calculates, if we query the table PRLOC with our filter predicate “SLOC = 30” only.

SQL> select /*+ gather_plan_statistics */ count(*) from PRLOC where SLOC = 30;
SQL> select * from table (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALL ALLSTATS OUTLINE PEEKED_BINDS LAST'));

Bildschirmfoto 2012-08-09 um 10.12.12.png

Hmm .. what’s this? There are only 10 rows returned by our query, but the optimizer has calculated 3.333 rows. Could this be the reason for the “bad” execution plan by joining both tables? Maybe, but what is the reason for the wrong estimation?

The optimizer assumes that the data is distributed equally by gathering “basic” table statistics (default in a SAP environment). That means in our example:

  • We have 3 values (10, 20, 30) in column SLOC by 10.000 rows in sum.
  • If we apply a filter with SLOC = 30, the optimizer assumes that only 1/3 or 33 % of the data will be returned
  • Calcuation 10000 * 0.3333 = 3333 rows

That is exactly the same number as estimated by the optimizer, but what can we do in this situation? The optimizer is only as clever as the provided data, so we need to provide more information called “Histograms” in our case here. If we provide the optimizer more information about the data distribution with the help of histograms, it is able to calculate better results. So let’s gather statistics with histograms on table PRLOC.

SQL> exec dbms_stats.gather_table_stats(NULL,'PRLOC');

Check the calculation afterwards:

SQL> select /*+ gather_plan_statistics */ count(*) from PRLOC where SLOC = 30;
SQL> select * from table (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALL ALLSTATS OUTLINE PEEKED_BINDS LAST'));

Bildschirmfoto 2012-08-09 um 10.13.30.png

We got it. The optimizer calculates the right amount of data now and chooses a different execution plan for our extracted query. Could this be the solution for our NESTED LOOP JOIN too? Let’s check ….

SQL> select /*+ gather_plan_statistics */ * from PRLOC, PREORDER
            where PRLOC.PRDNR = PREORDER.PRDNR and PRLOC.SLOC = 30
            and PREORDER.ORDERNR between 1 and 1000;
SQL> select * from table (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALL ALLSTATS OUTLINE PEEKED_BINDS LAST'));

Bildschirmfoto 2012-08-09 um 10.14.48.png

Wow .. as you can see the FULL TABLE SCAN on table PRLOC has gone and the join order has changed. Based on these changes the amount of logical I/Os dropped from 22.006 to 34 and the physical I/Os from 25 to 7. This means round about 172 MB less work (mostly for cached data of course, but this can use up your CPU) for each SQL execution without adding an additional index or reorganizing tables or indexes.

Summary

The approach of extending the execution plan by runtime statistics allows you to identify “troublemaker” in more complex queries. Troubleshooting of SQL execution plans needs know-how of the optimizer, its limitations and runtime behavior of course, but you need to understand the SQL, its data and the runtime work first. I have not explained why the optimizer chooses a different join order after the correct estimation of returned rows on table PRLOC, because of this blog post should focus on how-to identify the workload by extending execution plans and not on optimizer calculations / internals. Basically the optimizer puts the table PRLOC first, because of it is cheaper to query PREORDER only 10 times than query PRLOC 1000 times with pre-filtered values.

Slogan “Know your data and understand the problem first.”

If you have any further questions – please feel free to ask or if you need assistance by solving performance problems get in contact directly.

References

To report this post you need to login first.

Be the first to leave a comment

You must be Logged on to comment or reply to a post.

Leave a Reply