[Oracle] DB Optimizer Part II – Extending execution plans, or how to find the bad one
Introduction
In the first part of this series we investigated IN-lists and whether the Oracle optimizer estimated the correct number of rows (cardinality) or not. Unfortunately, real-world SQL statements are rarely that simple, so our basic algebra exercise (especially if we don’t know the data that well) will not cover most real-world performance issues. There are also other optimizations (like query rewrite, predicate pushing, eliminations, etc.) which make it even harder to calculate the real amount of processed data for each execution plan step.
As a consequence we need some advanced “tools” (or better said, more information in the basic tools) to verify and find where the optimizer is way off when calculating the cardinality. Just think about joining several tables, using indexes, views or especially BW-generated queries (sometimes these queries look like they come from outer space 😈 ).
This blog demonstrates one possibility to extend the execution plan with run-time information, so that we can identify the CBO miscalculation and provide a solution afterwards.
Extending the execution plan
Basically, there are three main possibilities:
1. SQL Trace
This is the oldest and best-known way. You can define the level of information (binds, waits, etc.) in detail by enabling a SQL trace on database level and analyze the trace files afterwards. Usually the event 10046 or the PL/SQL packages DBMS_SYSTEM, DBMS_SUPPORT or DBMS_MONITOR are used for enabling and disabling a SQL trace. The trace file can become very large (depending on the tracing level), and tracing all that information adds overhead.
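A minimal sketch of how such a trace can be switched on and off (the session ID and serial number are placeholders that you would look up in V$SESSION first; level 12 includes binds and waits):
SQL> alter session set events '10046 trace name context forever, level 12';
SQL> alter session set events '10046 trace name context off';
SQL> exec dbms_monitor.session_trace_enable(session_id => 123, serial_num => 4567, waits => TRUE, binds => TRUE);
SQL> exec dbms_monitor.session_trace_disable(session_id => 123, serial_num => 4567);
The resulting trace file can then be formatted with tkprof, for example.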
2. SQL execution statistics
If you have identified your slow SQL, you can rerun it with the init parameter statistics_level set to ALL or with the gather_plan_statistics hint. If you run the SQL with one of these settings you get additional run-time information like actual rows and logical and physical I/O for each execution step. Unfortunately this information is only available after the SQL has completed, so you are still in trouble with never-ending or aborted SQLs (very common in SAP BW systems).
3. Real-Time SQL Monitoring
This feature was introduced in Oracle 11g and closes the gap that you still have with the SQL execution statistics. You need the Diagnostics and Tuning Pack for this feature, which is already included if you purchase your Oracle license through SAP. With the help of this feature you can monitor, or better said get the execution statistics of, “long” running queries in real time. This feature is very useful in SAP BW environments, if temporary BW tables are included and we are not able to reconstruct the SQL manually afterwards. Using the web interface (Oracle Enterprise Manager Database Control) is a very comfortable way to use this feature.
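Just as a hedged sketch of the command line variant: by default a statement is monitored once it runs longer than a few seconds (or runs in parallel), but you can force monitoring with the MONITOR hint and pull the report of the last monitored statement with DBMS_SQLTUNE:
SQL> select /*+ monitor */ owner, count(*) from dba_objects group by owner;
SQL> select dbms_sqltune.report_sql_monitor(type => 'TEXT') from dual;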
This blog will focus on point 2, “SQL execution statistics”, because I mostly use this approach for troubleshooting SQL execution plans in a SAP environment (especially in OLTP systems). I usually trace the SQL bind variables with transaction ST05 and rebuild that particular SQL in SQL*Plus afterwards. This is the easiest way (for me) to identify and fix the problem in most cases.
The following example will focus on the cardinality estimation (= number of returned rows) only, because the execution statistics provide a lot more information, which would overload this blog. If you are interested in any other metrics of the “extended” execution statistics, please let me know and I can focus on that in one of my next blogs.
The example
Let’s start with the previous example (but with more rows) once again. The demos are run on an Oracle 11.2.0.3 database.
SQL> create table ZTEST as select trunc(dbms_random.value(1,6)) col
from dba_objects where rownum <=1000;
SQL> exec dbms_stats.gather_table_stats(NULL,'ZTEST');
SQL> var val1 number;
SQL> var val2 number;
SQL> var val3 number;
SQL> exec :val1 := 1;
SQL> exec :val2 := 1;
SQL> exec :val3 := 1;
SQL> alter session set "_optim_peek_user_binds"=FALSE;
SQL> select col from ZTEST where col in(:val1,:val2,:val3);
SQL> select * from table (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALL ALLSTATS OUTLINE PEEKED_BINDS LAST'));
In this simple example we know that the estimation of rows (E-Rows) is way off, because we only have roughly 200 rows for the column value 1. Please read the first blog of this series first, if you don’t know how we came to this conclusion.
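If you want to double-check the underlying data and the optimizer statistics yourself, two simple queries do the job (the exact counts vary slightly, because the data is generated randomly):
SQL> select col, count(*) from ZTEST group by col order by col;
SQL> select column_name, num_distinct, density from user_tab_col_statistics where table_name = 'ZTEST';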
But we want proof of our calculation in the execution plan too, so we now run this SQL with extended SQL execution statistics. I will use the gather_plan_statistics hint, but you can also set the init parameter statistics_level to ALL and run the SQL.
SQL> select /*+ gather_plan_statistics */ col from ZTEST where col in(:val1,:val2,:val3);
SQL> select * from table (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALL ALLSTATS OUTLINE PEEKED_BINDS LAST'));
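Just as a sketch of the session-wide alternative mentioned above (same output, no hint needed, but every statement of the session is affected):
SQL> alter session set statistics_level = ALL;
SQL> select col from ZTEST where col in(:val1,:val2,:val3);
SQL> select * from table (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALL ALLSTATS OUTLINE PEEKED_BINDS LAST'));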
Do you see all those new columns like A-Rows, A-Time, Buffers and Reads in the execution plan?
The amount of additional information depends on the kind of query (and executed work). But let’s focus on our simple example here …
A-Rows is the number of rows that is actually returned by this execution plan step. As you can see, the real amount of processed data is roughly 200 rows, as we calculated before. So the optimizer is way off here and we should investigate this part of the execution plan, if the performance problem is caused by that incorrect estimation (which can lead to a wrong join method or join order, for example).
A-Time is the time that is actually spent on the execution of this plan step.
Buffers is the amount of logical I/O that is actually performed by this execution plan step.
Reads is the amount of physical I/O that is actually performed by this execution plan step.
Side note: Depending on the query and execution plan there can be incorrect run-time values, which can be fixed with the hidden parameter “_rowsource_statistics_sampfreq” (frequency of rowsource statistic sampling; must be a power of 2).
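For example (a hidden parameter, so only for test purposes or after consulting Oracle support), setting the sampling frequency to 1 should sample every row, which gives the most accurate values at the cost of additional overhead:
SQL> alter session set "_rowsource_statistics_sampfreq" = 1;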
This is just a very tiny introduction to analyzing an execution plan and the CBO estimations. There is so much more to demonstrate, investigate and explore, but this should be enough for this blog right now. I appreciate your input for further blog topics.
Summary
Oracle provides a lot of possibilities to take a deeper look into the execution plan and its real-world execution by using DBMS_XPLAN alone. This approach is mostly enough to identify the root cause of slow-running SQLs. Finding the reason for wrong calculations (and wrong execution plans in consequence) is of course much more complex and difficult in most cases, but once you have found the reason for the slow SQL you are in a much better position.
If you have any further questions, please feel free to ask, or if you need assistance with solving performance problems, get in contact directly.
Hello Stefan,
how do these A-values differ from the statistics at the end of the output of a statement run from sqlplus with SET AUTOTRACE ON?
I would assume that reads and gets listed there would correspond to these A-values.
I see the benefit that this data is stored in a more comfortable way for later re-use than just having it as text in the sqlplus output, but for me the use of AUTOTRACE was sufficient up to now.
Volker
Hi Volker,
Well, in that extremely simple case the statistics from AUTOTRACE would be the same (no difference), but AUTOTRACE only shows the sum of all activities.
If you have a more complex execution plan with index access or joins (like nested loop or hash joins), you have no chance to drill down to each execution plan row source with AUTOTRACE. Just think about a nested loop join and the driving table, for example.
But hey .. that sounds like a pretty nice topic for the next blog 🙂
Regards
Stefan
Ah ok, I see.
Yes, that might indeed be valuable information.
I'm going to keep this in mind for my next analysis.
Thanks a lot
Volker