former_member192616

Explaining the explain plan for DB6 - Basics Part 2

This blog was written in cooperation with my colleague Jan Kritter from the DB6 performance team. This time we explain the difference between an explain plan based on parameter markers and one based on values. Furthermore, a brief outlook on the test execution will be given.

You may want to read these two overview blogs first:

General basics:

http://wiki.sdn.sap.com/wiki/display/ABAP/Explain+Plan

DB6 basics:

Explaining the explain plan for DB6 – Basics Part 1

The default - using parameter markers

In recent versions of DB6 (>= 8.2.2) the so-called data distribution statistics are created by default. In earlier versions the distribution statistics were only created if explicitly specified, as is still the case for histograms in recent ORACLE versions.

Whether data distribution statistics are in place for a table can be checked with this SQL statement (to be executed in ST04 - Diagnostics - SQL Command Line or from the DB2 command line):

>select * from syscat.coldist where tabname = 'DD02L'

If you get a result set, you have data distribution statistics.

With data distribution statistics the optimizer is aware of the data distribution.

Let's take table DD02L as an example. DD02L stores the catalog of all SAP tables. It has a field called AS4LOCAL which holds the activation status of each table. In my system the distribution is as follows (to produce this result I created a table in SE11 but did not activate it):

>select as4local, count(*) from dd02l group by as4local

AS4LOCAL       Count
-------- -----------
A             145202
N                  1

Only 1 entry has status 'N' (entry edited, but not activated); all others are 'A' (entry activated or generated in this form). With data distribution statistics the DB6 optimizer is aware of this data skew.

Now let's run this little ABAP program.

tables: dd02l.

select * from dd02l where as4local = 'N'.
endselect.

We trace and analyze the execution with ST05.

Figure 1 - Explain plan with markers

In the execution plan we can see that the estimated cost is 5967 (bullet 4 in figure 1) and the estimated number of rows is 72601 (bullet 5 in figure 1). Furthermore, a full table scan (TBSCAN) is used to run this query.

In the statement text you can see that the where condition is:

AS4LOCAL = ?

This is the parameter marker that the optimizer uses to come up with the execution plan (access plan). Although the data distribution is known, the optimizer expects 72601 rows in the result set, since it does NOT know WHICH of the two existing values (see bullet 1 in figure 1), A or N, will be used at execution time: the parameter marker ("?") is used to build the plan, as is typically the case for dynamic statements.

According to the statistics we have 145203 rows in table DD02L (bullet 2 in figure 1); the CARD (cardinality) is 145203. So the number of expected rows is card / distinct keys of the column, in this case 145203 / 2 = 72601.5 (shown as 72601 in the access plan).
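The estimate arithmetic above can be sketched in a few lines (a minimal illustration in Python; the numbers are taken from the statistics shown above):

```python
# How the optimizer estimates the result size for AS4LOCAL = ?
# With a parameter marker it cannot use the distribution statistics,
# so it assumes a uniform distribution: CARD / number of distinct values.
card = 145203          # CARD of table DD02L
colcard = 2            # distinct values of AS4LOCAL ('A' and 'N')

marker_estimate = card / colcard
print(marker_estimate)            # 72601.5, shown as 72601 in the plan

# With the literal 'N' the frequency statistics (as stored in
# SYSCAT.COLDIST) apply, and the estimate is the known frequency.
frequencies = {"A": 145202, "N": 1}
value_estimate = frequencies["N"]
print(value_estimate)             # 1
```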

The option - using values

Now let's click on the "Values" button (bullet 3 in figure 1) to see the estimated plan when we give the optimizer the additional information about the data distribution.

Figure 2 - Explain plan with values

As you can see, the cost goes down to 642 (bullet 8 in figure 2) and the estimated number of rows is 1 (9.9999E-01) (bullet 9 in figure 2). The button "Values" (bullet 3 in figure 1) changes to "Marker" (bullet 6 in figure 2). The WHERE condition is now

AS4LOCAL = ‘N'.

Now the optimizer has all the information needed to come up with a better plan.

Instead of a full table scan (TBSCAN) we get an index scan (IXSCAN with 0 columns). Although we don't have an index that starts with column AS4LOCAL, the optimizer decides to use index DD02L~1, which has AS4LOCAL in the second position. It does so because the index has fewer pages (NLEAF = 88, see bullet 7 in figure 2) than the table (NPAGES = 1540; see bullet A in figure 1). This means less I/O with the IXSCAN.
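The page counts behind this decision can be compared directly (a minimal sketch; the NLEAF and NPAGES values are the ones from the figures):

```python
# Why DD02L~1 wins even though AS4LOCAL is only its second column:
# scanning the whole index touches NLEAF pages, scanning the whole
# table touches NPAGES pages. Fewer pages means less I/O.
npages_table = 1540  # NPAGES of table DD02L (bullet A in figure 1)
nleaf_index = 88     # NLEAF of index DD02L~1 (bullet 7 in figure 2)

print(nleaf_index < npages_table)  # True: the IXSCAN reads fewer pages
print(npages_table / nleaf_index)  # 17.5: the table scan reads ~17x more pages
```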

Test execution on the database

Now let's compare the effects. In order to let the database know the values we have to use the "SUBSTITUTE LITERALS" hint.

In order to compare them we change our program to:

tables: dd02l.

select * from dd02l where as4local = 'N'.
endselect.

select * from dd02l where as4local = 'N'
  %_hints db6 '&SUBSTITUTE LITERALS&'.
endselect.

In specific circumstances the "REOPT ONCE" hint would be an alternative. This option and others will be discussed in a separate blog. For now, please note that "REOPT ONCE" should only be used in exceptional cases.

As always with hints, only use them if there is no other way of tuning the SQL statement, and verify carefully that they do not have undesired side effects.

We execute the program once before we trace it (this is done to fill all the caches). The second execution is then traced with an SQL trace:

Figure 3 - Test Execution

The first execution takes 47 milliseconds (using a TBSCAN) when the data is cached.

The second execution takes 8 milliseconds (using an IXSCAN) when the data is cached.
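As a quick sanity check, the speedup can be computed from the traced runtimes (the millisecond values are the ones from figure 3):

```python
# Runtimes of the two traced executions, both with warm caches (figure 3).
tbscan_ms = 47  # first execution: full table scan, plan built from markers
ixscan_ms = 8   # second execution: index scan via the SUBSTITUTE LITERALS hint

speedup = tbscan_ms / ixscan_ms
print(f"IXSCAN is {speedup:.1f}x faster than TBSCAN here")  # 5.9x
```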

With the test execution option (see bullet B in figure 3), which is accessible from the explain plan screen (figures 1 and 2), we can get more details from the database. Note that after clicking on the test execution button you have to click Execute once more in the top left corner to see the details of the test execution.

In this blog we don't want to go into the details of the test execution; this will be explained in more detail in another blog. However, we can already see that in fact less I/O is done. (In the above screenshot this is logical I/O: since we execute the statement with all the data already in the buffer pool, no physical I/O occurs. With a perfectly cold buffer pool, where none of the data we need is present, the argument would carry even more weight, as physical I/O is much slower than logical I/O.) Just compare these figures: total user CPU time, rows read, and the buffer pool data logical reads of the first execution vs. the buffer pool data logical reads plus buffer pool index logical reads of the second.

Summary

With the "Value" button in DB6 you can quickly check if a better execution plan is used if the optimizer had more information as values instead of parameter markers often reveal that the optimizer decides on a suboptimal access plan because of data skew. The results that are achieved are sometimes much better if we provide this additional information to the optimizer.

Jan Kritter is a Technical Support Consultant for SAP AGS (SAP Active Global Support).