[Oracle] SQL Execution Engine Part I - Join data b...

stefan_koehler · ‎01-26-2013

Introduction

We already took a (tiny) closer look at the cost based optimizer in my first series called "DB Optimizer". So i decided to start another tiny series about the SQL execution engine, which executes the calculated and generated execution plan. In the first part of this series i would like to cover a common ABAP coding "mistake", that i notice on regular basis and how it effects the execution / processing by using hash join(s) for two or more tables.

The Hash Join

Let's check what hash joins are about before we start to take a look at the execution of hash joins. The following explanation is extracted from the official oracle documentation:

Hash Join

A join in which the database uses the smaller of two tables or data sources to build a hash table in memory. In hash joins, the database scans the larger table, probing the hash table for the addresses of the matching rows in the smaller table.

The Query Optimizer - Hash Joins

The database uses hash joins to join large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.This method is best when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:

- A large amount of data must be joined.

- A large fraction of a small table must be joined.

Let's omit the topic "Why does the optimizer choose a Hash Join and not a Nested Loop (for example)" as we don't want to talk about the query optimizer this time. The documentation states an important point about how a hash join works, but misses some as well:

The smaller of the two tables (= data sources) is used to build a hash table in memory (PGA)
The hash table contains all rows (that means all select columns as well) from the "smaller table" (which is calculated by object statistics and filter expressions) organized by a hash value of the join column(s)
The larger table is probed against the hash table based on the hash value of the join column(s)

Basically said (without going too much in detail about the PGA and its work areas - check the references if you are interested into the internals about that) the amount of "used memory" for a hash join is mainly driven by the amount of data (join column + additional selected columns) that is extracted from the smaller table (there is some overhead for the second table as well, but let's disregard this at this point). We are talking about an optimal execution, if all of the data can be handled in memory and we don't need to swap it to the temporary tablespace. It is called "one pass execution", if we need to dump our hashed and probed data only once to the temporary tablespace. The last and worst case scenario is called "multi pass execution" in which we need to dump an re-read the data many times to/from the temporary tablespace.

The common ABAP coding mistake

ABAP development usually starts with tiny data examples and i see a lot of things like "SELECT * FROM <TAB1> <TAB2> ...." in custom coding even if not all columns are needed in further processing. The data is loaded into an internal table from the database and processed afterwards. In most ABAP cases this is not an issue, because of the joined data set is pretty small, but the amount of work that needs to be performed on the database level can be huge. Maybe this kind of coding is based on comfort or the data processing is not fully known at this point (something like "maybe i need this data later on"). However if the oracle database has already chosen a hash join for that small data examples it can be pretty fast, because of all the data fits into PGA memory or maybe a completely different execution plan is chosen for that small data set. But what happens if the the amount of data is getting larger or the query is executed by many users in parallel and/or the (PGA) memory is not sufficient anymore?

The (query) performance can drop immediately and most of you know the statement: "It is so slow, but we did not change anything" :wink:

So let's take a look at the impact of such "insufficient" written queries in case of hash join(s).

The demo

The following demos were run on an Oracle 11.2.0.3 database on OEL 6.2.

I will create two sample tables called TAB1 and TAB2, which are a copy from dba_objects and contain 74.553 rows each. These tables will be used for the hash join demo, but there will be no "smaller" data set in this example, because of no filter condition is applied. So Oracle will choose the "smaller hash table" based on the amount of selected columns.

I will run various SELECTs with the same amount of returned rows by each test query, but with different amount of columns. Please check my previous blogs for more details about the gather_plan_statistics hint, if you are not familiar with it. The execution plan statistics got a few more columns now, which i have not mentioned in my previous blog posts. So let's start with them first.

0Mem

Estimated memory (PGA) size for optimal execution (in memory)

1Mem

Estimated memory (PGA) size for one-pass execution (dump and read data once to/from temporary tablespace)

Used-Mem

Actual memory used by work area during last execution

Creating two test data sources



SQL> create table TAB1 as select * from dba_objects;

SQL> create table TAB2 as select * from dba_objects;

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(NULL,'TAB1');

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(NULL,'TAB2');

"SELECT *" example (SQL ID 3dqxu675n4p2k)



SQL> select /*+ gather_plan_statistics HASH-TEST */ *

           from TAB1 T_00 join TAB2 T_01

           on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

Table TAB1 is used as data source for the memory hash table (in this case it makes no difference which table is the source, because of we select all columns from both tables and join on column OBJECT_NAME).
The memory hash table is organized by the join column OBJECT_NAME and contains all columns of table TAB1.
Rows from table TAB2 are incrementally loaded into memory and probed on the hash value of column OBJECT_NAME (after the complete hash table of TAB1 is built up).
13 MB memory of PGA memory is used for that operation.

"SELECT TAB1.<COL>, TAB1.<COL>, TAB1.<COL>, TAB2.<COL>" example (SQL ID 61y4ny5c9w034)



SQL> select /*+ gather_plan_statistics HASH-TEST-COL */ T_00.OWNER, T_00.OBJECT_NAME, T_00.CREATED, T_01.STATUS

           from TAB1 T_00 join TAB2 T_01

           on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

Table TAB2 is used as data source for the memory hash table. This approach is reasonable, because of it needs much less memory for only two columns (join column OBJECT_NAME and STATUS) instead of four columns (join column OBJECT_NAME and OWNER, OBJECT_NAME, CREATED).
The memory hash table is organized by the join column OBJECT_NAME and contains column STATUS of table TAB2.
Rows from table TAB1 are incrementally loaded into memory and probed on the hash value of column OBJECT_NAME (after the complete hash table of TAB2 is built up).
7.6 MB memory of PGA memory is used for that operation.

"SELECT TAB1.<COL>, TAB2.*" example (SQL ID 8tdbtj560y5j0)



SQL> select /*+ gather_plan_statistics HASH-TEST-COL-TAB1 */ T_00.OWNER, T_00.OBJECT_NAME, T_00.CREATED, T_01.*

           from TAB1 T_00 join TAB2 T_01

           on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

Table TAB1 is used as data source for the memory hash table. This approach is reasonable, because of it needs much less memory for only three columns (join column OBJECT_NAME and OWNER and CREATED) instead of all columns of table TAB2.
The memory hash table is organized by the join column OBJECT_NAME and contains the columns OWNER and CREATED of table TAB1.
Rows from table TAB2 are incrementally loaded into memory and probed on the hash value of column OBJECT_NAME (after the complete hash table of TAB1 is built up).
7.6 MB memory of PGA memory is used for that operation.

"SELECT TAB1.<COL>, TAB2.<COL>" example (SQL ID 2xpkj8w6h5djd)



SQL> select /*+ gather_plan_statistics HASH-TEST-2COL-TAB1 */ T_00.OWNER, T_01.OBJECT_NAME

           from TAB1 T_00 join TAB2 T_01

           on T_00.OBJECT_NAME = T_01.OBJECT_NAME;

Table TAB2 is used as data source for the memory hash table. This approach is reasonable, because of it needs much less memory for only one column (join column OBJECT_NAME) instead of two columns (join column OBJECT_NAME and OWNER).
The memory hash table is organized by the join column OBJECT_NAME and contains no additional columns.
Rows from table TAB1 are incrementally loaded into memory and probed on the hash value of column OBJECT_NAME (after the complete hash table of TAB2 is built up).
6.6 MB memory of PGA memory is used for that operation.



SQL> select SQL_ID, CHILD_NUMBER, round(LAST_MEMORY_USED/1024/1024,2) LAST_MEMORY_USED_MB, round(LAST_TEMPSEG_SIZE/1024/1024,2) LAST_TEMPSEG_SIZE_MB,

                       round(MAX_TEMPSEG_SIZE/1024/1024,2) MAX_TEMPSEG_SIZE_MB,  MULTIPASSES_EXECUTIONS, ONEPASS_EXECUTIONS, OPTIMAL_EXECUTIONS, TOTAL_EXECUTIONS, OPERATION_TYPE, OPERATION_ID

           from V$SQL_WORKAREA

           order by LAST_MEMORY_USED desc, TOTAL_EXECUTIONS desc;

This query output is just another way to check the work area usage for different SQL statements (based on the cached SQLs in the shared pool).

Summary

As you can see the memory consumption for hash joins is mainly driven by the amount data that needs to be stored in the hash table. In our example the worst case for all columns was round about 13 MB and the best case (for the join column only) was round about 6.6. MB of PGA memory.

Now you maybe think .. man that's not much memory - why should we consider this at all. But you have to see this behavior in another context:

This was just a demo with a small table of 15 columns and 74.553 rows. Not all columns contain data like the tables in the SAP schema (with default value and not nullable).
I was the only user running this query on this tiny amount of data - just scale this up for a large (data) environment with multiple sessions running this query.
CPU resource is needed for creating and processing the "hash table"
You assign "a lot of" memory to the PGA (usually 20% for OLTP and 40% for OLAP systems), but a single work area is not able to use all of this memory (for the details check the PGA Memory Management Internals Document)

I hope you get the key points here and why it is important to select only the needed data from the database in case of a hash join.

"SELECT only the needed data and use the database engine as effective as possible."

If you have any further questions - please feel free to ask or get in contact directly, if you need assistance by troubleshooting Oracle database (performance) issues.