SELECT..INTO TABLE faster than SELECT/ENDSELECT. The real reason.
Let me begin by asking a simple question. Say I have a database table – dbtab that contains 1000 rows of data. Say I run a SELECT/ENDSELECT query on dbtab. Now, does this statement hit the database a 1000 times?
Most beginners (and sometimes even ABAP “veterans”) would emphatically reply – YES. The purpose of this blog is to examine if this is really true.
Consider the following code snippet:
DATA: WA_T005 TYPE T005,
IT_T005 TYPE STANDARD TABLE OF T005.
APPEND WA_T005 TO IT_T005.
Now, go to ST05 (Performance Trace) transaction and activate the trace. Execute the above code snippet and then view the SQL trace of ST05. The following is a screenshot of “Detailed Trace List” view of the SQL Trace.
Fig 1: Detailed Trace List.
Before going any further, let me explain briefly about ST05’s SQL trace. By means of SQL trace, one can analyze all database statements that were sent from the Database to the DBI (Database Interface). In other words, every time the system hits the database and sends data to the DBI (which is a component of the work process. Therefore DBI is a part of the Application Server), it is logged in the SQL trace of ST05. So SQL trace is a crude way of knowing roughly – “How many times has the system gone from the DBI and hit the database?”.
So if the SELECT/ENDSELECT would have hit the database a 1000 times, we should have seen a 1000 entries with the FETCH operation in the above trace. Instead we see only 1 FETCH in the trace and all the 236 records present in the T005 table are pulled, in a single step in the FETCH operation!!
What actually happens under the hood is – the DBI automatically implements an optimization. The data is NOT sent from the database to the DBI, one row at a time. The data is sent from the database to the Application Server (specifically to the DBI) in “Packages”. The data in the packages is buffered in the DBI and then sent to the ABAP program. In case of a SELECT/ENDSELECT, the data is sent from the DBI to the ABAP program one row at a time. Why one row at a time? Because the ABAP program can store data in only wa_t005 (which is a work area; it can hold only a single line at a time). So it is important to realize that there are 2 steps involved – transfer of data from the database to the DBI and the transfer of data from the DBI to the appropriate variable in the ABAP program. The transfer from database to DBI ALWAYS happens in packages (unless a SELECT..UP TO 1 ROWS or SELECT SINGLE is used).
Consider Case 2: If I had used the following code:
INTO TABLE IT_T005.
In this case also, the data is sent from the database in “packages” and buffered in the DBI. And the data is sent from the DBI to the ABAP program only after ALL the packages have been received. At this point, I have to back up for a moment and explain what exactly a package is.
A package is a “packet of data”, that is, a set of rows. The package size depends on the respective database platform and on whether it is a Unicode system or not. Usually, package sizes are between 8KB and 128KB and are a multiple of 8KB. So suppose, I have a database table with 100,000 records and its line width is 64 bytes. And my SELECT query is to fetch 2500 records from it. That means the data to be fetched will occupy a space of 160,000 Bytes or 160KB (2500 multiplied by 64). Assume the package size on my database platform is 32KB. So the entire data of 160KB will be transferred in 5 packets (160/32) from the database to the application server. So if the total size of records to be fetched is more than the size of a single package, the data is transferred in the form of multiple discreet packages. Now would be a good time to look back at Fig 1 where you would see columns named – “Array” and “Recs”. “Array” column represents the maximum number of records/rows that a single package may contain. “Recs” represents of records/rows transferred in that particular FETCH operation. So from Fig 1, it can be seen that the each package can accommodate 32767 rows. The total number of records in T005 is 236 and since my SELECT query has no WHERE condition, 236 records are to be fetched. The package can take in more than 236. Therefore all the data to be fetched in this query is fetched in a single package. So there is just 1 hit to the database.
Now coming back to case 2, the DBI waits until it receives all the data to be fetched in that query. Say there are 50,000 countries present in T005 (let’s hope the world doesn’t become so fragmented!), the DBI would wait for 2 packages to arrive from the database. And then, it would send all this data to the ABAP program (i.e. to the internal table IT_T005). In this case, the data is NOT sent 1 row at a time, from the DBI to the ABAP Program and this is because the query now has a variable that can hold multiple rows (i.e. the query has the internal table IT_T005). So in Case 2, optimization is implemented on 2 levels – at the database level, which transfers data, not in individual rows but in packets and at the ABAP program level, where data is transferred from the DBI to the program, not in single rows but in bulk.
In fact, this is the actual reason why SELECT..INTO TABLE.. is faster than SELECT/ENDSELECT. Very often I hear the answer – “SELECT/ENDSELECT is slower because the system hits the database many times. On the other hand, in the case of SELECT..INTO TABLE.. , the system hits the database only once”. That is not correct. In both statements, the no. of database hits would be the same (for Case 2, you may look into the SQL trace. The no. of DB hits would be 1 ). The correct answer is – in case of SELECT..INTO TABLE, there is an optimization at 2 levels. On the other hand, in SELECT/ENDSELECT the optimization happens at only one level.
INTO TABLE IT_T005_TEMP
PACKAGE SIZE 10 .
In this case, the difference would be – the DBI will not wait until it receives all the packages from the database. As soon as it gets a package from the DB, it buffers it and immediately sends it to the internal table variable IT_T005_TEMP in the ABAP program. Once it gets the next package from the database, it sends the next set of data to the internal table IT_T005_TEMP. This will over write the data previously present in IT_T005_TEMP.
- An analogy may be drawn to Max Planck’s Quantum Theory (which states that Electromagnetic Radiation can be emitted or absorbed in discrete packets/bundles of energy called “quanta”). Similarly, data is fetched from the database to the Application Server in discreet packets/bundles called “Package” (unless we use SELECT..UP TO 1 ROWS or SELECT SINGLE).
- When a SELECT/ENDSELECT query is run on a table with 1000 records, it does NOT mean that the statement hits the database 1000 times. The number of database hits depends on the package size, the line width of the table and the number of rows to be fetched.
- The reason why SELECT..INTO TABLE is faster than SELECT/ENDSELECT is because, there is an optimization on 2 levels (ABAP program interface and in database) rather than in just 1 level (database level).
 Gahm, Hermann. ABAP Performance Tuning. SAP-Press. 1st Edition. 2009.
NOTE: This blog may not be expressing a very important point as this is highly theoretical stuff. Even without knowing this stuff, most ABAPers will know that SELECT..INTO TABLE is faster than SELECT/ENDSELECT. It is just that the reasoning given for the better performance of SELECT..INTO TABLE is not technically accurate. It would be nice to know the technically correct reason and that is the reason I felt this post has to be written. In fact, I started this post with a question and that question was asked to me in an interview. No surprises – I gave the wrong answer. The interviewer then explained the correct answer; I couldn’t digest all that he said then and there. I had ponder on it for a while and then had to refer to the book by Hermann Gahm to clearly understand and organize my thoughts.
Hello folks. I recently came across a question in the ABAP Associate Level Certification Exam that is relevant to this blog post. This exam can be found at http://training.sap.com/v2/uploads/C_TAW12_70_sample_items.pdf . But for your convenience, I am pasting the relevant question here:
There are basically three types of SELECT statements: SELECT SINGLE, SELECT … ENDSELECT,
and SELECT … INTO TABLE. The SELECT …ENDSELECT statement is also known as the SELECT loop.
Below, you will find a list of comments about SELECT … ENDSELECT. Which of them are correct?
There are two correct answers to this question. Select which of the following answers are correct.
a) A work area (structure) is required for the selected rows. This work area can
be given explicitly (with INTO) or implicitly (if a TABLES declaration is used).
b) The database will transport individual rows to the database interface.
c) The database will transport blocks of rows to the database interface.
d) The system field sy-tabix counts the number of selected table rows.
Option (a) is one of the correct answers. And if you have read this far, you should recognize that option (b) is a trap! So please do not fall for this trap. Option (c) is the other correct answer.