ABAP Performance Test: Hashed Tables vs Database Buffers
There are two important techniques for avoiding repetitive access to database records: hashed internal tables and database buffers.
If we decide to use the former method, we need to get the database record into a hashed internal table. Whenever we need to access the record, we read the internal table instead of the database table.
If we decide to use the latter method, all we need to do is to activate the table buffer in SE11. If we do that, the accessed data is cached on the application server, and the cache is used whenever possible. Thus, database access is reduced. There are exceptional cases where the buffer is bypassed, but that’s beyond the scope of this article.
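As a quick illustration before the full test program below, a minimal sketch of the two access patterns could look like this (simplified; it assumes the fully buffered table ZDUMMY01 with key field MATNR that is used in the test):

" Technique 1: load the records once into a hashed internal table,
" then read them from memory instead of the database.
DATA: lt_dummy TYPE HASHED TABLE OF zdummy01 WITH UNIQUE KEY matnr,
      ls_dummy TYPE zdummy01.

SELECT * FROM zdummy01 INTO TABLE lt_dummy.
READ TABLE lt_dummy INTO ls_dummy WITH TABLE KEY matnr = 'M1'.

" Technique 2: activate the table buffer in SE11 and keep the SELECT as is;
" the database interface serves it from the application server cache.
SELECT SINGLE * FROM zdummy01 INTO ls_dummy WHERE matnr = 'M1'.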
Wondering how they compare in terms of performance, I created a fully buffered database table with example data.
I also wrote a program that accesses the data using both techniques and compares the runtimes.
It turns out that hashed access is 3-4 times faster than database buffering.
You can use the attached source code to repeat the test yourself; I would like to hear about any contradicting results.
REPORT zdummy01.

CLASS main DEFINITION.
  PUBLIC SECTION.
    CLASS-METHODS execute.
  PROTECTED SECTION.
  PRIVATE SECTION.
    CONSTANTS:
      c_do_times TYPE i VALUE 100,
      c_matnr    TYPE matnr VALUE 'M1'.
    CLASS-DATA:
      go_text  TYPE REF TO cl_demo_text,
      gv_begin TYPE i.
    CLASS-METHODS:
      hash,
      select,
      start_chrono,
      stop_chrono.
ENDCLASS.                    "main DEFINITION

CLASS main IMPLEMENTATION.
  METHOD execute.
    go_text = cl_demo_text=>get_handle( ).

    " Time the hashed internal table approach
    start_chrono( ).
    hash( ).
    stop_chrono( ).

    " Time the buffered SELECT SINGLE approach
    start_chrono( ).
    select( ).
    stop_chrono( ).

    go_text->display( ).
  ENDMETHOD.                    "execute

  METHOD hash.
    DATA:
      lt_dummy
        TYPE HASHED TABLE OF zdummy01
        WITH UNIQUE KEY primary_key COMPONENTS matnr.
    FIELD-SYMBOLS:
      <ls_dummy> LIKE LINE OF lt_dummy.

    go_text->add_line( 'Starting hash read' ).

    " Load the table once, then read repeatedly from memory
    SELECT * INTO TABLE lt_dummy FROM zdummy01.

    DO c_do_times TIMES.
      READ TABLE lt_dummy
            ASSIGNING <ls_dummy>
            WITH TABLE KEY primary_key
            COMPONENTS matnr = c_matnr.
    ENDDO.
  ENDMETHOD.                    "hash

  METHOD select.
    DATA ls_dummy TYPE zdummy01.

    go_text->add_line( 'Starting select' ).

    " Read repeatedly from the fully buffered database table
    DO c_do_times TIMES.
      SELECT SINGLE *
             INTO ls_dummy
             FROM zdummy01
             WHERE matnr EQ c_matnr.
    ENDDO.
  ENDMETHOD.                    "select

  METHOD start_chrono.
    GET RUN TIME FIELD gv_begin.
  ENDMETHOD.                    "start_chrono

  METHOD stop_chrono.
    DATA:
      lv_diff  TYPE i,
      lv_difft TYPE cl_demo_text=>t_line,
      lv_end   TYPE i.

    GET RUN TIME FIELD lv_end.
    lv_diff = lv_end - gv_begin.
    WRITE lv_diff TO lv_difft LEFT-JUSTIFIED.

    go_text->add_line( 'Runtime result:' ).
    go_text->add_line( lv_difft ).
  ENDMETHOD.                    "stop_chrono
ENDCLASS.                    "main IMPLEMENTATION

START-OF-SELECTION.
  main=>execute( ).
An interesting question to ask. And an interesting result as well. For small lookup tables I would always simply use the database-buffered approach, but with your results I might need to change that habit. Just saying that the hash method is three times faster is a bit too simplistic, though.
I wonder what the results would be for subsequent runs of the program. The runtime for the hashed table should remain the same, but I suspect the one for the database buffer will go down, since the buffer does not need to be filled again.
To be really fair, the SELECT in hash( ) should have BYPASSING BUFFER added.
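Roughly like this, as a sketch against the SELECT in the hash( ) method:

" Fill the hashed table straight from the database, ignoring the table buffer.
SELECT * INTO TABLE lt_dummy FROM zdummy01 BYPASSING BUFFER.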
I am working on a blog to compare the performance between a hashed table buffer and a sorted table buffer. Your blog has given me new inspiration to finish it. Thanks!
Gerrit, thanks for your reply! I recommend reading the reply of Bilen Cekic just below yours; he really widened the scope of my humble test and compared hashed & sorted buffers.
Thank you Kerem for sharing, great blog! I tested it on my HANA system and the results are interesting; my table has 81 rows with only one primary key field, plus 6 additional columns.
This is my first test with the same code, without any changes [BUFFERING IS NOT ACTIVATED YET];
First I added a sorted table with a unique key;
Here is the result;
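A sketch of what such a sorted variant might look like (illustrative only, not the exact code from the test):

" Sorted internal table with a unique key on MATNR; the read uses the
" table key, i.e. an implicit binary search.
DATA lt_sorted TYPE SORTED TABLE OF zdummy01 WITH UNIQUE KEY matnr.
FIELD-SYMBOLS <ls_sorted> TYPE zdummy01.

SELECT * FROM zdummy01 INTO TABLE lt_sorted.
READ TABLE lt_sorted ASSIGNING <ls_sorted> WITH TABLE KEY matnr = 'M1'.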
Now I added a secondary index with a non-unique sorted key;
Here is the result with the secondary index;
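And a sketch of the secondary-key variant (again illustrative only):

" Standard table with an additional non-unique sorted secondary key;
" the read addresses the secondary key explicitly.
DATA lt_dummy2 TYPE STANDARD TABLE OF zdummy01
  WITH NON-UNIQUE KEY matnr
  WITH NON-UNIQUE SORTED KEY k_matnr COMPONENTS matnr.
FIELD-SYMBOLS <ls_dummy2> TYPE zdummy01.

SELECT * FROM zdummy01 INTO TABLE lt_dummy2.
READ TABLE lt_dummy2 ASSIGNING <ls_dummy2>
  WITH KEY k_matnr COMPONENTS matnr = 'M1'.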
Now I activated full TABLE buffering only [for the SELECT statements in the hash, sorted and sorted_secondary methods, I am bypassing the buffer to be fair];
Here is the result;
And I re-ran the program (I think the first run moved the records to the shared memory area, which is why it took 16.972);
Here is the result after the re-run;
I updated one of the records in the table, and here is the result;
And the re-run results for the same records;
If we change the order of the methods, the results change as well. Somehow the first SELECT statement is always slow, even if you bypass the buffer;
I have another test here: a dummy SELECT was added to the execute method just to read the table once. I think there is another shared area on the backend, handled by the DB layer, which is why the first SELECT was always slow compared to the HASH read results;
Here is the code;
And the results:
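A sketch of what such a warm-up read might look like (illustrative only, not the actual attached code):

" One dummy read at the start of execute( ) so that the first timed
" SELECT no longer pays the initial load cost.
DATA ls_warmup TYPE zdummy01.
SELECT SINGLE * FROM zdummy01 INTO ls_warmup WHERE matnr = 'M1'.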
Interesting results
Thank you again for sharing. I have never used buffering on my developments but above results gave me some ideas.
Bilen, thanks for the great effort and reply! Your participation certainly widened the scope of my humble test.
This reminds me of the time when you repeated our nested loop performance tests and got varying results depending on the sample dataset. For the record: that particular article is banned from blogs.sap.com for some reason, but it's still available on WordPress.
Coming back to our subject: it looks like we can all agree that performance is a contingent subject. One should be aware of the alternative approaches and pick the appropriate one.
Once again, thanks for the time you put in!
For those who are interested, here is more official information about the concept:
SAP Library: Buffering Database Tables
ABAP Documentation: SAP Buffering of Database Tables
Testing with less data will always favor the buffered table. But once the data volume increases (25K rows and above), the hashed table will give better performance.
Regards
I don't know about HANA, but in ECC, tables with 25K records would not be set up as buffered. This blog is rather misleading IMHO (see my comment for an explanation).
Maybe I'm missing something, but IMHO these are not two techniques to achieve the same result. They work completely differently and serve different purposes for different use cases.
It's like saying "there are two tools: a screwdriver and a hammer". Both are useful but they're just completely different things.
As others mentioned, DB buffer option is used for small tables that are frequently accessed. For example, we rarely create new company codes but company information is used many, many times by all kinds of transactions. So all those transactions would benefit from buffering.
But hashed table exists only in memory during runtime of a particular program. How is this comparable?
This comment literally took the words off my keyboard. Thanks for that.
Re-reading this blog post and the other comments, it is really striking how the different qualities of the two alternatives are ignored.
With the internal table approach, one gets a private copy of the data and needs to take care of any other process changing the same set of data. In effect, one would have to handle synchronisation manually, potentially across a whole NetWeaver server landscape.
The buffered tables, on the other hand, do that "automagically" for you. It's a central decision to buffer the table (and how to buffer it), and all code using the table can benefit from it without any code changes. The ABAP code also does not need to worry about changed data or synchronisation issues, as the DBI layer handles this.
Jelena is spot on with her remarks. Simply putting a single aspect of these two data structures (reading data) next to each other without further qualification is misleading and over-simplifying the kind of problem each of them solves.
Exactly what I thought.
One more point: DB buffering supports performance tuning at the DB level (most of the time the application server doesn't even need to call the DB server, which may reduce heavy load on it) and less at the application level.
“As others mentioned, DB buffer option is used for small tables that are frequently accessed. For example, we rarely create new company codes but company information is used many, many times by all kinds of transactions. So all those transactions would benefit from buffering.”
I am sorry, but this statement is from 1995. Back then, with 32-bit architecture, you had limited memory capabilities. I don't know why, but people keep repeating the same statement. Whenever you open a topic about buffering, it's "bro, you have to buffer only T* tables"; yeah, that is totally wrong.
The best way to decide whether to buffer or not is to check transaction ST10 and see how you access the table and how frequently it changes. I am buffering a master data table with 50K rows and 25 columns without any issue and really benefiting from it.
And you waited for 6 months to bring this up? 🙂
Small or not small, table buffering has nothing to do with hashed tables whatsoever. End of story.
😀 Alright, I will answer 6 months later.
I disagree. Both are data buffering methods although very different in their nature.
For example: if you have buffered the table, you should probably think twice before implementing a hashed (or sorted) buffer in an ABAP internal table for the same table.
It is also useful to know that access to a buffer implemented in an ABAP internal table is faster than access to the table buffer on the application server.
I'm rather confused why this comment is causing such heated debate spanning over 2 years. 🙂 I stand by what I said before. It seems you're trying to argue solely on the base of semantics and I fail to understand why.
There is no point in stretching the established terminology simply to justify verbiage in an SCN blog. Hashed internal tables and buffered DB tables are two different terms for different things in SAP. If you want to call both by the same name, fine, but then don't expect others to understand. ¯\_(ツ)_/¯
And sorry, I don't see any valid factual arguments in this reply. "Think twice before implementing buffer" - why?
Hi All,
It's also a question of volume (number of entries) in the table.
Please find below a comparison program of hashed tables and sorted tables with binary search (made a very long time ago).
or this blog
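Since the attached program is not reproduced in this thread, here is a rough sketch of such a comparison, reusing the ZDUMMY01 table and GET RUN TIME timing from the blog's program (illustrative only):

" Compare a hashed-key read against a binary search on a sorted standard table.
DATA: lt_hash  TYPE HASHED TABLE OF zdummy01 WITH UNIQUE KEY matnr,
      lt_std   TYPE STANDARD TABLE OF zdummy01,
      ls_dummy TYPE zdummy01,
      lv_start TYPE i,
      lv_stop  TYPE i,
      lv_diff  TYPE i.

SELECT * FROM zdummy01 INTO TABLE lt_hash.
SELECT * FROM zdummy01 INTO TABLE lt_std.
SORT lt_std BY matnr.

" Hashed table: key access via the hash algorithm.
GET RUN TIME FIELD lv_start.
DO 1000 TIMES.
  READ TABLE lt_hash INTO ls_dummy WITH TABLE KEY matnr = 'M1'.
ENDDO.
GET RUN TIME FIELD lv_stop.
lv_diff = lv_stop - lv_start.
WRITE: / 'Hashed read:', lv_diff.

" Standard table sorted by MATNR: explicit binary search.
GET RUN TIME FIELD lv_start.
DO 1000 TIMES.
  READ TABLE lt_std INTO ls_dummy WITH KEY matnr = 'M1' BINARY SEARCH.
ENDDO.
GET RUN TIME FIELD lv_stop.
lv_diff = lv_stop - lv_start.
WRITE: / 'Binary search read:', lv_diff.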