Kerem Koseoglu

ABAP Performance Test: Hashed Tables vs Database Buffers

There are two significant techniques to avoid repetitive access to database records: Hashed internal tables and database buffers.

If we decide to use the former method, we need to get the database record into a hashed internal table. Whenever we need to access the record, we read the internal table instead of the database table.
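As a sketch, the pattern looks like this (the table name ZDUMMY01 and key field MATNR are taken from the test program below; any lookup table would do):

DATA gt_buffer TYPE HASHED TABLE OF zdummy01
               WITH UNIQUE KEY matnr.

FIELD-SYMBOLS <ls_rec> TYPE zdummy01.

* Fill the hashed table once per program run
SELECT * INTO TABLE gt_buffer FROM zdummy01.

* From now on, read from memory instead of the database
READ TABLE gt_buffer ASSIGNING <ls_rec>
  WITH TABLE KEY matnr = 'M1'.
IF sy-subrc = 0.
  " <ls_rec> now points at the buffered record
ENDIF.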

If we decide to use the latter method, all we need to do is to activate the table buffer in SE11. If we do that, the accessed data is cached on the application server, and the cache is used whenever possible. Thus, database access is reduced. There are exceptional cases where the buffer is bypassed, but that’s beyond the scope of this article.
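For reference, a few SELECT variants that bypass the table buffer even when full buffering is active (a non-exhaustive sketch; see the ABAP documentation for the complete list):

DATA: lt_dummy TYPE TABLE OF zdummy01,
      lv_count TYPE i.

* Explicit bypass
SELECT * INTO TABLE lt_dummy FROM zdummy01 BYPASSING BUFFER.

* Aggregate functions always go to the database
SELECT COUNT( * ) FROM zdummy01 INTO lv_count.

* Joins, SELECT DISTINCT and ORDER BY on non-key fields
* also bypass the buffer.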

Wondering how they compare in terms of performance, I created a fully buffered database table with example data.

I also wrote a program that accesses the data using both techniques and compares the runtimes.

It turns out that hashed access is 3-4 times faster than database buffering.

You can use the attached source code yourself and repeat the test; I would like to hear about any disproving results.

REPORT  zdummy01.

CLASS main DEFINITION.
  PUBLIC SECTION.
    CLASS-METHODS execute.
  PROTECTED SECTION.
  PRIVATE SECTION.

    CONSTANTS:
      c_do_times TYPE i     VALUE 100,
      c_matnr    TYPE matnr VALUE 'M1'.

    CLASS-DATA:
      go_text  TYPE REF TO cl_demo_text,
      gv_begin TYPE i.

    CLASS-METHODS:
      hash,
      select,
      start_chrono,
      stop_chrono.

ENDCLASS.                    "main DEFINITION

CLASS main IMPLEMENTATION.

  METHOD execute.

    go_text = cl_demo_text=>get_handle( ).

    start_chrono( ).
    hash( ).
    stop_chrono( ).

    start_chrono( ).
    select( ).
    stop_chrono( ).

    go_text->display( ).

  ENDMETHOD.                    "execute

  METHOD hash.

    DATA:
      lt_dummy
        TYPE HASHED TABLE OF zdummy01
        WITH UNIQUE KEY primary_key COMPONENTS matnr.

    FIELD-SYMBOLS:
      <ls_dummy> LIKE LINE OF lt_dummy.

    go_text->add_line( 'Starting hash read' ).

    SELECT * INTO TABLE lt_dummy FROM zdummy01.

    DO c_do_times TIMES.

      READ TABLE lt_dummy
        ASSIGNING <ls_dummy>
        WITH TABLE KEY primary_key
        COMPONENTS matnr = c_matnr.

    ENDDO.

  ENDMETHOD.                    "hash

  METHOD select.

    DATA ls_dummy TYPE zdummy01.

    go_text->add_line( 'Starting select' ).

    DO c_do_times TIMES.

      SELECT SINGLE *
        INTO ls_dummy
        FROM zdummy01
        WHERE matnr EQ c_matnr.

    ENDDO.

  ENDMETHOD.                    "select

  METHOD start_chrono.
    GET RUN TIME FIELD gv_begin.
  ENDMETHOD.                    "start_chrono

  METHOD stop_chrono.

    DATA:
      lv_diff  TYPE i,
      lv_difft TYPE cl_demo_text=>t_line,
      lv_end   TYPE i.

    GET RUN TIME FIELD lv_end.
    lv_diff = lv_end - gv_begin.
    WRITE lv_diff TO lv_difft LEFT-JUSTIFIED.

    go_text->add_line( 'Runtime result:' ).
    go_text->add_line( lv_difft ).


  ENDMETHOD.                    "stop_chrono

ENDCLASS.                    "main IMPLEMENTATION

START-OF-SELECTION.
  main=>execute( ).

17 Comments
Gerrit Beukema

      An interesting question to ask. And an interesting result as well. For small lookup tables I would always just simply use the database buffered approach. But with your results I might need to change this behavior. Just saying that the hash method is three times faster is a bit too simple though.

      I wonder what the results would be for subsequent runs of the program. The run time for the hashed table should remain the same but I suspect the one for the database buffer to go down since the buffer does not need to be filled again.

To be really fair, the SELECT in hash( ) should have BYPASSING BUFFER added.
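In other words, a one-line change to the SELECT in the hash( ) method, so that the internal table is filled from the database rather than from the already-warm buffer:

SELECT * INTO TABLE lt_dummy FROM zdummy01 BYPASSING BUFFER.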

      I am working on a blog to compare the performance between a hashed table buffer and a sorted table buffer. Your blog has given me new inspiration to finish it. Thanks!

Kerem Koseoglu (Blog Post Author)

Gerrit, thanks for your reply! I recommend reading the reply of Bilen Cekic just below yours; he really widened the scope of my humble test and compared hashed & sorted buffers.

Bilen Cekic

Thank you Kerem for sharing, great blog! I tested on my HANA system and the results are interesting: my table has 81 rows with a single primary key field, plus six additional columns.


This is my first test, with the same code and no changes [buffering is not activated yet]:

      Starting hash read
      Runtime result:
      5.896
      Starting select
      Runtime result:
      90.742

First I added a sorted table with a unique key:

        DATA:
            lt_dummy
              TYPE SORTED TABLE OF ZTEST
              WITH UNIQUE KEY primary_key COMPONENTS DEV_ID.

Here is the result:

      Starting hash read
      Runtime result:
      6.003
      Starting sorted read
      Runtime result:
      1.721
      Starting select
      Runtime result:
      86.529
      

       

Now I added a secondary, non-unique sorted key:

      DATA:
            lt_dummy
              TYPE STANDARD TABLE  OF ZTEST
              WITH NON-UNIQUE SORTED KEY key1 COMPONENTS DEV_ID.

       

Here is the result with the secondary key:

      Starting hash read
      Runtime result:
      7.532
      Starting sorted read
      Runtime result:
      2.302
      Starting sorted secondary index read
      Runtime result:
      1.901
      Starting select
      Runtime result:
      92.297

Now I activated full table buffering only [for the SELECT statements in the hash, sorted, and sorted_secondary methods, I am bypassing the buffer to be fair].

      Here is the result;

      After buffering, first RUN
      
      Starting hash read
      Runtime result:
      6.049
      Starting sorted read
      Runtime result:
      1.783
      Starting sorted secondary index read
      Runtime result:
      1.523
      Starting select
      Runtime result:
      16.972

Then I re-ran the program (I think the first run moved the records to the shared memory area, which is why it took 16.972).

      Here is the result after re-run;

      After buffering SECOND RUN 
      Starting hash read
      Runtime result:
      4.684
      Starting sorted read
      Runtime result:
      1.960
      Starting sorted secondary index read
      Runtime result:
      1.632
      Starting select
      Runtime result:
      2.365

I updated one of the records in the table, and here is the result:

      Starting hash read
      Runtime result:
      5.947
      Starting sorted read
      Runtime result:
      1.956
      Starting sorted secondary index read
      Runtime result:
      1.760
      Starting select
      Runtime result:
      9.551

And here are the results after re-running with the same records:

      Starting hash read
      Runtime result:
      5.145
      Starting sorted read
      Runtime result:
      1.943
      Starting sorted secondary index read
      Runtime result:
      1.741
      Starting select
      Runtime result:
      2.120

       

If we change the order of the methods, the results change as well. Somehow the first SELECT statement is always slow, even if you bypass the buffer.

Here is another test: a dummy SELECT was added to the execute method just to read the table once. I think there is another shared area handled by the DB layer on the backend, which is why the first read was always slow, as the hash read results show.

      Here is the code;

method execute.

  data lt_test type table of ztest.

  go_text = cl_demo_text=>get_handle( ).

* dummy select to warm up the database layer
  select * into table lt_test from ztest bypassing buffer.

  start_chrono( ).
  hash( ).
  stop_chrono( ).

  start_chrono( ).
  sorted( ).
  stop_chrono( ).

  start_chrono( ).
  sorted_secondary( ).
  stop_chrono( ).

  start_chrono( ).
  select( ).
  stop_chrono( ).

  go_text->display( ).

endmethod.                    "execute
      

      and results:

      Starting hash read
      Runtime result:
      1.834
      Starting sorted read
      Runtime result:
      1.630
      Starting sorted secondary index read
      Runtime result:
      1.678
      Starting select
      Runtime result:
      2.043

       

Interesting results, aren't they?

Thank you again for sharing. I have never used buffering in my developments, but the results above gave me some ideas.

Kerem Koseoglu (Blog Post Author)

Bilen, thanks for the great effort and reply! Your participation certainly widened the scope of my humble test.

This reminds me of the time when you repeated our nested loop performance tests and got varying results depending on the sample dataset. For the record: that particular article is banned from blogs.sap.com for some reason, but it's still available on WordPress.

Coming back to our subject: it looks like we can all agree that performance is a contingent subject. One should be aware of the alternative approaches and pick the appropriate one.

      Once again, thanks for the time you put in!

Sandra Rossi

For those who are interested, here is more official information about the concept:

      SAP Library: Buffering Database Tables

      ABAP Documentation: SAP Buffering of Database Tables


Bikas Tarway

Testing with little data will always favor the buffered table. But once the data volume increases (25K rows and above), the hashed table gives better performance.

Regards

Jelena Perfiljeva

Don't know about HANA, but in ECC, tables with 25K records would not be set up as buffered. This blog is rather misleading IMHO (see my comment for an explanation).

Jelena Perfiljeva

Maybe I'm missing something, but IMHO these are not two techniques to achieve the same result. They work completely differently and serve different purposes for different use cases.

      It's like saying "there are two tools: a screwdriver and a hammer". Both are useful but they're just completely different things.

      As others mentioned, DB buffer option is used for small tables that are frequently accessed. For example, we rarely create new company codes but company information is used many, many times by all kinds of transactions. So all those transactions would benefit from buffering.

      But hashed table exists only in memory during runtime of a particular program. How is this comparable?

Lars Breddemann

      This comment literally took the words off my keyboard. Thanks for that.


Lars Breddemann

Re-reading this blog post and the other comments, it is really striking how the different qualities of the alternatives are ignored.

With the internal table approach, one gets a private copy of the data and needs to take care of any other process changing the same set of data. In effect, one would have to handle synchronisation manually, potentially across a whole NetWeaver server landscape.

Buffered tables, on the other hand, do that "automagically" for you. It's a central decision to buffer the table (and how to buffer it), and all code using the table can benefit from it without any code changes. The ABAP code also does not need to worry about changed data or synchronisation issues, as the DBI layer handles this.

      Jelena is spot on with her remarks. Simply putting a single aspect of these two data structures (reading data) next to each other without further qualification is misleading and over-simplifying the kind of problem each of them solves.

Shai Sinai

      Exactly what I thought.

One more point: DB buffering supports performance tuning at the DB level (most of the time, the application server shouldn't even need to call the DB server, which may reduce heavy load on it) and less so at the application level.

Bilen Cekic

      “As others mentioned, DB buffer option is used for small tables that are frequently accessed. For example, we rarely create new company codes but company information is used many, many times by all kinds of transactions. So all those transactions would benefit from buffering.”

       

I am sorry, but this statement is from 1995. Back then, with 32-bit architecture, you had limited memory capabilities. I don't know why people keep repeating the same statement. Whenever you open a topic about buffering, it's "bro, you have to buffer only T* tables", and that is totally wrong.

The best way to decide whether or not to buffer is to check transaction ST10 and see how you access the table and how frequently it changes. I am buffering a master data table with 50K rows and 25 columns without any issue and really benefiting from it.


Jelena Perfiljeva

      And you waited for 6 months to bring this up? 🙂

      Small or not small, table buffering has nothing to do with hashed tables whatsoever. End of story.

Bilen Cekic

😀 Alright, I will answer 6 months later.

Davis Krastins

I disagree. Both are data buffering methods, although very different in nature.

For example: if you have buffered the table, you should probably think twice before implementing a hashed (or sorted) buffer in an ABAP internal table for the same table.

It is also useful to know that access to a buffer implemented in an ABAP internal table is faster than access to the table buffer on the application server.

Jelena Perfiljeva

I'm rather confused why this comment is causing such a heated debate spanning over 2 years. 🙂 I stand by what I said before. It seems you're trying to argue solely on the basis of semantics, and I fail to understand why.

There is no point in stretching the established terminology simply to justify the verbiage in an SCN blog. Hashed internal tables and buffered DB tables are two different terms for different things in SAP. You want to call both by the same name? Fine, but then don't expect others to understand. ¯\_(ツ)_/¯

      And sorry, I don't see any valid factual arguments in this reply. "Think twice before implementing buffer" - why?


Taryck BENSIALI

      Hi All,


      It's also a question of volume (number of entries) in the table.

Please find a comparison program of hashed tables and sorted tables (made a very long time ago) with binary search,
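As a sketch of the technique this refers to (using the blog's ZDUMMY01 table for illustration): on a SORTED table, READ TABLE searches via the table key implicitly; on a STANDARD table that has been sorted with SORT, the BINARY SEARCH addition must be specified explicitly.

DATA lt_std TYPE STANDARD TABLE OF zdummy01.
FIELD-SYMBOLS <ls_rec> TYPE zdummy01.

SELECT * INTO TABLE lt_std FROM zdummy01.
SORT lt_std BY matnr.

* Binary search on a sorted standard table must be requested explicitly
READ TABLE lt_std ASSIGNING <ls_rec>
  WITH KEY matnr = 'M1' BINARY SEARCH.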

       

or this blog.