Former Member

Performance when collecting results from an internal table into a hashed table: "insert into hashed table inside a loop" or "sort the internal table and delete adjacent duplicates"?

Here are my thoughts on performance for the following task:

select many records from the database and put the results, keyed by certain fields, into an internal hashed table.

We can either insert the records into the hashed table inside a loop, or sort the internal table and delete the adjacent duplicate records.

Which is quicker?

Here is a small program to test:

REPORT ztest_performance.

TYPES: BEGIN OF ekpo_typ,
         ebeln TYPE ebeln,
         ebelp TYPE ebelp,
         matnr TYPE matnr,
         bukrs TYPE bukrs,
         werks TYPE ewerk,
       END OF ekpo_typ.

DATA: lt_ekpo   TYPE TABLE OF ekpo_typ.
DATA: lt_ekpo_h TYPE HASHED TABLE OF ekpo_typ WITH UNIQUE KEY bukrs.
DATA: lt_ekpo_2 TYPE HASHED TABLE OF ekpo_typ WITH UNIQUE KEY bukrs.

DATA: i1        TYPE i,
      i2        TYPE i,
      i3        TYPE i,
      i4        TYPE i,
      lv_lines1 TYPE i,
      lv_lines2 TYPE i,
      diff21    TYPE i,
      diff32    TYPE i.

FIELD-SYMBOLS: <fs_ekpo> LIKE LINE OF lt_ekpo.

SELECT ebeln
       ebelp
       matnr
       bukrs
       werks
  FROM ekpo
  INTO CORRESPONDING FIELDS OF TABLE lt_ekpo
  UP TO 1000000 ROWS.

* Variant 1: insert into the hashed table inside a loop.
* A duplicate key simply makes the INSERT fail with sy-subrc = 4.
GET RUN TIME FIELD i1.
LOOP AT lt_ekpo ASSIGNING <fs_ekpo>.
  INSERT <fs_ekpo> INTO TABLE lt_ekpo_h.
ENDLOOP.
GET RUN TIME FIELD i2.

lv_lines1 = lines( lt_ekpo_h ).
REFRESH lt_ekpo_h.

* Variant 2: sort, delete adjacent duplicates, then copy into the hashed table.
GET RUN TIME FIELD i3.
SORT lt_ekpo BY bukrs.
DELETE ADJACENT DUPLICATES FROM lt_ekpo COMPARING bukrs.
lt_ekpo_2[] = lt_ekpo[].
GET RUN TIME FIELD i4.

lv_lines2 = lines( lt_ekpo_2 ).
REFRESH lt_ekpo_2.

diff21 = i2 - i1.
diff32 = i4 - i3.

WRITE: 'i2-i1 = ', diff21, /.
WRITE: 'i4-i3 = ', diff32, /.
WRITE: 'lines1 = ', lv_lines1.
WRITE: 'lines2 = ', lv_lines2.


On my test system the result is:

i2-i1 =     814.957
i4-i3 =     480.459
lines =          29

So, "sort and delete adjacent duplicates" seems to work quicker than "insert records into a hashed table inside a loop".

9 Comments
Thomas Krügl

      Sounds like you may want to use GROUP BY in your SELECT.

Former Member (Blog Post Author)

I used the EKPO table only as an example of a big table. In the general case, the key fields that I need don't have indexes in the database table.

Thomas Krügl

As long as you can write the SELECT yourself (and don't receive the itab from *somewhere* to work with), and do not need the detailed data anyway, a SELECT using either GROUP BY, SELECT DISTINCT, or COUNT directly will have better performance, I think, no matter what indexes you have.
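For illustration, here is a minimal sketch of the kind of SELECT Thomas describes, assuming only the distinct bukrs values (optionally with a count per key) are actually needed; the table names lt_bukrs and lt_bukrs_cnt are made up for this example:

* Sketch: let the database return only the distinct keys,
* instead of reading all rows and deduplicating in ABAP.
DATA: lt_bukrs TYPE TABLE OF bukrs.

SELECT DISTINCT bukrs
  FROM ekpo
  INTO TABLE lt_bukrs.

* If a count per key is needed as well, GROUP BY does it in one pass.
TYPES: BEGIN OF bukrs_cnt_typ,
         bukrs TYPE bukrs,
         cnt   TYPE i,
       END OF bukrs_cnt_typ.
DATA: lt_bukrs_cnt TYPE TABLE OF bukrs_cnt_typ.

SELECT bukrs COUNT( * )
  FROM ekpo
  INTO TABLE lt_bukrs_cnt
  GROUP BY bukrs.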

Comparing only SORT + DELETE ADJACENT DUPLICATES and the INSERT into a hash table, your findings were interesting to me, but I cannot confirm them.

      I tried this with other data, with the insert being about twice as fast as the sorting plus delete.

      Important though: The sorting takes much longer than the DELETE ADJACENT DUPLICATES. Maybe in your case the data is already sorted, or pretty much sorted. The DELETE ADJACENT DUPLICATES alone is indeed considerably faster than the INSERT method.
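To see that split, the two statements could be timed separately; a minimal sketch, reusing lt_ekpo, i3 and i4 from the test program above (i5, diff43 and diff54 are added only for this example):

* Sketch: measure SORT and DELETE ADJACENT DUPLICATES separately.
DATA: i5     TYPE i,
      diff43 TYPE i,
      diff54 TYPE i.

GET RUN TIME FIELD i3.
SORT lt_ekpo BY bukrs.
GET RUN TIME FIELD i4.
DELETE ADJACENT DUPLICATES FROM lt_ekpo COMPARING bukrs.
GET RUN TIME FIELD i5.

diff43 = i4 - i3.   " time spent in SORT
diff54 = i5 - i4.   " time spent in DELETE ADJACENT DUPLICATES

WRITE: / 'sort   = ', diff43.
WRITE: / 'delete = ', diff54.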

Jelena Perfiljeva

      I'm with Thomas on this one - just don't select the data you don't need in the first place.

If it was the "receive the itab from *somewhere* to work with" case, I'd probably go with SORT/DELETE just because it's clear, takes only two lines, and does not require a field-symbol declaration.

Former Member

      One thing you should take into account is Releasing Internal Table Memory.
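As a small sketch of that point (using the table from the test program): REFRESH or CLEAR only delete the rows, while FREE also releases the table's memory once it is no longer needed.

REFRESH lt_ekpo.   " deletes the rows, the initially allocated memory is kept
FREE lt_ekpo.      " deletes the rows and releases the table's memory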

Richard Harper

However, you then need to look at what your table is being used for. You may lose some time inserting information into a hashed table, but you would certainly gain that time back when using it correctly as a lookup table.

So personally, I think your question doesn't need asking. What needs to be thought about is how the table you are building will be used.

      Rich 
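A minimal sketch of the lookup pattern Richard describes, reusing the hashed table lt_ekpo_h from the test program; ls_ekpo and lv_bukrs are made-up names, and '1000' is just a placeholder key:

* Sketch: use the hashed table as a lookup table via its unique key.
DATA: ls_ekpo  TYPE ekpo_typ,
      lv_bukrs TYPE bukrs VALUE '1000'.   " placeholder key, assumption

READ TABLE lt_ekpo_h INTO ls_ekpo
     WITH TABLE KEY bukrs = lv_bukrs.
IF sy-subrc = 0.
  WRITE: / 'werks for', lv_bukrs, '=', ls_ekpo-werks.
ENDIF.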

Naimesh Patel

In my experience, COLLECT works faster than any of the other options.

      You can read more on that - ABAP internal table unique key Performance Comparison - ABAP help blog

That's because COLLECT uses an internal hash algorithm: http://help.sap.com/abapdocu_702/en/abapcollect.htm

      In standard tables that are filled using COLLECT only, the entry is determined by a temporary hash administrator. The workload is independent of the number of entries in the table. The hash administrator is temporary and is generally invalidated when the table is accessed to be changed.

      Regards,
      Naimesh Patel

Richard Harper

So that's one example of table use; another example I was thinking of was a lookup table, for instance.

However, when you think about it, any table with a good hashing algorithm that does not generate too many collisions will generally be faster than a sorted table when the number of records starts getting huge.

Former Member (Blog Post Author)

COLLECT has some restrictions: all components that are not part of the primary table key must have a numeric data type.
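A minimal sketch of how COLLECT could be used here under that restriction, reusing lt_ekpo and <fs_ekpo> from the test program: only the key field bukrs plus a numeric counter are kept; the type and table names are made up for this example.

* Sketch: COLLECT requires all non-key components to be numeric,
* so the work area holds only the key field plus a counter.
TYPES: BEGIN OF bukrs_collect_typ,
         bukrs TYPE bukrs,
         cnt   TYPE i,
       END OF bukrs_collect_typ.

DATA: lt_bukrs_collect TYPE TABLE OF bukrs_collect_typ,
      ls_bukrs_collect TYPE bukrs_collect_typ.

LOOP AT lt_ekpo ASSIGNING <fs_ekpo>.
  ls_bukrs_collect-bukrs = <fs_ekpo>-bukrs.
  ls_bukrs_collect-cnt   = 1.
  COLLECT ls_bukrs_collect INTO lt_bukrs_collect.   " sums CNT per BUKRS
ENDLOOP.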