Here are my thoughts about performance in the following task:

Select many records from the database and put the results, keyed by some fields, into an internal hashed table.

We can either insert the records into the hashed table inside a loop, or we can sort the internal table and delete the duplicate records.

Which is quicker?

Here is a small program to test:

REPORT ztest_performance.

TYPES: BEGIN OF ekpo_typ,
         ebeln TYPE ebeln,
         ebelp TYPE ebelp,
         matnr TYPE matnr,
         bukrs TYPE bukrs,
         werks TYPE ewerk,
       END OF ekpo_typ.

DATA: lt_ekpo   TYPE TABLE OF ekpo_typ.
DATA: lt_ekpo_h TYPE HASHED TABLE OF ekpo_typ WITH UNIQUE KEY bukrs.
DATA: lt_ekpo_2 TYPE HASHED TABLE OF ekpo_typ WITH UNIQUE KEY bukrs.

DATA: i1        TYPE i,
      i2        TYPE i,
      i3        TYPE i,
      i4        TYPE i,
      lv_lines1 TYPE i,
      lv_lines2 TYPE i,
      diff21    TYPE i,
      diff32    TYPE i.

FIELD-SYMBOLS: <fs_ekpo> LIKE LINE OF lt_ekpo.

SELECT ebeln
       ebelp
       matnr
       bukrs
       werks
  FROM ekpo
  INTO CORRESPONDING FIELDS OF TABLE lt_ekpo
  UP TO 1000000 ROWS.

GET RUN TIME FIELD i1.

* Variant 1: insert into a hashed table inside a loop.
* Duplicate keys are silently rejected (sy-subrc = 4).
LOOP AT lt_ekpo ASSIGNING <fs_ekpo>.
  INSERT <fs_ekpo> INTO TABLE lt_ekpo_h.
ENDLOOP.

GET RUN TIME FIELD i2.

lv_lines1 = lines( lt_ekpo_h ).
REFRESH lt_ekpo_h.

GET RUN TIME FIELD i3.

* Variant 2: sort, delete adjacent duplicates, then copy.
SORT lt_ekpo BY bukrs.
DELETE ADJACENT DUPLICATES FROM lt_ekpo COMPARING bukrs.
lt_ekpo_2[] = lt_ekpo[].

GET RUN TIME FIELD i4.

lv_lines2 = lines( lt_ekpo_2 ).
REFRESH lt_ekpo_2.

diff21 = i2 - i1.
diff32 = i4 - i3.

WRITE: 'i2-i1 = ', diff21, /.
WRITE: 'i4-i3 = ', diff32, /.
WRITE: 'lines1 = ', lv_lines1.
WRITE: 'lines2 = ', lv_lines2.


In my test system the result is:


i2-i1 =     814.957

i4-i3 =     480.459

lines =          29

So “delete duplicate records” seems to work quicker than “insert records into a hashed table inside a loop”.


9 Comments


      1. Thomas Krügl

        As long as you can write the SELECT yourself (and don’t receive the itab from *somewhere* to work with), and do not need the detailed data anyway, a SELECT using GROUP BY, SELECT DISTINCT, or COUNT directly will have better performance, I think, no matter what indexes you have.
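        For reference, the SELECT DISTINCT variant might look like this (a sketch; lt_bukrs is a helper table not present in the original program):

```abap
* Let the database do the deduplication: each company code
* comes back exactly once, so no ABAP-side dedup is needed.
DATA: lt_bukrs TYPE STANDARD TABLE OF bukrs.

SELECT DISTINCT bukrs
  FROM ekpo
  INTO TABLE lt_bukrs.
```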

        Comparing only SORT + DELETE ADJACENT DUPLICATES against the INSERT into a hash table, your findings were interesting to me, but I cannot confirm them.

        I tried this with other data, with the insert being about twice as fast as the sorting plus delete.

        Important though: The sorting takes much longer than the DELETE ADJACENT DUPLICATES. Maybe in your case the data is already sorted, or pretty much sorted. The DELETE ADJACENT DUPLICATES alone is indeed considerably faster than the INSERT method.

        1. Jelena Perfiljeva

          I’m with Thomas on this one – just don’t select the data you don’t need in the first place.

          If it was the “receive the itab from *somewhere* to work with” case, I’d probably go with SORT/DELETE just because it’s clear, it’s only 2 lines, and it does not require a field-symbol declaration.

  1. Richard Harper

    However, you then need to look at what your table is being used for.  You may lose some time inserting information into a hashed table, but you would certainly gain that time back when using it correctly as a lookup table.

    So personally, I think your question doesn’t need asking.  It’s the intended use of the table you are building that needs to be thought about.
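    For example, once the hashed table is filled, a single-record lookup runs in constant time. A sketch reusing lt_ekpo_h and ekpo_typ from the program above (the company code '1000' is a made-up value):

```abap
DATA: ls_ekpo TYPE ekpo_typ.

* Hashed key access: runtime is independent of the table size.
READ TABLE lt_ekpo_h INTO ls_ekpo WITH TABLE KEY bukrs = '1000'.
IF sy-subrc = 0.
  WRITE: / ls_ekpo-bukrs, ls_ekpo-werks.
ENDIF.
```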

    Rich 

  2. Naimesh Patel

    In my experience, COLLECT works faster than any of the other options.

    You can read more on that – ABAP internal table unique key Performance Comparison – ABAP help blog

    Because COLLECT uses an internal hash algorithm: http://help.sap.com/abapdocu_702/en/abapcollect.htm

    In standard tables that are filled using COLLECT only, the entry is determined by a temporary hash administrator. The workload is independent of the number of entries in the table. The hash administrator is temporary and is generally invalidated when the table is accessed to be changed.
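    A sketch of the COLLECT variant for this example, reusing lt_ekpo and <fs_ekpo> from the program above (bukrs_typ and lt_bukrs_c are helper names not in the original program; since all fields of bukrs_typ are character-like, they all form the key and COLLECT degenerates to a pure, hash-assisted duplicate check):

```abap
TYPES: BEGIN OF bukrs_typ,
         bukrs TYPE bukrs,
       END OF bukrs_typ.

DATA: lt_bukrs_c TYPE STANDARD TABLE OF bukrs_typ,
      ls_bukrs   TYPE bukrs_typ.

* COLLECT appends the line only if the key is not yet present;
* the duplicate check uses a temporary hash administration.
LOOP AT lt_ekpo ASSIGNING <fs_ekpo>.
  ls_bukrs-bukrs = <fs_ekpo>-bukrs.
  COLLECT ls_bukrs INTO lt_bukrs_c.
ENDLOOP.
```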

    Regards,
    Naimesh Patel

    1. Richard Harper

      So that’s one example of table use; another example I was thinking of was a lookup table, for instance.

      However, when you think about it, any table with a good hashing algorithm that does not generate too many collisions will generally be faster than a sorted table once the number of records starts getting huge.

