There is a rumor going around many SAP customers that SELECTs within itab loops should be eschewed in favor of FOR ALL ENTRIES IN ITAB constructions.  This rumor persists despite an explicit warning in official SAP class material circa 2004-5 concerning the use and misuse of FOR ALL ENTRIES.  Here’s why I think SAP is correct to warn folks about FOR ALL ENTRIES, and why there should be a concerted effort to disabuse clients of pipe-dreams concerning this construct.

When one does a “FOR ALL ENTRIES IN itab” SELECT, one is really doing a bastardized join which relies on the skill of SAP developers rather than the skill of Oracle or DB2 developers.  Why? Because half the raw stuff of the join is in the database (buffering considerations aside) and half is in ABAP or SAP memory (I don’t recall which, to be honest).  Now I am second to none in my respect for SAP developers, but no one is going to convince me that SAP developers can do a join better than Oracle or DB2 internals.  (This is not to say that Oracle and DB2 joins are anything to write home about – the only reason Codd is still spoken about in hushed and reverent tones is because the machines finally got sophisticated enough for DB2 and Oracle to implement parallelism.  Otherwise, the relational paradigm would have already been consigned to the dustbin of history as the truly bad idea it really was.)

Am I suggesting that developers use explicit DB joins instead of FOR ALL ENTRIES?  To the contrary – I am arguing that SELECTs within LOOPs on itabs are still generally preferable when the number of records is anything but trivially small, particularly when your SELECT within the LOOP can take advantage of explicit database indices.  Your friendly DBAs will argue that the repeated hits of the SELECTs in the iterated loop passes create a strain on the database.  That may well be so, but if it is, it is a problem in database implementation, not in coding practice.
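For readers who want the two patterns under discussion side by side, here is a minimal sketch.  The table names and fields (zorders, zitems, order_id) are invented for illustration; they are not from any real system.

```abap
* Hypothetical driver table lt_orders; we want the matching rows from zitems.
DATA: lt_orders TYPE STANDARD TABLE OF zorders,
      lt_items  TYPE STANDARD TABLE OF zitems,
      wa_order  TYPE zorders,
      wa_item   TYPE zitems.

* Pattern 1: SELECT inside a LOOP over the driver itab.
* One small, index-friendly query per driver row.
LOOP AT lt_orders INTO wa_order.
  SELECT * FROM zitems INTO wa_item
    WHERE order_id = wa_order-order_id.
    APPEND wa_item TO lt_items.
  ENDSELECT.
ENDLOOP.

* Pattern 2: FOR ALL ENTRIES.
* The database interface expands the itab into one or more statements
* with OR'd conditions; duplicate rows in the result are eliminated.
IF lt_orders IS NOT INITIAL.  "FAE on an empty itab selects ALL rows!
  SELECT * FROM zitems
    INTO TABLE lt_items
    FOR ALL ENTRIES IN lt_orders
    WHERE order_id = lt_orders-order_id.
ENDIF.
```

The empty-itab guard in pattern 2 is worth noting in its own right: with an empty driver table, FOR ALL ENTRIES ignores the condition entirely and selects the whole table.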
What, if you think about it, is the difference between one customer requesting repeating SELECTs and hundreds of customers requesting one SELECT each?  If a DB can handle the latter reasonably well, shouldn’t it be able to handle the former equally well?  So my question is: does SAP or anyone else have any real benchmark stats on FOR ALL ENTRIES vs indexed SELECTS within loops – stats for different numbers of records in the driver itabs?  And my final comment in this post is: “Be careful of the paradigm you ask for – you may get it.”   More on this thought later.

14 Comments


  1. Community User
    Hi,

    You said that:

    “When one does a FOR ALL ENTRIES IN itab SELECT, one is really doing a bastardized join which relies on the skill of SAP developers …because half the raw stuff of the join is in the database (buffering considerations aside) and half is in ABAP or SAP memory (I don’t recall which, to be honest.)”

    My understanding is, please correct me, that there is no JOIN in this case. SAP’s database layer converts the statement as follows:

    “SELECT … FOR ALL ENTRIES IN itab WHERE cond returns the union of the solution sets of all SELECT statements that would result if you wrote a separate statement for each line of the internal table replacing the symbol itab-f with the corresponding value of component f in the WHERE condition.”.

    Hence each line of itab is appended to the WHERE clause, with an OR between the lines, before the statement is sent to the database. For the database, it’s one complete statement with a big WHERE clause. As far as performance goes, the developer should try to specify the WHERE clause as much as possible and try to hit the index.
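    To make the expansion concrete, here is roughly what the database layer builds (shown as a comment sketch; the exact form and the chunking thresholds depend on kernel version and profile parameters, so this is illustrative only):

    ```abap
    * Conceptual expansion of:
    *   SELECT ... FOR ALL ENTRIES IN itab WHERE f1 = 'X' AND f2 = itab-f.
    * If itab holds the values 10, 20 and 30, the statement sent to the
    * database looks roughly like:
    *
    *   SELECT ... WHERE ( f1 = 'X' AND f2 = 10 )
    *                 OR ( f1 = 'X' AND f2 = 20 )
    *                 OR ( f1 = 'X' AND f2 = 30 )
    *
    * Large itabs are split into several such statements (controlled by
    * parameters such as rsdb/max_blocking_factor), and the partial result
    * sets are unioned with duplicate rows removed.
    ```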

    The only problem that we used to have in this case, and in the case of select-options, was that the SELECT statement could get too long, which caused a dump. I believe this problem was resolved in later releases.

    Regards

    1. David Halitsky
      Hi Shehryar –

      Thanks very much for taking the time to reply and for the additional technical clarification.  What you wrote is exactly why I said “bastardized join”, not “genuine join” – the FOR ALL ENTRIES emulates a join in a way that may or may not be efficient (hence the SAP warning in its official class material). 

      The really interesting and critical thing is how the SAP scheduler handles the one SELECT with the multiple OR’d WHERE clauses versus the multiple SELECTs with a single WHERE clause each (assuming both are on properly indexed fields).

      I personally still believe that the latter (multiple selects) will be better handled by the SAP scheduler than the single select, and that if it isn’t, then there’s something wrong with the SAP scheduler.

      The old-time DB wizards wrote their schedulers so that threads were penalized for being CPU-bound as well as for being IO-bound … if the SAP scheduler is written similarly, then the single select thread should have its priority downgraded more rapidly than the multiple select thread.

      1. Community User
        Hi,

        Did you check out these threads:

        1. Maximum number of records for usage of “For all entries”
        2. Multiple Table Join instead of Nested Selects?
        3. query related to inner join V/S all entries……..

        It was mentioned in one of the documents about performance optimization that:

        <————————
        SELECT … FOR ALL ENTRIES

        In the outermost loop, the database table (PACKAGE SIZE) is read section-by-section into an internal table, sorted by its primary key (SORT on the internal table, or read in using ORDER BY PRIMARY KEY). For each data record in the internal table, all associated, dependent records are read into a further internal table (using SELECT … FOR ALL ENTRIES). This is also sorted. You can then carry on processing using a nested LOOP.
        The advantage of SELECT … FOR ALL ENTRIES is that it provides good performance regardless of the selectivity of the condition on the outermost table, since, in contrast to the nested SELECT, it works in a data-oriented way in the database, but still only picks out the relevant database entries (different to parallel cursor processing).
        You should use the addition FOR ALL ENTRIES if a JOIN is not possible for syntactical reasons or if the JOIN would result in high redundancy due to the constantly repeated fields from the left table.
        ————->
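        The package-wise pattern the quoted passage describes can be sketched roughly as follows.  Table names (zheader, zitem) and the key field hdr_key are invented for illustration:

        ```abap
        DATA: lt_head TYPE STANDARD TABLE OF zheader,
              lt_item TYPE STANDARD TABLE OF zitem.

        * Outermost loop: read the driver table section by section,
        * in primary-key order, PACKAGE SIZE rows at a time.
        SELECT * FROM zheader
          INTO TABLE lt_head
          PACKAGE SIZE 1000
          ORDER BY PRIMARY KEY.

          IF lt_head IS NOT INITIAL.
            * For each package, fetch all dependent records at once.
            SELECT * FROM zitem
              INTO TABLE lt_item
              FOR ALL ENTRIES IN lt_head
              WHERE hdr_key = lt_head-hdr_key.
            SORT lt_item BY hdr_key.
            * ... nested LOOP processing of lt_head / lt_item here ...
          ENDIF.

        ENDSELECT.
        ```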
        Regards

          1. David Halitsky
            Thanks for the links to the forum threads.

            As far as the “warning” to which I referred, I thought it was in Unit 5 of BC400, but it’s not, so I will have to look further in the other class manuals.  Unfortunately, the index pages for these manuals are not very revealing.

  2. Peter Inotai
    Hi David,

    It’s an interesting topic.
    What about asking the ABAPers in SDN to run some tests in their environment and attach the results here, as happened in this weblog:
    The specified item was not found.

    Peter

    1. David Halitsky
      Hi Peter –

      Thanks for taking the time to reply.

      Glad you find the question interesting, and therefore still undecided (in your mind, at least.)

      I think your suggestion is a good one, at least for some preliminary stats.  I will do two tests using Rich’s wall-time approach/code:

      1) creating a profit center itab from CEPC and then: a) driving into GLPCA with a loop on this itab containing a SELECT SINGLE on rprctr (being sure to get an indexed read by including KOKRS and RYEAR in the where and thereby invoking SAP-delivered index-1 on GLPCA); b) doing the same with a FOR ALL ENTRIES construction;

      2) creating an itab of objnr’s by: a) concatenating two constants before each kostl in csks, putting the resultant objnr’s in an itab, and then driving off this itab into COEP with a SELECT SINGLE that uses a ledger number plus objnr to force SAP-delivered index 1; b) doing the same with a FOR ALL ENTRIES construction;

      Since the first test will involve only about 160 profit centers and the second will involve about 2000 cost centers, the results should provide some indication of whether there’s any difference that grows noticeably with size of the driver itab.

      If not, I’ll do some grown-up tests with some selects in BSEG driving off a large itab drawn from BKPF.

      If there’s still no difference, then my original assumption/belief is probably incorrect for all intents and purposes.

      One more note – a simple wall-time test such as Rich’s will probably not answer the question to the satisfaction of a DBA.  This is because a DBA will really want to know which of the two approaches tends to create more degradation in the performance of OTHER users while the requests are running.  That is, I think it’s possible in principle for both approaches to yield roughly the same wall-times but put different levels of stress on the underlying DB.  Or – it may be possible that one approach will show better wall-time but place a strain on the DB that a DBA will find excessive.

    2. David Halitsky
      Peter (and others)- read ’em and weep! – there’s a drastic difference beyond my wildest expectations.

      Here’s the code (comment out one of the two methods for retrieving from glpca.)

      REPORT  Z_TIMETEST1.

      CONSTANTS:

        c_bfs(3)           TYPE c VALUE 'BFS',
        c_year(4)          TYPE c VALUE '2004'.

      TYPES:

      BEGIN OF prctr1_s,
        prctr1          TYPE prctr,
      END   OF prctr1_s,

      BEGIN OF prctr2_s,
        rprctr          TYPE prctr,
        icol            TYPE i,
      END   OF prctr2_s,

      tprctr1_s TYPE STANDARD TABLE OF prctr1_s,
      tprctr2_s TYPE STANDARD TABLE OF prctr2_s.

      DATA:

        wa_prctr1_s TYPE prctr1_s,
        wa_prctr2_s TYPE prctr2_s,
        i_prctr1    TYPE tprctr1_s,
        i_prctr2    TYPE tprctr2_s,

        rt_str type i,
        rt_end type i,
        run_time1 type p decimals 2,
        run_time2 type p decimals 2,
        v_lines type i,
        v_linesc(22) type c.

      SELECT prctr from cepc INTO wa_prctr1_s-prctr1.
        APPEND wa_prctr1_s TO i_prctr1.
      ENDSELECT.

      get run time field rt_str.

      **************************************
      *** method 1: select within loop with collect
      *LOOP AT i_prctr1 INTO wa_prctr1_s.
      *  SELECT rprctr
      *      INTO wa_prctr2_s-rprctr
      *      FROM glpca
      *     WHERE kokrs = c_bfs
      *       AND ryear = c_year
      *       AND rprctr = wa_prctr1_s-prctr1.
      *
      *     IF sy-subrc = 0.
      *       COLLECT wa_prctr2_s INTO i_prctr2.
      *     ENDIF.
      *  ENDSELECT.
      * ENDLOOP.
      **************************************

      **************************************
      *** method 2: for all entries with auto-elim of dups

      *  SELECT rprctr
      *      FROM glpca
      *      INTO CORRESPONDING FIELDS OF TABLE i_prctr2
      *      FOR ALL ENTRIES IN i_prctr1
      *     WHERE kokrs = c_bfs
      *       AND ryear = c_year
      *       AND rprctr = i_prctr1-prctr1.
      ********************************************

      get run time field rt_end.

      run_time1 = ( rt_end - rt_str ) / 1000000.

      write:/ 'run time:',  run_time1, 'seconds'.

      DESCRIBE TABLE i_prctr1 LINES v_lines.
      v_linesc = v_lines.
      WRITE: / v_linesc.

      DESCRIBE TABLE i_prctr2 LINES v_lines.
      v_linesc = v_lines.
      WRITE: / v_linesc.

      Here are the results for ten tries of method 1 (select within a loop with a collect to eliminate dups)

      “Results in seconds for ten tries of method 1:”
      .54
      .42
      .38
      .93
      .60
      .67
      .46
      .57
      .61
      .38

      Here are the results for ten tries of method 2 (FOR ALL ENTRIES with auto-elimination of dups):

      “Results in seconds for ten tries of method 2:”

      1.24
      3.41
      1.95
      2.23
      2.83
      6.12
      3.59
      1.79
      1.57
      2.76

      Even for a battle-weary cynic (me) who believes that SQL started a mindless rush to “neat syntax” at the expense of performance, these are surprising results.  (Well, I shouldn’t be so hard on SQL – the mindless rush actually started with the idea that one shouldn’t have labels and goto’s … but that’s a battle lost so many years ago that no one even remembers it.)

    3. David Halitsky
      The code below is exactly parallel to the original time test I posted, but the results are exactly reversed. The only difference I can see is that in the first example, 160 rows in table 1 were de-duplicated to 40 rows in table 2, whereas in the example below, 1179 rows in table 1 were de-duplicated to 520 rows in table 2. 

      So tomorrow I will change table 2 in the code below to a hashed table and do inserts only when the READ returns sy-subrc 4.  This will tell whether the results below are due to the time it takes to search a larger table for duplicates.  If that is not the reason for the reversed results, then I confess I’m puzzled.
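      For the record, here is a sketch of what that planned change would look like: a hashed table with a unique key, and an INSERT only when the row is not yet present, so the duplicate check costs a hash lookup instead of a scan of a growing standard table.  (The icol counter field from the original structure is dropped here for simplicity.)

      ```abap
      TYPES: BEGIN OF belnr2_s,
               belnr TYPE co_belnr,
             END OF belnr2_s.

      DATA: i_belnr2    TYPE HASHED TABLE OF belnr2_s
                             WITH UNIQUE KEY belnr,
            wa_belnr2_s TYPE belnr2_s.

      * Replacement for COLLECT: O(1) existence check, then insert.
      READ TABLE i_belnr2 WITH TABLE KEY belnr = wa_belnr2_s-belnr
           TRANSPORTING NO FIELDS.
      IF sy-subrc = 4.  "not yet in the table
        INSERT wa_belnr2_s INTO TABLE i_belnr2.
      ENDIF.
      ```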

      For the “FOR ALL ENTRIES” method, the timetest results for ten tries were:

      .18
      .18
      .21
      .22
      .20
      .19
      .19
      .20
      .19
      .25

      For the select out of an itab loop with a collect, the timetest results for ten tries were:

      1.79
      1.75
      1.57
      1.52
      2.07
      3.59
      3.38
      2.08
      2.77
      2.50

      Here’s the code.  I’ve marked with "!!!!!"s the table that I’ll change to a hashed table and the COLLECT that I’ll change to a read/insert on a hashed table.

      REPORT  Z_TIMETEST2.

      TYPES:

      BEGIN OF kostl_s,
         kostl          TYPE kostl,
      END   OF kostl_s,

      BEGIN OF belnr1_s,
        objnr           TYPE j_objnr,
        belnr           TYPE co_belnr,
        gjahr           TYPE gjahr,
        kstar           TYPE kstar,
      END   OF belnr1_s,

      BEGIN OF belnr2_s,
        belnr           TYPE co_belnr,
        icol            TYPE i,
      END   OF belnr2_s,

      tkostl_s  TYPE STANDARD TABLE OF kostl_s,
      tbelnr1_s TYPE STANDARD TABLE OF belnr1_s,
      * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      *** this is the one I'll hash tomorrow
      tbelnr2_s TYPE STANDARD TABLE OF belnr2_s.
      * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

      CONSTANTS:
        c_ks(2)            TYPE c VALUE 'KS',
        c_kokrs_len3(3)    TYPE c VALUE 'BFS',
        c_lednr(2)         TYPE c VALUE '00'.

      DATA:

        wa_kostl_s  TYPE kostl_s,
        i_kostl     TYPE tkostl_s,
        wa_belnr1_s TYPE belnr1_s,
        i_belnr1    TYPE tbelnr1_s,
        wa_belnr2_s TYPE belnr2_s,
        i_belnr2    TYPE tbelnr2_s,

        v_objnr(16)        TYPE c,
        v_kostl_len11(11)  TYPE c,

        rt_str type i,
        rt_end type i,
        run_time1 type p decimals 2,
        run_time2 type p decimals 2,
        v_lines type i,
        v_linesc(22) type c.

      SELECT DISTINCT kostl from csks INTO wa_kostl_s-kostl.
        APPEND wa_kostl_s TO i_kostl.
      ENDSELECT.

      LOOP AT i_kostl INTO wa_kostl_s.

        v_kostl_len11 = wa_kostl_s-kostl.
        SHIFT v_kostl_len11 RIGHT BY 1 PLACES.

        CONCATENATE c_ks
                    c_kokrs_len3
                    v_kostl_len11
               INTO v_objnr.

      SELECT
        SINGLE objnr
               belnr
               gjahr
               kstar
          INTO (wa_belnr1_s-objnr,
                wa_belnr1_s-belnr,
                wa_belnr1_s-gjahr,
                wa_belnr1_s-kstar)
          FROM coep
      * to get good read on index 1
         WHERE lednr = c_lednr
           AND objnr = v_objnr.

        APPEND wa_belnr1_s TO i_belnr1.
      ENDLOOP.

      get run time field rt_str.

      **************************************
      *** method 1: select within loop

      *LOOP AT i_belnr1 INTO wa_belnr1_s.
      *
      *  SELECT belnr
      *    INTO wa_belnr2_s-belnr
      *    FROM cobk
      *   WHERE kokrs = c_kokrs_len3
      *     AND belnr = wa_belnr1_s-belnr.
      *
      *    IF sy-subrc = 0.
      * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      ** this is the collect I'll change to a hashed read/hashed insert tomorrow.
      *      COLLECT wa_belnr2_s INTO i_belnr2.
      * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      *    ENDIF.
      *
      *  ENDSELECT.
      *
      *ENDLOOP.

      **************************************

      **************************************
      *** method 2: for all entries

      *  SELECT belnr
      *      FROM cobk
      *      INTO CORRESPONDING FIELDS OF TABLE i_belnr2
      *      FOR ALL ENTRIES IN i_belnr1
      *     WHERE kokrs = c_kokrs_len3
      *       AND belnr = i_belnr1-belnr.
      ********************************************

      get run time field rt_end.

      run_time1 = ( rt_end - rt_str ) / 1000000.

      write:/ 'run time:',  run_time1, 'seconds'.

      DESCRIBE TABLE i_belnr1 LINES v_lines.
      v_linesc = v_lines.
      WRITE: / v_linesc.

      DESCRIBE TABLE i_belnr2 LINES v_lines.
      v_linesc = v_lines.
      WRITE: / v_linesc.

      1. Anton Wenzelhuemer
        For more valid results I’d add BYPASSING BUFFER to those SQL statements and, depending on your DB engine, add a DB hint to bypass the DB engine’s buffer as well.

        just my 2 cents
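        Applied, for example, to the SELECT from glpca in the first test, the suggestion would look like this (a fragment only; the declarations are those of David's Z_TIMETEST1 listing above):

        ```abap
        * BYPASSING BUFFER is part of the FROM clause and forces the
        * statement past the SAP table buffer, so both methods measure
        * real database work rather than buffer hits.
        SELECT rprctr FROM glpca BYPASSING BUFFER
          INTO wa_prctr2_s-rprctr
          WHERE kokrs  = c_bfs
            AND ryear  = c_year
            AND rprctr = wa_prctr1_s-prctr1.
        ENDSELECT.
        ```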

        1. David Halitsky
          Apples aren’t apples and oranges aren’t oranges unless they’re all rolling on a level playing field (or something like that.)

          Thanks again for the suggestions

      2. Peter Inotai
        Hi David,

        Thanks for this fun, I really enjoy it;-)

        I did some test on the following platform:
        OS HP-UX B.11.11     
        DB6 08.02.0002
        Kernel 640 Patch 126
        SAP R/3 Enterprise 4.70×200

        Unfortunately I could do it only on our development system, in one of our sandbox clients, which doesn’t have much data.

        My results:
        method 1 – select within loop
        0,72
        4,34
        0,59
        0,46
        2,35
        2.146,17
        0,31
        0,45
        1,97
        0,62

        method 2 – FOR ALL ENTRIES
        0,02
        2.147,48
        0,02
        0,02
        0,02
        0,02
        0,02
        0,14
        0,02
        0,02

        It’s clear that FOR ALL ENTRIES looks better.

        I took a look at the available OSS notes, and it seems that to get an optimized FOR ALL ENTRIES, the system (DB + SAP) has to be tuned.
        The relevant OSS notes, which are worth checking:
        Note 48230 – Parameters for the SELECT … FOR ALL ENTRIES statement
        Note 652634 – FOR ALL ENTRIES performance with Microsoft SQL Server
        Note 114716 – Performance problems Oracle 8.0.4/all entries
        Note 634263 – Selects with FOR ALL ENTRIES as of kernel 6.10
        Note 819324 – FAQ: MaxDB SQL optimization

        I’m looking forward to results on other platforms (e.g. MS SQL Server).

        Peter

        1. David Halitsky
          Peter –

          Did it stop being fun when ST05 revealed that FOR ALL ENTRIES IN ITAB doesn’t always play nicely with the DB2 optimizer, as documented in:

          “Yes, Virginia, the check is in the mail” (or, why you can’t trust SQL)

          I’m not at all surprised at this result and it’s not really a black-eye for SAP developers. All query optimizers (Oracle’s as well as DB2’s) react in essentially unpredictable ways when confronted with OR’d lists in WHERE clauses.

          So the best one can say is that FOR ALL ENTRIES in ITAB will SOMETIMES do better, depending on the table and the way the native database optimizer chooses to operate on the table.

          To illustrate this general point about query optimizers by another example with which I’m familiar, fiscal year (ryear) and period (poper) are separated by account (racct) in Index 1 on FMUSFGFACTS1A. Anyone working in Oracle who needs data across multiple periods will find out very quickly that they need to do a nest of multiple loops to give Oracle a completely resolved index (on singletons) at the bottom of the nest.
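          The nested-loop idea can be sketched as follows.  The itabs of years, periods and accounts are assumed to have been filled beforehand; the point is simply that the innermost SELECT presents the optimizer with nothing but equality conditions on the index fields:

          ```abap
          DATA: lt_years    TYPE STANDARD TABLE OF gjahr,
                lt_periods  TYPE STANDARD TABLE OF poper,
                lt_accounts TYPE STANDARD TABLE OF racct,
                lv_ryear    TYPE gjahr,
                lv_poper    TYPE poper,
                lv_racct    TYPE racct,
                lt_facts    TYPE STANDARD TABLE OF fmusfgfacts1a.

          LOOP AT lt_years INTO lv_ryear.
            LOOP AT lt_periods INTO lv_poper.
              LOOP AT lt_accounts INTO lv_racct.
                * equality on every index field -> fully resolved index
                SELECT * FROM fmusfgfacts1a
                  APPENDING TABLE lt_facts
                  WHERE ryear = lv_ryear
                    AND racct = lv_racct
                    AND poper = lv_poper.
              ENDLOOP.
            ENDLOOP.
          ENDLOOP.
          ```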

