
In earlier blogs, I looked at various performance tuning techniques and tried to identify some that are more important than others. In this blog, I want to look at different ways to construct a simple SELECT statement.

A very common question that is asked in the ABAP forum is “Which is better: a JOIN or FOR ALL ENTRIES?” I’ve written a program that compares six different ways of constructing a SELECT statement: a simple, fully qualified SELECT; a nested SELECT; a SELECT using FOR ALL ENTRIES; a SELECT using an INNER JOIN; a SELECT using an OUTER JOIN; and a SELECT using a sub-query. All of these SELECTs are fully qualified in the sense that they use all fields of the primary key. For comparison, I’ve also added a SELECT that doesn’t fully use the primary key.

This task was made more difficult by the fact that it’s not really easy to compare a JOIN with a sub-query. A JOIN assumes that you want the data from more than one table. A sub-query assumes that you need data only from the main table. So the SELECT statements that I have constructed are quite simple and in some cases, not practical. They are just for comparison purposes.
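
To make the contrast concrete, here is a minimal sketch of what the sub-query variant might look like. It is only a sketch, not the program’s actual listing; BKPF as the main table, T001 in the sub-query, and the selection-screen names s_bukrs, s_belnr and p_gjahr are my assumptions.

  " Only BKPF data is returned; T001 appears solely inside the sub-query.
  DATA lt_bkpf TYPE STANDARD TABLE OF bkpf.

  SELECT * FROM bkpf
    INTO TABLE lt_bkpf
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr =  p_gjahr
      AND EXISTS ( SELECT * FROM t001
                     WHERE bukrs = bkpf~bukrs ).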

I’ve used the GET RUN TIME statement for comparison rather than the EXPLAIN function of transaction ST05 because it’s difficult to compare multiple SELECT s with single SELECT s using this function. GET RUN TIME is not perfect either, but if you do multiple comparisons, particularly in a system with little activity, the results should be OK. I’ve put all of the SELECTs used in comparisons within loops. You can adjust the number of loop passes on the selection screen.

In any event, here is the program:

The program has two SELECT-OPTIONS and one PARAMETER for selecting data: Company code, document number and fiscal year. I ran it four different ways: with a single company code and document number, with a single company code and a range of document numbers, with a range of company codes and a single document number and with ranges of both company codes and document numbers.
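
To give a rough idea of the shape of each timed variant, here is a minimal sketch of one of them. The FORM name, the loop parameter p_loops and the selection-screen names s_bukrs, s_belnr and p_gjahr are my assumptions rather than the program’s actual identifiers.

  FORM time_fully_qualified.
    DATA: t_start TYPE i,
          t_end   TYPE i,
          t_diff  TYPE i,
          lt_bkpf TYPE STANDARD TABLE OF bkpf.

    GET RUN TIME FIELD t_start.
    DO p_loops TIMES.                 " number of loop passes from the selection screen
      SELECT * FROM bkpf
        INTO TABLE lt_bkpf
        WHERE bukrs IN s_bukrs
          AND belnr IN s_belnr
          AND gjahr =  p_gjahr.
    ENDDO.
    GET RUN TIME FIELD t_end.

    t_diff = t_end - t_start.
    WRITE: / 'Fully qualified SELECT:', t_diff, 'microseconds'.
  ENDFORM.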

I ran the program a number of times in a 4.7 environment with DB2 databases. I was a bit surprised at some of the results:

  • For the simple case (single company code and document number), all of the methods worked almost equally well. The single fully qualified SELECT worked best, while the OUTER JOIN was worst. But the worst case only added about 25% execution time. The nested SELECT was really no worse than the others.
  • With a single company code and a range of document numbers, the execution times increased, but the overall results were quite similar to the simple case, with the exception that the nested SELECT added about 75% to the execution time.
  • With a range of company codes and either a single document number or a range, the results were different: the execution times for both the OUTER JOIN and the fully qualified SELECT were dramatically higher (500 to 1000 times) than for the other methods. This (to me at least) was the really surprising result. The following statement:
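
(A sketch of the kind of statement presumably meant here: the fully qualified SELECT straight against BKPF, with assumed selection-screen names and lt_bkpf as declared in the sketches above.)

  SELECT * FROM bkpf
    INTO TABLE lt_bkpf
    WHERE bukrs IN s_bukrs       " a range of company codes
      AND belnr IN s_belnr
      AND gjahr =  p_gjahr.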

Is far less efficient than:
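
(Compared with one of the faster variants, sketched here as the INNER JOIN through T001 that the comments below also discuss; again the names are assumptions.)

  SELECT b~bukrs b~belnr b~gjahr b~budat b~blart
    FROM t001 AS t INNER JOIN bkpf AS b
                   ON t~bukrs = b~bukrs
    INTO CORRESPONDING FIELDS OF TABLE lt_bkpf
    WHERE t~bukrs IN s_bukrs
      AND b~belnr IN s_belnr
      AND b~gjahr =  p_gjahr.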

when a range of company codes is used. The increase in execution time for the OUTER JOIN is probably due to the fact that I could not use T001~BUKRS in the WHERE clause because of that limitation on OUTER JOINs.

In the final analysis, there is no “one size fits all” answer to the question “Which is better: a JOIN or FOR ALL ENTRIES?” In many, if not most, cases my money is on the JOIN, but the difference is not large enough to spend much time jumping through hoops to pare off the last microsecond. In the end, if you are interested in the differences for your particular case, then you must code different SELECTs to find which is best. But then you also have to bear in mind that the same SELECT may behave differently based on the makeup of the WHERE clause.

There are other considerations that come into play as well:

  • INNER JOINs only look at the intersection of the results that meet the WHERE clause.
  • FOR ALL ENTRIES eliminates duplicates from the results.
  • I find JOINs to be more time consuming to code. (I can never find the “~” key.)
  • When using FOR ALL ENTRIES you generally end up with at least two internal tables (see the sketch after this list). This may or may not be a good thing.
  • The example I have shown uses the full primary key. Some preliminary testing I have done comparing JOINs with FOR ALL ENTRIES on secondary indices shows that FOR ALL ENTRIES can give better performance in that case.
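
As referenced in the list above, here is a minimal sketch of the two-internal-table pattern that FOR ALL ENTRIES usually implies; BSEG as the dependent table and all names are illustrative.

  DATA: lt_bkpf TYPE STANDARD TABLE OF bkpf,   " driver table
        lt_bseg TYPE STANDARD TABLE OF bseg.   " dependent data

  SELECT * FROM bkpf
    INTO TABLE lt_bkpf
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr =  p_gjahr.

  " FOR ALL ENTRIES with an empty driver table would select everything,
  " so the second SELECT is guarded.
  IF NOT lt_bkpf[] IS INITIAL.
    SELECT * FROM bseg
      INTO TABLE lt_bseg
      FOR ALL ENTRIES IN lt_bkpf
      WHERE bukrs = lt_bkpf-bukrs
        AND belnr = lt_bkpf-belnr
        AND gjahr = lt_bkpf-gjahr.
  ENDIF.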

One final thing to note: in the above program, the one SELECT that consistently underperformed was the one that did not use the index effectively. And that is the real point here. All of the techniques that I have shown here work reasonably effectively. The most important thing to remember is to use an index.


19 Comments


  1. Suresh Datti
    I have followed your other blogs/responses that focus on performance issues too.
    One small correction to the final analysis though..
    “FOR ALL ENTRIES eliminates duplicates from the results. “
    Not always.. If you include all the key fields in your SELECT clause or use a SELECT *, the duplicates do get picked with the FOR ALL ENTRIES option.
    ~Suresh

      1. Jason Scott
        I think much more realistic results for comparing the differences can be gained by selecting much larger sets of data. Try queries that search through tables containing millions of records…
        1. Rob Burbank Post author
          Yes, I agree. I wanted to compare as many different methods as possible – including a sub-query. I tried but wasn’t able to come up with a combination of tables that were both large and allowed me to do everything I wanted. I’m sure I missed something, but there you are.

          As I said in response to another question, I did some testing on joins using secondary indices that showed somewhat different results. For this testing, I did use larger tables (EKKO, EKPO and some others). In another blog, Performance – what will kill you and what will leave you with only a flesh wound, I also used larger tables.

          Thanks for your comments.

          Rob

  2. Peter Inotai
    Hi Rob,

    Thanks for this interesting weblog!

    I believe the result also depends on the DB tuning. There are several profile parameters relevant to FOR ALL ENTRIES.
    It’s explained in the following OSS notes:
    Note 48230 – Parameters for the SELECT … FOR ALL ENTRIES statement
    Note 652634 – FOR ALL ENTRIES performance with Microsoft SQL Server
    Note 634263 – Selects with FOR ALL ENTRIES as of kernel 6.10
    It’s worth checking these parameters before deciding which way to choose.

    Best regards,
    Peter

    1. Rob Burbank Post author
      Thanks for the reply – yes, it certainly does depend on database tuning. I guess I didn’t state it overtly, but this is just a tool to help with analysis. I’m not a DBA and don’t know the details of how the database is tuned; all I can do is write a program that shows up the differences. (Or ask a DBA, but what’s the fun in that?)

      Rob

  3. Lars Breddemann
    How have the join conditions been supported by indexes on db-level in your tests?
    DBs heavily rely on additional structures that enable the efficient handling of Join-Selects.
    I really would like to see how the join performance comparison looks if the database has the right indexes in place.

    KR Lars

    1. Rob Burbank Post author
      I’m not entirely sure I understand your question. But I’ll try to answer. If I’m off base, let me know.

      I only looked at the primary index in this blog. While doing the research, though, I did some tests using secondary indices and found that FOR ALL ENTRIES was somewhat faster than a JOIN.

      If time permits, I’ll try to look at this in another blog.

      Rob

  4. Joe Reddy
    Hi Rob,
        I understand the profound performance difference it makes when we use Joins and For All Entries.
        But again, the performance also depends on various other system factors.

    Cheers,
    Joseph.

    1. Rob Burbank Post author
      Yes, it does. But this was really just an attempt to show one way to answer the question for yourself programmatically without knowing the system factors.

      Rob

  5. Kjetil Kilhavn
    I see that your program first executes the selection, and then measures the execution time of performing it again (in the loop).

    I re-arranged your PERFORMs, turning the list upside down. Result: the fully qualified select was fastest every time when I selected data for two company codes (all document numbers).

    I have possibly misunderstood how you gave your inputs, but it beats me how a join with T001 can be faster than a direct select on BKPF, unless you specify a lot of invalid company codes in the selection criteria.

    1. Kjetil Kilhavn
      Aha! I tried again, but instead of specifying the two company codes (0010 and 0040) as two individual entries, I specified them as a range from 0010 to 0040.

      Performance got considerably worse for ALL routines except the partially qualified one. For the partially qualified routine the results were virtually unchanged, for the others the runtime was increased to the same level as for the partially qualified routine.

      So while the first run gave the following average results for the five iterations:
      Partially qualified: 3,229 seconds
      Sub-query: 0,446 seconds
      Outer join: 0,582 seconds
      Inner join: 0,444 seconds
      FOR ALL ENTRIES: 0,466 seconds
      Nested: 0,562 seconds
      Fully qualified: 0,436 seconds

      The second run gave the following average results:
      Partially qualified: 3,214 seconds
      Sub-query: 3,252 seconds
      Outer join: 4,159 seconds
      Inner join: 3,238 seconds
      FOR ALL ENTRIES: 2,721 seconds
      Nested: 3,227 seconds
      Fully qualified: 3,173 seconds

      The second run includes four company codes instead of two, but it surprised me that the performance was now almost the same in all cases. So I tried specifying a range of 0010 to 0040 and excluding the two company codes 0020 and 0030.
      Partially qualified: 3,225 seconds
      Sub-query: 0,572 seconds
      Outer join: 2,818 seconds
      Inner join: 0,564 seconds
      FOR ALL ENTRIES: 0,467 seconds
      Nested: 0,562 seconds
      Fully qualified: 2,617 seconds

      Finally I tried specifying the four company codes 0010, 0020, 0030, and 0040 as four individual entries. This should compare to the second case (same data at least).
      Partially qualified: 3,218 seconds
      Sub-query: 2,539 seconds
      Outer join: 3,180 seconds
      Inner join: 3,273 seconds
      FOR ALL ENTRIES: 2,712 seconds
      Nested: 3,223 seconds
      Fully qualified: 2,507 seconds

      Why is the fully qualified select four times as fast when specifying the company codes individually rather than as a range?

      What to make of it? I am not really sure… except to support the statement that there is no step-by-step recipe you can follow to improve performance.

      If I am to draw one (small) conclusion it is that fully qualified selects are much less robust to different specifications of ranges than a join or sub-query or FOR ALL ENTRIES.

    2. Rob Burbank Post author
      The direct select on BKPF was first without any company codes.

      So I used a technique that I showed in an earlier BLOG:

      Using an Index When You Don’t Have all of the Fields

      If you don’t know one of the leftmost key fields, it turns out to be faster to use all possible entries in the SELECT rather than just leave it out of the WHERE.
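
      (For illustration, a minimal sketch of that technique with assumed names: fill a range with every existing company code and use it, rather than dropping BUKRS from the WHERE clause.)

        DATA: r_bukrs  TYPE RANGE OF bkpf-bukrs,
              wa_bukrs LIKE LINE OF r_bukrs,
              lt_bkpf  TYPE STANDARD TABLE OF bkpf.

        " Build a range holding all possible values of the leading key field.
        SELECT bukrs FROM t001 INTO wa_bukrs-low.
          wa_bukrs-sign   = 'I'.
          wa_bukrs-option = 'EQ'.
          APPEND wa_bukrs TO r_bukrs.
        ENDSELECT.

        SELECT * FROM bkpf
          INTO TABLE lt_bkpf
          WHERE bukrs IN r_bukrs       " instead of omitting BUKRS from the WHERE
            AND belnr IN s_belnr
            AND gjahr =  p_gjahr.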

      As for the order of doing the SELECTs, that’s why I put them in a DO loop that can be executed multiple times. That way each SELECT both comes before and after every other one.

      Thanks for taking the time to comment.

      Rob

  6. Siegfried Boes
    Hi Rob,

    I have played a bit with your program just now, and I must say that I find the set-up a bit special. The number of found records should be much larger, and two cases should be handled.

    Many joins work in a way that the conditions on table A give – let’s say – 1000 records and the conditions on table B give 2000 records, but the inner join is fulfilled only by the intersection of the two sets, containing 50 records. In such cases it is quite obvious that a join is much faster than a FOR ALL ENTRIES.

    FOR ALL ENTRIES makes sense if the first select gives 1000 records and every further select adds information to those 1000 records. FOR ALL ENTRIES is perfect if the SELECTs are not close together because there is processing in between. If the SELECTs come close together then a JOIN would also be an option. Be aware that the join can put the information of all tables into one internal table with the results. FOR ALL ENTRIES can not do that (not yet); there the internal tables must be merged separately, and if no BINARY SEARCH is used then it is definitely slower.
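
    (As an illustration of that merge step, a sketch with assumed names, where lt_bkpf and lt_bseg would come from a prior FOR ALL ENTRIES read.)

      TYPES: BEGIN OF ty_result,
               bukrs TYPE bkpf-bukrs,
               belnr TYPE bkpf-belnr,
               gjahr TYPE bkpf-gjahr,
               blart TYPE bkpf-blart,   " header field
               buzei TYPE bseg-buzei,   " item fields
               wrbtr TYPE bseg-wrbtr,
             END OF ty_result.

      DATA: lt_bkpf   TYPE STANDARD TABLE OF bkpf,  " filled by the first SELECT
            lt_bseg   TYPE STANDARD TABLE OF bseg,  " filled by the FOR ALL ENTRIES SELECT
            lt_result TYPE STANDARD TABLE OF ty_result,
            ls_result TYPE ty_result,
            ls_bkpf   TYPE bkpf,
            ls_bseg   TYPE bseg.

      SORT lt_bkpf BY bukrs belnr gjahr.

      LOOP AT lt_bseg INTO ls_bseg.
        " BINARY SEARCH keeps the merge from scanning the whole header table per item.
        READ TABLE lt_bkpf INTO ls_bkpf
             WITH KEY bukrs = ls_bseg-bukrs
                      belnr = ls_bseg-belnr
                      gjahr = ls_bseg-gjahr
             BINARY SEARCH.
        IF sy-subrc = 0.
          MOVE-CORRESPONDING ls_bkpf TO ls_result.
          MOVE-CORRESPONDING ls_bseg TO ls_result.
          APPEND ls_result TO lt_result.
        ENDIF.
      ENDLOOP.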

    Siegfried

    1. Rob Burbank Post author
      Siegfried – I mostly agree, but the real point here is that the performance gains in either case are small. You will likely not be able to cut the execution time of a SELECT in half by using one method over another.

      I generally find FOR ALL ENTRIES to be easier to use and I mostly use that.

      But if you look at the forums (and I know you do), you’d think that the most important performance tuning technique is to use SELECT … INTO CORRESPONDING FIELDS OF… (or is it to avoid using that) and some other things that don’t much matter.

      But if you want to reduce execution times by 1/20 or 1/30, you have to look at other things.

      You might also want to look at

      http://blogs.ittoolbox.com/sap/db2/archives/for-all-entries-vs-db2-join-8912

      Rob

      1. Jay Dalwadi

        Sir, can you tell me which is better for performance tuning? I am still confused about FOR ALL ENTRIES versus INNER JOIN, because Mr. Matthew Billingham told me that FOR ALL ENTRIES is LESS efficient than an inner join.

