
In earlier blogs, I looked at various performance tuning techniques and tried to identify some that are more important than others. In this blog, I want to look at different ways to construct a simple SELECT statement.

A very common question that is asked in the ABAP forum is “Which is better: a JOIN or FOR ALL ENTRIES?” I’ve written a program that compares six different ways of constructing a SELECT statement: a simple, fully qualified SELECT; a nested SELECT; a SELECT using FOR ALL ENTRIES; a SELECT using an INNER JOIN; a SELECT using an OUTER JOIN; and a SELECT using a sub-query. All of these SELECTs are fully qualified in the sense that they use all fields of the primary key. For comparison, I’ve also added a SELECT that doesn’t fully use the primary key.

This task was made more difficult by the fact that it’s not really easy to compare a JOIN with a sub-query. A JOIN assumes that you want the data from more than one table. A sub-query assumes that you need data only from the main table. So the SELECT statements that I have constructed are quite simple and in some cases, not practical. They are just for comparison purposes.
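
To make the contrast concrete, here is a minimal sketch of what the sub-query variant might look like. It is only a sketch, not the program’s actual listing; BKPF as the main table, T001 in the sub-query, and the selection-screen names s_bukrs, s_belnr and p_gjahr are my assumptions.

  " Only BKPF data is returned; T001 appears solely inside the sub-query.
  DATA lt_bkpf TYPE STANDARD TABLE OF bkpf.

  SELECT * FROM bkpf
    INTO TABLE lt_bkpf
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr =  p_gjahr
      AND EXISTS ( SELECT * FROM t001
                     WHERE bukrs = bkpf~bukrs ).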

I’ve used the GET RUN TIME statement for comparison rather than the EXPLAIN function of transaction ST05 because it’s difficult to compare multiple SELECT s with single SELECT s using this function. GET RUN TIME is not perfect either, but if you do multiple comparisons, particularly in a system with little activity, the results should be OK. I’ve put all of the SELECTs used in comparisons within loops. You can adjust the number of loop passes on the selection screen.

In any event, here is the program:

The program has two SELECT-OPTIONS and one PARAMETER for selecting data: Company code, document number and fiscal year. I ran it four different ways: with a single company code and document number, with a single company code and a range of document numbers, with a range of company codes and a single document number and with ranges of both company codes and document numbers.
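
To give a rough idea of the shape of each timed variant, here is a minimal sketch of one of them. The FORM name, the loop parameter p_loops and the selection-screen names s_bukrs, s_belnr and p_gjahr are my assumptions rather than the program’s actual identifiers.

  FORM time_fully_qualified.
    DATA: t_start TYPE i,
          t_end   TYPE i,
          t_diff  TYPE i,
          lt_bkpf TYPE STANDARD TABLE OF bkpf.

    GET RUN TIME FIELD t_start.
    DO p_loops TIMES.                 " number of loop passes from the selection screen
      SELECT * FROM bkpf
        INTO TABLE lt_bkpf
        WHERE bukrs IN s_bukrs
          AND belnr IN s_belnr
          AND gjahr =  p_gjahr.
    ENDDO.
    GET RUN TIME FIELD t_end.

    t_diff = t_end - t_start.
    WRITE: / 'Fully qualified SELECT:', t_diff, 'microseconds'.
  ENDFORM.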

I ran the program a number of times in a 4.7 environment with DB2 databases. I was a bit surprised at some of the results:

  • For the simple case (single company code and document number), all of the methods worked almost equally well. The single fully qualified SELECT worked best, while the OUTER JOIN was worst. But the worst case only added about 25% execution time. The nested SELECT was really no worse than the others.
  • With a single company code and a range of document numbers, the execution times increased, but the overall results were quite similar to the simple case, with the exception that the nested SELECT added about 75% to the execution time.
  • With a range of company codes and either a single document number or a range, the results were different: the execution times for both the OUTER JOIN and the fully qualified SELECT were dramatically higher (500 to 1000 times) than for the other methods. This (to me at least) was the really surprising result. The following statement:
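
(A sketch of the kind of statement presumably meant here: the fully qualified SELECT straight against BKPF, with assumed selection-screen names and lt_bkpf as declared in the sketches above.)

  SELECT * FROM bkpf
    INTO TABLE lt_bkpf
    WHERE bukrs IN s_bukrs       " a range of company codes
      AND belnr IN s_belnr
      AND gjahr =  p_gjahr.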

Is far less efficient than:
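
(Compared with one of the faster variants, sketched here as the INNER JOIN through T001 that the comments below also discuss; again the names are assumptions.)

  SELECT b~bukrs b~belnr b~gjahr b~budat b~blart
    FROM t001 AS t INNER JOIN bkpf AS b
                   ON t~bukrs = b~bukrs
    INTO CORRESPONDING FIELDS OF TABLE lt_bkpf
    WHERE t~bukrs IN s_bukrs
      AND b~belnr IN s_belnr
      AND b~gjahr =  p_gjahr.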

when a range of company codes is used. The increase in execution time for the OUTER JOIN is probably due to the fact that I could not use T001~BUKRS in the WHERE clause because of that limitation on OUTER JOINs.

In the final analysis, there is no “one size fits all” answer to the question “Which is better: a JOIN or FOR ALL ENTRIES?” In many, if not most, cases my money is on the JOIN, but the difference is not large enough to spend much time jumping through hoops to pare off the last microsecond. In the end, if you are interested in the differences for your particular case, then you must code different SELECTs to find which is best. But then you also have to bear in mind that the same SELECT may behave differently based on the makeup of the WHERE clause.

There are other considerations that come into play as well:

  • INNER JOINs only look at the intersection of the results that meet the WHERE clause.
  • FOR ALL ENTRIES eliminates duplicates from the results.
  • I find JOINs to be more time consuming to code. (I can never find the “~” key.)
  • When using FOR ALL ENTRIES you generally end up with at least two internal tables (see the sketch after this list). This may or may not be a good thing.
  • The example I have shown uses the full primary key. Some preliminary testing I have done comparing JOINs with FOR ALL ENTRIES on secondary indices shows that FOR ALL ENTRIES can give better performance in that case.
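
As referenced in the list above, here is a minimal sketch of the two-internal-table pattern that FOR ALL ENTRIES usually implies; BSEG as the dependent table and all names are illustrative.

  DATA: lt_bkpf TYPE STANDARD TABLE OF bkpf,   " driver table
        lt_bseg TYPE STANDARD TABLE OF bseg.   " dependent data

  SELECT * FROM bkpf
    INTO TABLE lt_bkpf
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr =  p_gjahr.

  " FOR ALL ENTRIES with an empty driver table would select everything,
  " so the second SELECT is guarded.
  IF NOT lt_bkpf[] IS INITIAL.
    SELECT * FROM bseg
      INTO TABLE lt_bseg
      FOR ALL ENTRIES IN lt_bkpf
      WHERE bukrs = lt_bkpf-bukrs
        AND belnr = lt_bkpf-belnr
        AND gjahr = lt_bkpf-gjahr.
  ENDIF.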

One final thing to note: in the above program, the one SELECT that consistently underperformed was the one that did not use the index effectively. And that is the real point here. All of the techniques that I have shown here work reasonably effectively. The most important thing to remember is to use an index.


19 Comments


  1. Suresh Datti
    I have followed your other blogs/responses that focus on performance issues too.
    One small correction to the final analysis though..
    “FOR ALL ENTRIES eliminates duplicates from the results. “
    Not always.. If you include all the key fields in your SELECT clause or use a SELECT *, the duplicates do get picked with the FOR ALL ENTRIES option.
    ~Suresh

      1. Jason Scott
        I think much more realistic results for comparing the differences can be gained by selecting much larger sets of data. Try queries that search through tables containing millions of records…
        1. Rob Burbank Post author
          Yes, I agree. I wanted to compare as many different methods as possible – including a sub-query. I tried but wasn’t able to come up with a combination of tables that were both large and allowed me to do everything I wanted. I’m sure I missed something, but there you are.

          As I said in response to another question, I did some testing on joins using secondary indices that showed somewhat different results. For this testing, I did use larger tables (EKKO, EKPO and some others). In another blog, Performance – what will kill you and what will leave you with only a flesh wound, I also used larger tables.

          Thanks for your comments.

          Rob

  2. Peter Inotai
    Hi Rob,

    Thanks for this interesting weblog!

    I believe the result also depends on the DB tuning. There are several profile parameters relevant to FOR ALL ENTRIES.
    It’s explained in the following OSS notes:
    Note 48230 – Parameters for the SELECT … FOR ALL ENTRIES statement
    Note 652634 – FOR ALL ENTRIES performance with Microsoft SQL Server
    Note 634263 – Selects with FOR ALL ENTRIES as of kernel 6.10
    It’s worth checking these parameters before deciding which way to choose.

    Best regards,
    Peter

    1. Rob Burbank Post author
      Thanks for the reply – yes, it certainly does depend on database tuning. I guess I didn’t state it overtly, but this is just a tool to help with analysis. I’m not a DBA and don’t know the details of how the database is tuned; all I can do is write a program that shows up the differences. (Or ask a DBA, but what’s the fun in that?)

      Rob

  3. Lars Breddemann
    How have the join conditions been supported by indexes on db-level in your tests?
    DBs heavily rely on additional structures that enable the efficient handling of Join-Selects.
    I really would like to see how the join performance comparison looks if the database has the right indexes in place.

    KR Lars

    1. Rob Burbank Post author
      I’m not entirely sure I understand your question. But I’ll try to answer. If I’m off base, let me know.

      I only looked at the primary index in this blog. While doing the research, though, I did some tests using secondary indices and found that FOR ALL ENTRIES was somewhat faster than a JOIN.

      If time permits, I’ll try to look at this in another blog.

      Rob

  4. Joe Reddy
    Hi Rob,
        I understand the profound performance difference it makes when we use Joins and For All Entries.
        But again, the performance also depends on various other system factors.

    Cheers,
    Joseph.

    1. Rob Burbank Post author
      Yes, it does. But this was really just an attempt to show one way to answer the question for yourself programmatically without knowing the system factors.

      Rob

  5. Kjetil Kilhavn
    I see that your program first executes the selection, and then measures the execution time of performing it again (in the loop).

    I re-arranged your PERFORMs, turning the list upside down. Result: the fully qualified select was fastest every time when I selected data for two company codes (all document numbers).

    I have possibly misunderstood how you gave your inputs, but it beats me how a join with T001 can be faster than a direct select on BKPF, unless you specify a lot of invalid company codes in the selection criteria.

    1. Kjetil Kilhavn
      Aha! I tried again, but instead of specifying the two company codes (0010 and 0040) as two individual entries, I specified them as a range from 0010 to 0040.

      Performance got considerably worse for ALL routines except the partially qualified one. For the partially qualified routine the results were virtually unchanged, for the others the runtime was increased to the same level as for the partially qualified routine.

      So while the first run gave the following average results for the five iterations:
      Partially qualified: 3,229 seconds
      Sub-query: 0,446 seconds
      Outer join: 0,582 seconds
      Inner join: 0,444 seconds
      FOR ALL ENTRIES: 0,466 seconds
      Nested: 0,562 seconds
      Fully qualified: 0,436 seconds

      The second run gave the following average results:
      Partially qualified: 3,214 seconds
      Sub-query: 3,252 seconds
      Outer join: 4,159 seconds
      Inner join: 3,238 seconds
      FOR ALL ENTRIES: 2,721 seconds
      Nested: 3,227 seconds
      Fully qualified: 3,173 seconds

      The second run includes four company codes instead of two, but it surprised me that the performance was now almost the same in all cases. So I tried specifying a range of 0010 to 0040 and excluding the two company codes 0020 and 0030.
      Partially qualified: 3,225 seconds
      Sub-query: 0,572 seconds
      Outer join: 2,818 seconds
      Inner join: 0,564 seconds
      FOR ALL ENTRIES: 0,467 seconds
      Nested: 0,562 seconds
      Fully qualified: 2,617 seconds

      Finally I tried specifying the four company codes 0010, 0020, 0030, and 0040 as four individual entries. This should compare to the second case (same data at least).
      Partially qualified: 3,218 seconds
      Sub-query: 2,539 seconds
      Outer join: 3,180 seconds
      Inner join: 3,273 seconds
      FOR ALL ENTRIES: 2,712 seconds
      Nested: 3,223 seconds
      Fully qualified: 2,507 seconds

      Why is the fully qualified select four times as fast when specifying the company codes individually rather than as a range?

      What to make of it? I am not really sure… except to support the statement that there is no step-by-step recipe you can follow to improve performance.

      If I am to draw one (small) conclusion it is that fully qualified selects are much less robust to different specifications of ranges than a join or sub-query or FOR ALL ENTRIES.

    2. Rob Burbank Post author
      The direct select on BKPF was first without any company codes.

      So I used a technique that I showed in an earlier BLOG:

      Using an Index When You Don’t Have all of the Fields

      If you don’t know one of the leftmost key fields, it turns out to be faster to use all possible entries in the SELECT rather than just leave it out of the WHERE.
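
      (For illustration, a minimal sketch of that technique with assumed names: fill a range with every existing company code and use it, rather than dropping BUKRS from the WHERE clause.)

        DATA: r_bukrs  TYPE RANGE OF bkpf-bukrs,
              wa_bukrs LIKE LINE OF r_bukrs,
              lt_bkpf  TYPE STANDARD TABLE OF bkpf.

        " Build a range holding all possible values of the leading key field.
        SELECT bukrs FROM t001 INTO wa_bukrs-low.
          wa_bukrs-sign   = 'I'.
          wa_bukrs-option = 'EQ'.
          APPEND wa_bukrs TO r_bukrs.
        ENDSELECT.

        SELECT * FROM bkpf
          INTO TABLE lt_bkpf
          WHERE bukrs IN r_bukrs       " instead of omitting BUKRS from the WHERE
            AND belnr IN s_belnr
            AND gjahr =  p_gjahr.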

      As for the order of doing the SELECTs, that’s why I put them in a DO loop that can be executed multiple times. That way each SELECT both comes before and after every other one.

      Thanks for taking the time to comment.

      Rob

  6. Siegfried Boes
    Hi Rob,

    I have played a bit with your program just now, and I must say that I find the set-up a bit special. The number of found records should be much larger, and two cases should be handled.

    Many joins work in a way that the conditions on table A give – let’s say – 1000 records and the conditions on table B give 2000 records, but the inner join is fulfilled only by the intersection of the two sets, containing 50 records. In such cases it is quite obvious that a join is much faster than a FOR ALL ENTRIES.

    FOR ALL ENTRIES makes sense if the first select gives 1000 records and every further select adds information to those 1000 records. FOR ALL ENTRIES is perfect if the SELECTs are not close together because there is processing in between. If the SELECTs come close together then a JOIN would also be an option. Be aware that the join can put the information of all tables into one internal table with the results. FOR ALL ENTRIES can not do that (not yet); there the internal tables must be merged separately, and if no BINARY SEARCH is used then it is definitely slower.
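
    (As an illustration of that merge step, a sketch with assumed names, where lt_bkpf and lt_bseg would come from a prior FOR ALL ENTRIES read.)

      TYPES: BEGIN OF ty_result,
               bukrs TYPE bkpf-bukrs,
               belnr TYPE bkpf-belnr,
               gjahr TYPE bkpf-gjahr,
               blart TYPE bkpf-blart,   " header field
               buzei TYPE bseg-buzei,   " item fields
               wrbtr TYPE bseg-wrbtr,
             END OF ty_result.

      DATA: lt_bkpf   TYPE STANDARD TABLE OF bkpf,  " filled by the first SELECT
            lt_bseg   TYPE STANDARD TABLE OF bseg,  " filled by the FOR ALL ENTRIES SELECT
            lt_result TYPE STANDARD TABLE OF ty_result,
            ls_result TYPE ty_result,
            ls_bkpf   TYPE bkpf,
            ls_bseg   TYPE bseg.

      SORT lt_bkpf BY bukrs belnr gjahr.

      LOOP AT lt_bseg INTO ls_bseg.
        " BINARY SEARCH keeps the merge from scanning the whole header table per item.
        READ TABLE lt_bkpf INTO ls_bkpf
             WITH KEY bukrs = ls_bseg-bukrs
                      belnr = ls_bseg-belnr
                      gjahr = ls_bseg-gjahr
             BINARY SEARCH.
        IF sy-subrc = 0.
          MOVE-CORRESPONDING ls_bkpf TO ls_result.
          MOVE-CORRESPONDING ls_bseg TO ls_result.
          APPEND ls_result TO lt_result.
        ENDIF.
      ENDLOOP.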

    Siegfried

    1. Rob Burbank Post author
      Siegfried – I mostly agree, but the real point here is that the performance gains in either case are small. You will likely not be able to cut the execution time of a SELECT in half by using one method over another.

      I generally find FOR ALL ENTRIES to be easier to use and I mostly use that.

      But if you look at the forums (and I know you do), you’d think that the most important performance tuning technique is to use SELECT … INTO CORRESPONDING FIELDS OF… (or is it to avoid using that) and some other things that don’t much matter.

      But if you want to reduce execution times by 1/20 or 1/30, you have to look at other things.

      You might also want to look at

      http://blogs.ittoolbox.com/sap/db2/archives/for-all-entries-vs-db2-join-8912

      Rob

      1. Jay Dalwadi

        Sir, can you tell me which is better for performance tuning? I am still confused about FOR ALL ENTRIES versus INNER JOIN, because Mr. Matthew Billingham told me that FOR ALL ENTRIES is LESS efficient than an inner join.

