JOINS vs. FOR ALL ENTRIES – Which Performs Better?
In earlier blogs, I looked at various performance tuning techniques and tried to identify some that are more important than others. In this blog, I want to look at different ways to construct a simple SELECT statement.
A very common question that is asked in the ABAP forum is “Which is better: a JOIN or FOR ALL ENTRIES ?” I’ve written a program that compares six different ways of constructing a SELECT statement: a simple, fully qualified SELECT ; a nested SELECT ; a SELECT using FOR ALL ENTRIES ; a SELECT using an INNER JOIN ; a SELECT using an OUTER JOIN ; and a SELECT using a sub-query. All of these SELECT s are fully qualified in the sense that they use all fields of the primary key. For comparison, I’ve also added a SELECT that doesn’t fully use the primary key.
This task was made more difficult by the fact that it’s not really easy to compare a JOIN with a sub-query. A JOIN assumes that you want the data from more than one table. A sub-query assumes that you need data only from the main table. So the SELECT statements that I have constructed are quite simple and in some cases, not practical. They are just for comparison purposes.
I’ve used the GET RUN TIME statement for comparison rather than the EXPLAIN function of transaction ST05 because it’s difficult to compare multiple SELECT s with single SELECT s using this function. GET RUN TIME is not perfect either, but if you do multiple comparisons, particularly in a system with little activity, the results should be OK. I’ve put all of the SELECTs used in comparisons within loops. You can adjust the number of loop passes on the selection screen.
In any event, here is the program:
The program has two SELECT-OPTIONS and one PARAMETER for selecting data: Company code, document number and fiscal year. I ran it four different ways: with a single company code and document number, with a single company code and a range of document numbers, with a range of company codes and a single document number and with ranges of both company codes and document numbers.
I ran the program a number of times in a 4.7 environment with DB2 databases. I was a bit surprised at some of the results:
- For the simple case (single company code and document number) all of the methods worked almost equally well. The single fully qualified SELECT worked best, while the OUTER JOIN was worst. But the worst case only added about 25% execution time. The nested SELECT was really no worse than the others
- With a single company code and range of document numbers, the execution times increased, but the overall results were quite similar to the simple case with the exception that the nested SELECT added about 75% to the execution time.
- With a range of company codes either with a single or range of document numbers, the results were different: the execution times for both the OUTER JOIN and fully qualified SELECT were dramatically higher (500 to 1000 times) than the other methods. This (to me at least) was the really surprising result. The following statement:
Is far less efficient than:
when a range of company codes is used. The increase in execution time for the OUTER JOIN is probably due to the fact that I could not use T001~BUKRS in the WHERE clause because of that limitation on OUTER JOINs
In the final analysis, there is no “one size fits all” answer to the question is “Which is better: a JOIN or FOR ALL ENTRIES ?” In many, if not most cases, my money is on the JOIN , but the difference is not large enough to spend much time jumping through hoops to pare off the last microsecond. In the end, if you are interested in the differences for your particular case, then you must code different SELECTs to find which is best. But then you also have to bear in mind that the same SELECT may behave differently based on the makeup of the WHERE clause.
There are other considerations that come into play as well:
- INNER JOINs only look at the intersection of the results that meet the WHERE clause.
- FOR ALL ENTRIES eliminates duplicates from the results.
- I find JOINs to be more time consuming to code. (I can never find the “~” key.)
- When using FOR ALL ENTRIES you generally end up with at least two internal tables. This may or may not be a good thing.
- The example I have shown uses the full primary key. Some preliminary testing I have done comparing JOINs with FOR ALL ENTRIES show that FOR ALL ENTRIES can give better performance in that case.
One final thing to note: in the above program, the one SELECT that consistently underperformed was the one that did not use the index effectively. And that is the real point here. All of the techniques that I have shown here work reasonably effectively. The most important thing to remember is to use an index.