Technical Articles
ABAP for newbies – Importance of BINARY SEARCH
Hi,
When you start to code ABAP you are probably not paying much atention to the table typekind you use (STANDARD, SORTED, HASHED), and you usually forgot to use the statement addition BINARY SEARCH.
Most probably you know that the use of this addition improves performance, but when you’re starting you may be not aware of how much it does. When using LOOP WHERE the BINARY SEARCH is performed under the hood, so for newbies it can become even more complex to understand when and why ABAP is doing this search optimal for you.
I coded a little example testing 4 cases:
- LOOP WHERE over a STANDARD TABLE (Sorted with a sort statement)
- LOOP WHERE over a SORTED TABLE
- READ TABLE BINARY SEARCH + LOOP FROM INDEX over a STANDARD TABLE (Sorted with a sort statement)
- READ TABLE BINARY SEARCH + LOOP FROM INDEX over a SORTED TABLE
I prepared the variables below for my program:
TYPES: BEGIN OF lt_head,
docnr TYPE n LENGTH 10,
gjahr TYPE gjahr,
END OF lt_head,
BEGIN OF lt_position,
docnr TYPE n LENGTH 10,
gjahr TYPE gjahr,
posnr TYPE n LENGTH 3,
END OF lt_position.
DATA: li_head TYPE STANDARD TABLE OF lt_head
WITH HEADER LINE,
li_positions TYPE STANDARD TABLE OF lt_position
WITH HEADER LINE,
li_positions_sorted TYPE SORTED TABLE OF lt_position
WITH NON-UNIQUE KEY docnr gjahr posnr
WITH HEADER LINE.
LI_HEAD will contain 5000 entries and each of this entries will contain 5 entries in the LI_POSITIONS and LI_POSITIONS_SORTED tables, the program is going to perform just the LOOPs, no code inside, as show below (You can ignore the macros i used to get runtime measurement):
START_MEASUREMENT 'SORT'.
SORT li_positions BY docnr gjahr posnr.
END_MEASUREMENT 'SORT'.
START_MEASUREMENT 'LOOP WHERE'.
LOOP AT li_head.
LOOP AT li_positions WHERE docnr = li_head-docnr
AND gjahr = li_head-gjahr.
ENDLOOP.
ENDLOOP.
END_MEASUREMENT 'LOOP WHERE'.
START_MEASUREMENT 'LOOP WHERE SORTED TABLE'.
LOOP AT li_head.
LOOP AT li_positions_sorted WHERE docnr = li_head-docnr
AND gjahr = li_head-gjahr.
ENDLOOP.
ENDLOOP.
END_MEASUREMENT 'LOOP WHERE SORTED TABLE'.
START_MEASUREMENT 'BINARY SEARCH'.
LOOP AT li_head.
READ TABLE li_positions WITH KEY docnr = li_head-docnr
gjahr = li_head-gjahr
BINARY SEARCH.
IF sy-subrc = 0.
LOOP AT li_positions FROM sy-tabix.
IF li_positions-docnr <> li_head-docnr OR
li_positions-gjahr <> li_head-gjahr.
EXIT.
ENDIF.
ENDLOOP.
ENDIF.
ENDLOOP.
END_MEASUREMENT 'BINARY SEARCH'.
START_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.
LOOP AT li_head.
READ TABLE li_positions_sorted WITH KEY docnr = li_head-docnr
gjahr = li_head-gjahr
BINARY SEARCH.
IF sy-subrc = 0.
LOOP AT li_positions_sorted FROM sy-tabix.
IF li_positions_sorted-docnr <> li_head-docnr OR
li_positions_sorted-gjahr <> li_head-gjahr.
EXIT.
ENDIF.
ENDLOOP.
ENDIF.
ENDLOOP.
END_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.
The execution of the program gives this results:
As expected LOOP WHERE is giving the worst performance by far, the reason why this LOOP is performing so bad is ABAP have not enough information to perform the BINARY SEARCH optimization (I don’t know about ABAP internals, but i’m pretty sure it must do this kind of optimization).
The difference between first and second LOOP, that helps ABAP optimize the search, is the table typekind that lets it know that the internal table is sorted, as important as know the table is sorted is to know that is sorted by the proper fields, and ABAP knows this because we specified the key in the table definition, as you can see in the code my LOOP where clause is using the fields of the key. You must note that, as in the example, using full key is not required to get the optimization.
So even having the standard table sorted by the same fields ABAP cannot be sure of that because our table is not defined that way, this forces ABAP to do a full search, wherever you find a LOOP is having poor performance you can check that you’re table is defined as sorted and that you specified the right key for the table that allows this optimization.
I also tested to do a manual BINARY SEARCH and start looping from that position until i found a different combination for my key fields, this is a solution i usually used when i didn’t understand how ABAP works, i surprisingly found that it performs slightly better, may be useful if you need to squeeze until the last bit of performance.
You can find full code at this github repository: https://github.com/jordirosa/ABAP4Newbies
Regards.
You might be interested in the following performance test where Binary Search is compared against alternative approaches in nested loops: https://keremkoseoglu.com/2015/07/09/abap-nested-loop-performance/ .
Long story short: Data types within internal tables seem to affect the performance outcome. But in average, Binary Search seems to be the slowest method (compared to Hashed Table, Paralel Cursor and Sorted Table).
Thanks for sharing the article, is really interesting, i will take a look to more posts in your web.
Maybe you can add a README to your repository to explain that people may install it with abapGit (newbies are probably not aware of its existence).
Coudn’t agree more, i will update the repository.
Thank you 🙂
When it comes to table access, you always have to know the data and the number of accesses, in order to make a well informed decision about what type or better what search algorithm to choose. And for each search algorithm, in almost all cases, there is also a ‘sorting’ algorithm that has to be performed before hand.
Examples:
=> dont bother with sorting and using binary search or building buckets and using a hash table.
Because this will allways take at least O(n+1) time, where n is the number of items in your table.
Where as the sequential search will only in the worst case give you a performance of O(n) time.
=> In this case, a parallel cursor will be the fastest, where both tables need to be sorted first
If n and m are the sizes of the two tables, then sorting the tables will take O(nlogn) and O(mlogm) time in the worst cases, but only O(n+m) time to do the nested loop
The only problem I have with the parallel cursor, is that you have to implement and test it yourself, because it does make a difference on the implementation if you have a 1:1 or a 1:n match, whereas a sorted or hash table search does that for you already and looks a little cleaner
I very seldom would do a binary search on a standard table that has been sorted seperatly, because you and other developers always need to be aware of this, and also adding items to your table always requires a resort
Update:
Jordi Rosa in those more "complex" examples, you could also try out having not only primary keys but also multiple secondary keys and whether the table is sorted or not, and also see what this does in terms of performance to not only search but also insert and delete functions.
And keep a binary search statement solely to a table type that is not sorting by it self, but has to be sorted by coding. Using the binary search statement on a sorted table type with non- or unique keys can lead to wrong results, if you dont access it by its keys.
Altough the binary search statement on a sorted table is pretty much the same as a search by primary or secondary key on a standard or sorted table, there still can be some differences in performance, especially insert and delete, due to overhead in indizes but also how SAP has internally programmed those functions.
Good explanation, maybe i'll write another post with some more complex examples so i can get more feedback.
Thanks for sharing your knowledge.
Bingo. SORTED table type is already using binary search algorithm. I've ran this through different scenarios before and, like Kerem Koseoglu , arrived at a conclusion that on average, SORTED table performance is not worse than when using BINARY SEARCH. Interestingly, I ran the same program probably 30 times in a row and the result fluctuated by a single-digit percent between SORTED and BINARY SEARCH option. Go figure.
But the main thing is: it's not just about the performance. BINARY SEARCH can offer a marginal improvement in runtime while it's effect on the code maintenance and "cleanliness" is far worse. If you forget to sort (or re-sort) the table, you could simply get a wrong result.
For this reason, when working on the 2nd edition on the ABAP Introduction book, we have revised the section on BINARY SEARCH to advise against using it. We still had to explain it, but made it clear this is not a recommended solution anymore for multiple reasons.
Thank you!
I was planning to write about the problem you mention, i wanted to show the debugger option that checks that a table is sorted before BINARY SEARCH, i know it's not perfect because if your table is sorted just by chance the check will be ok, but this option can help reduce mistakes.