ABAP for newbies – Importance of BINARY SEARCH
When you start to code ABAP you are probably not paying much atention to the table typekind you use (STANDARD, SORTED, HASHED), and you usually forgot to use the statement addition BINARY SEARCH.
Most probably you know that the use of this addition improves performance, but when you’re starting you may be not aware of how much it does. When using LOOP WHERE the BINARY SEARCH is performed under the hood, so for newbies it can become even more complex to understand when and why ABAP is doing this search optimal for you.
I coded a little example testing 4 cases:
- LOOP WHERE over a STANDARD TABLE (Sorted with a sort statement)
- LOOP WHERE over a SORTED TABLE
- READ TABLE BINARY SEARCH + LOOP FROM INDEX over a STANDARD TABLE (Sorted with a sort statement)
- READ TABLE BINARY SEARCH + LOOP FROM INDEX over a SORTED TABLE
I prepared the variables below for my program:
TYPES: BEGIN OF lt_head, docnr TYPE n LENGTH 10, gjahr TYPE gjahr, END OF lt_head, BEGIN OF lt_position, docnr TYPE n LENGTH 10, gjahr TYPE gjahr, posnr TYPE n LENGTH 3, END OF lt_position. DATA: li_head TYPE STANDARD TABLE OF lt_head WITH HEADER LINE, li_positions TYPE STANDARD TABLE OF lt_position WITH HEADER LINE, li_positions_sorted TYPE SORTED TABLE OF lt_position WITH NON-UNIQUE KEY docnr gjahr posnr WITH HEADER LINE.
LI_HEAD will contain 5000 entries and each of this entries will contain 5 entries in the LI_POSITIONS and LI_POSITIONS_SORTED tables, the program is going to perform just the LOOPs, no code inside, as show below (You can ignore the macros i used to get runtime measurement):
START_MEASUREMENT 'SORT'. SORT li_positions BY docnr gjahr posnr. END_MEASUREMENT 'SORT'. START_MEASUREMENT 'LOOP WHERE'. LOOP AT li_head. LOOP AT li_positions WHERE docnr = li_head-docnr AND gjahr = li_head-gjahr. ENDLOOP. ENDLOOP. END_MEASUREMENT 'LOOP WHERE'. START_MEASUREMENT 'LOOP WHERE SORTED TABLE'. LOOP AT li_head. LOOP AT li_positions_sorted WHERE docnr = li_head-docnr AND gjahr = li_head-gjahr. ENDLOOP. ENDLOOP. END_MEASUREMENT 'LOOP WHERE SORTED TABLE'. START_MEASUREMENT 'BINARY SEARCH'. LOOP AT li_head. READ TABLE li_positions WITH KEY docnr = li_head-docnr gjahr = li_head-gjahr BINARY SEARCH. IF sy-subrc = 0. LOOP AT li_positions FROM sy-tabix. IF li_positions-docnr <> li_head-docnr OR li_positions-gjahr <> li_head-gjahr. EXIT. ENDIF. ENDLOOP. ENDIF. ENDLOOP. END_MEASUREMENT 'BINARY SEARCH'. START_MEASUREMENT 'BINARY SEARCH SORTED TABLE'. LOOP AT li_head. READ TABLE li_positions_sorted WITH KEY docnr = li_head-docnr gjahr = li_head-gjahr BINARY SEARCH. IF sy-subrc = 0. LOOP AT li_positions_sorted FROM sy-tabix. IF li_positions_sorted-docnr <> li_head-docnr OR li_positions_sorted-gjahr <> li_head-gjahr. EXIT. ENDIF. ENDLOOP. ENDIF. ENDLOOP. END_MEASUREMENT 'BINARY SEARCH SORTED TABLE'.
The execution of the program gives this results:
As expected LOOP WHERE is giving the worst performance by far, the reason why this LOOP is performing so bad is ABAP have not enough information to perform the BINARY SEARCH optimization (I don’t know about ABAP internals, but i’m pretty sure it must do this kind of optimization).
The difference between first and second LOOP, that helps ABAP optimize the search, is the table typekind that lets it know that the internal table is sorted, as important as know the table is sorted is to know that is sorted by the proper fields, and ABAP knows this because we specified the key in the table definition, as you can see in the code my LOOP where clause is using the fields of the key. You must note that, as in the example, using full key is not required to get the optimization.
So even having the standard table sorted by the same fields ABAP cannot be sure of that because our table is not defined that way, this forces ABAP to do a full search, wherever you find a LOOP is having poor performance you can check that you’re table is defined as sorted and that you specified the right key for the table that allows this optimization.
I also tested to do a manual BINARY SEARCH and start looping from that position until i found a different combination for my key fields, this is a solution i usually used when i didn’t understand how ABAP works, i surprisingly found that it performs slightly better, may be useful if you need to squeeze until the last bit of performance.
You can find full code at this github repository: https://github.com/jordirosa/ABAP4Newbies