The War on BINARY SEARCH
My first ABAP program was written in SAP R/3 4.7 back in 2005. It wasn’t long before I discovered BINARY SEARCH addition to the READ command, and thought it was the bee’s knees. Suddenly my programs ran so much faster and I was getting pat on the back from senior management.
Fast forward to 2020 and I hate you, BINARY SEARCH, I hate you very, very much. Why such drastic change?
Side Effects Include Death
If you’ve worked with SAP for some time, you must know about the backwards compatibility. It means that when SAP comes up with new ABAP syntax or features, the old commands do not change and continue to run as they were. This is great for the customers because they don’t end up with syntax errors in their custom code after an ABAP upgrade. But backwards compatibility has a very dangerous side effect: it does not motivate the developers to learn anything new. Indeed, in the on-premise SAP world, one can still party like it’s 1999. Or more like 1972.
The case of BINARY SEARCH is especially egregious. Table types other than STANDARD have already existed for more than a decade. HASHED table type works perfectly with a unique key. But even if that is not an option, SORTED table type uses the binary search algorithm and can have a non-unique key. It’s practically the same thing but better!
But to this day, BINARY SEARCH continues to piggyback on its old fame. I still see it mentioned in some internal guidelines (or rather “misguidelines”?) and random general performance improvement suggestions, as if it’s the best thing since sliced bread. It even still rears its head in the SCN blogs.
It’s Not About Performance
The subject of performance with different internal table types has already been covered in the SCN blogs, books, and ABAP documentation. In my own simple test (fill different tables with lots of data, then read it in a LOOP), I got a very predictable result. HASHED table was the fastest by a mile, and SORTED table, on average, performed just as fast as BINARY SEARCH. But while working on that simple program, I realized that it’s not even about the performance.
In this program, I was planning to create a routine for each table type and call them thusly:
GET RUN TIME FIELD t1. PERFORM sorted_table. GET RUN TIME FIELD t2.
Inside these routines, I’d call other routines to select data from database and then read it:
FORM sorted_table. DATA: bookings_sorted TYPE t_sorted_table. PERFORM get_bookings CHANGING bookings_sorted. PERFORM read_bookings USING bookings_sorted. ENDFORM.
(I went deliberately with a procedural program here to make the point clear to every developer. But the same issue would’ve been obvious if I used the class methods instead.)
For standard, hashed, and sorted tables this worked fine. But when the time came to type in the code for BINARY SEARCH, my brilliant copy-paste design failed for obvious reasons. BINARY SEARCH requires a special READ command syntax, hence read_bookings routine with a plain READ could not be used for it. Either I had to put additional code in the routine and add a flag to tell it when to use what or I had to write special code outside of routine just for BINARY SEARCH.
Less flexible and reusable code is always kind of a downer.
What You See Is Not What You Get
In addition to “dirtying of the code”, BINARY SEARCH has another problem: sometimes it works, sometimes it doesn’t, and you might not even know it. (Horst Keller correctly referred to it as to “error-prone”.) The internal table must be first sorted in certain way for BINARY SEARCH to produce the expected results. What happens if it’s not sorted properly? As I learned back in 2006, in that case the result is like a lottery. Depending on specific data, it might work correctly sometimes but then it could miss a record that clearly exists in the table.
One might say “what’s the big deal, just remember to sort” but that’s exactly the problem. You’re relying on generations of future developers who will maintain the program to remember that it requires “special treatment”. What could possibly go wrong.
Let Database Do Its Job
BINARY SEARCH signals not only the program’s age or lack of table type awareness by the previous developers. Usually, it comes hand in hand with bad program design and underused database capabilities.
Out of curiosity, I ran a code scan for BINARY SEARCH clause in a random set of the legacy programs. In vast majority of the cases, BINARY SEARCH could have been replaced not just by HASHED or SORTED table but by a SELECT statement. A typical example here would be reading MARA and MAKT tables separately, then merging two internal tables. That’s just straight-up SELECT… JOIN. And the scenarios where quantity or amount, for example, is accumulated by material or customer, SELECT… SUM would do just fine.
“Code pushdown” might be a novel HANA-related concept but allowing database to do its job has always been a good practice. It’s just we didn’t always practice it. And leaning on a crutch of BINARY SEARCH did not help. It’s time to let go.
Does all this mean that BINARY SEARCH has no use whatsoever and must be purged from ABAP syntax? Probably not. But the valid use cases for it are so few and far in between that, as a tool, BINARY SEARCH deserves to be put out of sight on a very high shelf of our ABAP garage. Or, preferably, burned with fire. 🙂
Nice summary. But is a known thing and we are beating around the bush. But can you comment on performance of the columnar tables vs row tables in select on big tables where we select not some but a group of fields. Request you to please share your opinion on those if you have any thoughts.
Thanks for the comment. "Beating around the bush" means suggesting something but not talking about it directly (hence the "around" part). I feel this blog is anything but. 🙂
I'm not sure how the follow-up question is related to the subject of this blog, this is not about the DB performance. We can use the development tools available in any SAP system (also for many years already) to evaluate any code or scenario, if necessary.
Yes you are correct. I guess the right word would be hackneyed.
"existed for more than a decade": sorted and hashed tables were added in release 4.0 (in 1998 if I'm not wrong, 22 years ago! 🙂 )
Yikes! Yet no one even mentioned those to me when I was still learning in 2005. Just shows how deep-seated this issue is...
This is an important point that you mention. There are very nice colleagues who like to share their knowledge. However, the knowledge isn't always up to date and so you don't learn modern solutions but how it was done in the 80s/90s.
I had some bad experiences with BINARY SEARCH, which took a lot of my life time and nerves 🙁 The solution for me was almost always to choose the right table type and to think about the overall design and structure. In the meantime I try to work a lot with table expressions if possible. No READ TABLE and no BINARY SEARCH either.
Thanks for chiming in!
I haven't used it for many years and every time I see it in a program I'm working on, I make an effort to remove it. In all this time, I'm yet to see a single case where BINARY SEARCH use is justified.
Thank you for the post. I have always avoided BINARY SEARCH and I have never understood why developers would use it. I rather assumed I was missing some important concept.
The other fun thing is that when a read on a STANDARD table fails the SY-SUBRC is 4. When it fails due to a BINARY SEARCH it is 8.
That is a trick for young players. I think that is the same for SORTED tables.
The most horrible thing is this. We have had sorted tables and the like for 22 years. I am fairly sure I heard about them in my first ever ABAP training course. I started programming (in ABAP) in 1999.
To this very day I still come across code, often freshly written code, where a full scan is done on a STANDARD internal table to get one record. In a loop. Often a nested loop.
That's because for many developers and, most importantly, many organizations / managers, "buT IT wOrks" is still the main argument. Next thing you know, we have to open a second instance of "ABAP gore" on SCN because the original one already takes full minute to load with 300 comments.
I agree. A standard table with BINARY SEARCH is rarely (if ever) the best solution. The existence of secondary keys reduces the need for standard tables even further.
For instance, a use case I sometimes see for BINARY SEARCH is that the internal table needs to be read using different keys, so the table is defined as a standard table, sorted one way, read using BINARY SEARCH, then sorted a different way somewhere else in the code and read again using BINARY SEARCH. Often, a better alternative is to define the table as SORTED or HASHED and then to define secondary SORTED and/or HASHED keys which can be used when reading or looping at the table.
This can also be helpful if the same internal table needs to be read randomly for single records (in which case a HASHED key is best) and also looped at with a WHERE clause (in which case a SORTED key is best).
Thanks for the comment! There are indeed some cases when we need to get creative. For example, I worked on a program where I actually moved data between STANDARD and HASHED table (don't remember why exactly I needed standard type) and even that was still quite efficient.
Secondary indexes on internal tables are definitely underrated. Matthew Billingham posted a good blog on this few years ago.
BINARY SEARCH BITES BACK!
Thanks to mention my post 🙂
I'm not so against the use of BINARY SEARCH, not even as statement, also when you use a sorted table ABAP is doing binary search for you. The previous post https://blogs.sap.com/2020/02/06/abap-for-newbies-importance-of-binary-search/ mostly explain how to get the benefit of binary search in a loop on a sorted table. It's aimed for newbies doing full scan loops. But i try to explain how things work so people with very few experience can understand why it works.
I will write one more post, everyone is welcome and don't hesitate to comment if you disagree with the content of my blog, it may help people with would not agree with me if they know your reasons.
an entertaining summary, thank you!
If only a few use cases for BINARY SEARCH are valid, then IMO we should create a catalog/pattern and point ot alternatives for all other cases.
You noted BINARY SEARCH
But you also stated with the first benefit: it is often the only quick fix available for performance enhancement.
I will offer a second use case: we know that filling a standard table with APPEND is faster than filling any other table type. If we then decide we have enough reads to justify the costs of a SORT operation, BINARY SEARCH might be a good solution. Even then, I would recommend to use a sort-on-write copy of table to avoid BINARY SEARCH if you can afford the memory duplication.
Disclaimer: Statements on performance are often meaningless without profiling data.
Thanks for the comment!
I agree with you completely. That's exactly why it would be difficult to single out a specific pattern. I believe it would have to be case-by-case and am not sure it's really worth any further discussion.
And in many cases the data volumes are not that large to even make it worth considering. I had to scroll through more than a year worth of tweets but finding this gem was worth it. 🙂
Consider that CL_SALV_TABLE and CL_GUI_ALV_GRID only allow STANDARD tables.
That's IMHO one of the main reasons why standard tables are so common.
There's nothing wrong with standard tables IMHO
(if you don't search millions of records sequencially)
I've eliminated the two statements READ TABLE and APPEND completely from my vocabular.
I think, I'm happier now 😉
The inline read is not always an alternative due to perfromance issues if entry does not exist. So READ TABLE should still be recallable... 🙂
Never heard of this. Why READ TABLE should be faster? I'm pretty sure, that the same kernel routines are triggered.
No, not READ TABLE is faster, but if entry does not exist the exception handling takes pretty much time.
Again: it is? Why? Do you have meassures?
And if yes: does it really matter? Are you reading a table thousents of times without knowing, whether the entry exists? Maybe LINE_EXISTS does help here.
I know, I'm annoying, but I just want to be sure ...
Why: because of the exception handling which is time consuming
Measurements: yes, personal checks
Does it matter: it depends… ? There are several possibilities where you might only insert data into a table if the data does not exist and this access must be done multiple times.
LINE_EXISTS: Yes, this really might be an alternative. I first thought that this was totally nonsense because you’d do the same read access twice. But this seems to work out quite fast.
Thank you, that's the answer I needed 🙂
What do you use instead of READ TABLE?
result = table[ key = value ].
Great post Jelena Perfiljeva , I will be honest here i can see two big clients for whom i am working they still use the BINARY SEARCH guideline. One of the reason why our SAP system needs high maintenance support is because of the way the custom developments are done. In one of the comments it is rightly mentioned quick fix and people follow it without even giving it a second thought.
I think the over all problem is the mindset to adapt to the new changes. We want client to adapt to new technology but when it comes to us we are reluctant to use, try out new things, we just want the work to be done.
Anyways as always great post
Thank you for the comment!
"we just want the work to be done" - yes, this is a very familiar pitfall. That's how many people end up repeating the same "that's how we've been doing it" stuff over and over. And under pressure to meet tight deadlines or billable hour quotas, it's all too tempting to cut corners.
Most companies still don't recognize the value of developer learning, in my opinion. It is either left up to the workers own initiative or, if any formal learning exists, it's insufficient or irrelevant.
Thanks for sharing the blog. Good thoughts... Maybe one reason for frequent use of BINARY SEARCH is that std SAP code has abundant usage of STANDARD internal tables and BINARY search (even the most recent one from S/4HANA)... (which is also learning source for a lot of ABAPers)... but then I still agree with you that it should not be a reason for ABAPers to not to learn new stuff... On the positive side there are SAP programs that use all the new techniques and that should be a motivation for people to use the new stuff...
Very nice read, thanks!
I recently (as in the last weeks and yesterday) had reason to be thankful for BINARY SEARCH as it helped to quickly improve severe performance issues we had with some rather old but heavily used programs. One of these occurrences is detailed in my recent question here, the other needed to be quickly turned around yesterday, so there's no write-up. A program selected tons of data into internal tables and even though they were properly sorted didn't yet - for whatever reason - have the BINARY SEARCH addition for the READ statements. For a particular selection it ran for 9 hours or so and utilizing SAT it was quick to pinpoint the problematic statements. Simply adding BINARY SEARCH to the two READs brought the runtime down to about an hour. And yes, an underlying problem is the current selection casting way too wide a net, so the next step will be to convince the users to restrict the selection at least somewhat (but that's a task for colleagues of mine!).
After reading the linked question, I think you might want to report this to SAP in some way. I was going to ask if there was a reason not to use a SORTED table but the question explains it: the problem was in SAP code and was solved by slapping BINARY SEARCH in an enhancement. That's one of the very few valid cases that I mentioned exist but hardly an endorsement, isn't it?
P.S. It's amazing how many times we complain that ABAPers don't want to learn anything new, but then someone dares to point out an outdated technique and 10 people jump in to defend it. 🙂
Often standard SAP data collection function modules are being used which do return data by a TABLES parameter. These SAP function modules expect a standard table. However it is allowed to use tables with a secondary key.