Technical Articles
ABAP for newbies – Importance of BINARY SEARCH (2)
Hi,
Thanks to the valuable comments of community i think i can improve the information of the previous post, i’m doing a bit of research, and i will add more tests for a next post. About the optimization of WHERE clause in LOOPs i found there’s a good explanation in the official documentation about the subject: https://help.sap.com/doc/abapdocu_752_index_htm/7.52/en-us/abenitab_where_optimization.htm
Meanwhile i want to write briefly about a problem mentioned on the comments of the first post https://blogs.sap.com/2020/02/06/abap-for-newbies-importance-of-binary-search/, the use of BINARY SEARCH with an unsorted table. When performing a binary search the alghorithm starts at the middle of the table and it checks if the entry at the middle of the table is smaller or bigger than the value i’m looking for, if the entry is smaller the first half of the table can be discarded, if it’s bigger the last half will be discarded, then the alghorithm is repeated with the half of the table not discarded, this procedure is done until the value i’m looking for is found. Performing this check to discard half of the data is correct only when the data is sorted.
Does not look like a big problem, we just have to use the BINARY SEARCH only on tables that were sorted. But let’s test what happens when our data it’s not sorted. I coded the next example:
TYPES: BEGIN OF lt_unsorted,
number TYPE i,
END OF lt_unsorted.
DATA: li_unsorted TYPE STANDARD TABLE OF lt_unsorted
WITH HEADER LINE.
li_unsorted-number = 1.
APPEND li_unsorted.
li_unsorted-number = 3.
APPEND li_unsorted.
li_unsorted-number = 2.
APPEND li_unsorted.
READ TABLE li_unsorted WITH KEY number = 2
BINARY SEARCH.
WRITE: / 'SY-SUBRC is:', sy-subrc.
The output for the program is:
The program compiles ok but first thing we note from the output is that when a BINARY SEARCH is done over an unsorted table your program will most probably not work unless your READ TABLE statements are really lucky. In our example when the BINARY SEARCH algorithm is applied the entry with value 2 in the number field can’t be found because the program is wrong, but you will not get any warning of this error, so if you forgot to sort the table you will have to detect it by yourself.
This is a very simple program and the solution is clear, but in big programs could be not as easy to find the problem, luckily there’s a very handy option in the debugger.
To activate it follow the next steps:
- When in the debugger go to Settings->Change Debugger Profile/Settings
- Mark the “Check Sorting Before READ BINARY SEARCH” option:
If i run my program now i will get a dump when a BINARY SEARCH over and unsorted table is performed, i will see this kind of dump:
In the dump detail you can see exactly in which line the wrong READ was detected.
An important thing you must know is that you will get this dump only when the data is not sorted, but if you use an example where your data is sorted not intentionally but by chance, you will not get the dump because the debugger option only can check if at the moment of the READ the data it’s ok, but it can’t know if the correctness of data is because of a SORT or to chance.
If you suspect your program is returning wrong data because a wrong BINARY SEARCH my recommendation is to run the program with this option activated, it’s faster than analyze the whole source code.
You can find full code at this github repository: https://github.com/jordirosa/ABAP4Newbies
The repository report program for this example is YJRS_ABAP_002.
I will write about the alternatives and their performance if you find it interesting and you want to be notified of the new content you can follow my profile.
Regards
Often, the issue of a non-sorted table is detected very late in the debugging phase, at a moment you already feel that a SORT was not done, so I feel this option is more or less useless (nowadays). That would be nice if that option could always be active, but unfortunately the SAP Library says NO (chapter Settings of "New" ABAP Debugger): "You should activate this setting only shortly before reaching the point in the source code, because there can be a significant loss in performance, depending on the table size."
So, I would say, as a rule-of-thumb, don't use standard tables with a binary search, instead, use a sorted table or a table with a secondary sorted key.
I like to read different viewpoints, i don’t feel the option so useless, i mean you will probably have to deal sometimes with code from another developers, some of them will be more experienced and some not so much.
I think every statement is good as long as the developer using it knows how to use it, i can make a mistake and forget to sort the table, but i can make a mistake and define a wrong key for my sorted table, the impact in performance will be huge, even so i would agree with your rule-of-thumb because a huge impact in performance is preferable than a wrong result, well used both will give a good result, it’s a misuse of them what causes problems.
Thanks for the info about the setting, i didn’t know the impact in performance is big enough to have a warning in documentation, i never had any issue with this but it’s good to know if the program is near the timeout.
There are times when I use binary search. For example, if the data is to be bunged into cl_salv_table, it has to be a standard table. So your choice is either to have two tables, one with an appropriate key and a standard table, or one standard table.
I never tried, but maybe you can use a standard table with secondary sorted index or secondary hashed index if you would prefer to do your loops with this option.
It will consume more memory than a standard table.
I've never tried. But if I'm displaying an ALV - I doubt memory is going to be an issue!
Btw, SPLIT ... INTO TABLE requires a standard table. It gives a syntax error with a hashed table. Again, volumes are usually small, so it's not really an issue.
Amen to that and, for the beginners, that's really all that needs to be said on the subject.
Thanks for this blog as well, especially the info about the debugger setting might come in handy at times.
By the way, it’s called binary search because the search always picks one of two directions to continue the search by comparing the value.
Therefore it will perform in the worst case with max log n comparisons, notation O(log n), to find the value or determine it can’t be found, where n is the number of items in the table.
Whereas if you have a Standard table (also non sorted and without keys), you will have a sequential search that has to do max n comparisons, notation O(n).
And a Hash Table search, with a well defined hashing algorithm, has on average a search time of O(1), meaning after the hash algorithm is performed it can on average find the value right away. But in the worst case it will also be O(n), in case all values are in one bucket.
Like I wrote the last time, it’s also important to consider insert and delete performance as well, if you really want to tweak your program.
Thanks for reading me. 🙂
For the next post I would like to complete the performance tests of the first post with hashed tables and parallel cursor and I would like to add tests for inserts/deletes over a standard, sorted and hashed table.
If you would like to see any specific test feel free to make the suggestion if it’s something i know how to do I will include it.
There are already several blog posts about the performance of internal tables...
And also https://help.sap.com/doc/abapdocu_753_index_htm/7.53/en-US/index.htm?file=abenitab_perfo.htm
I know, and most of them are wrote by people with more knowledge than me. But i like to test everything by myself and the posts have a double purpose, i hope someone can find the information useful, but at the same time i'm trying to practice my awful english and i'm getting feedback from experts who share links that help me learn, so i hope i can see you on the next post 😉
I admire your tenacity but I wish it would've been used more productively in educating ABAP beginners in other concepts.
The information about debugger setting was useful but I feel BINARY SEARCH is getting more attention in these blogs than it deserves.
It's not just about "how", it's about "when" or even "whether". Sorry, I disagree there is any special "importance" to BINARY SEARCH. (I'm getting tired of spelling it out every time but the abbreviation would be BS and, come to think of it, that expresses my opinion of it exactly.) To me, it's more important to know about other table types, secondary indexes, and ABAP tools that can helps us to make the right choice.
Related to BINARY SEARCH i agree with you in most of the points, i tried to highlight cons. Related on how to educate people i do not agree so much, especially with this other comment.
It's just my opinion, but i think everything has to be explained to beginners, so they can take his own decisions based on his true knowledge, not in the knowledge of someone else. If no one give them the tools to understand why, they will not be capable of solve problems by themselves.
If you don't mind i would like to add a link to your post, this post is aimed at beginners and if I had just discovered BINARY SEARCH today I would like to know about different opinions.
I suspect there is a confusion between "not explaining" and "not explaining at the right time".
Learning something new is hard and it's easy to get overwhelmed with new information, it can be like drinking from a fire hose. The role of the more experienced developers is to guide the beginners on learning in the most efficient manner.We already know, from our experience, what commands or techniques are most important to master, so that's what we would emphasize. BINARY SEARCH just does not fall into that category.
Last year my family visited Copenhagen for the first time. It has so many sites, you can easily spend months just sightseeing. We didn't have months, we had only a week with a "handicap" in form of my kid and elderly parents. 🙂 So I found a guide book from a well-known author that had advice on what to see if you have limited time. E.g. if there are 5 castles, which one to make a priority if you also want to see a museum?
From my point of view, telling the ABAP beginners READ... BINARY SEARCH is "important" would be the equivalent of a city guide saying "hey, forget this royal palace and that museum, just visit my mom's favorite bar instead". And while it's possible that it's the most fantastic bar (and if I had a week I'd gladly visit it), do you feel that, objectively, it would be a good advice for a visitor?
If you want to post what you're learning, that's fine (and it's still better than many other blogs posted on SCN) but I disagree with the statement that this command is of any importance. Definitely not to the beginners who would be at higher risk of misunderstanding when it's appropriate to use it and when it's not.
There are much more interesting areas of ABAP where your enthusiasm could be applied more productively and I hope you consider that.
Thank you.