ABAP tips: Checking existence within an internal table
I got a blog-worthy surprise when I did a quick performance test on two different ways to use the new-ish (7.4) table expressions to perform a search on an internal table.
My scenario involved internal table searches where the values would be missing a lot of the time. The unexpected result was that when a row can’t be found, using table expressions with exceptions is ten times slower on average than all other lookup methods, including the good ol’ fashioned READ TABLE with SY-SUBRC.
Quick primer for those still getting accustomed to 7.4 ABAP (feel free to skip to the results):
With table expressions, finding an entry in a table is really neat. I have a table that includes a field “ID”. So the following READ TABLE can be rewritten using a table expression (the bit in square brackets):
READ TABLE itab INTO row WITH KEY id = find_id.
"Equivalent to:
row = itab[ id = find_id ].
If the matching entry cannot be found, you need to catch the exception:
TRY.
    row = itab[ id = find_id ].
  CATCH cx_sy_itab_line_not_found.
    ...
ENDTRY.
Another alternative is to use a VALUE constructor expression where you can add the keyword OPTIONAL (or DEFAULT) to just initialize any non-found values.
row = VALUE #( itab[ id = find_id ] OPTIONAL ).
IF row IS INITIAL.
  ...
ENDIF.
This particular usage of VALUE may seem a little awkward. I mean, why use x = VALUE #( y ) when you can just write x = y? The VALUE constructor’s sole purpose here is the OPTIONAL bit that lets us do away with the exception.
As I was working on a performance-sensitive component, I tested it to see what performs best.
Results:
For completeness, and additionally with suggestions by Former Member, Quynh Doan Manh and Uwe Fetzer in the comments, I also added a couple of other ways to look for an entry in a table:
- The trusty READ TABLE … WITH KEY, with and without TRANSPORTING NO FIELDS
- line_exists( )
- ASSIGN itab[ … ] – which (bizarrely) sets SY-SUBRC instead of producing an exception
- REF #( ) instead of VALUE #( ).
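As a quick sketch, the additional variants look like this (field and variable names carried over from the primer above; a hedged illustration, not the exact test code):

```abap
" READ TABLE without moving any data - only sets SY-SUBRC:
READ TABLE itab WITH KEY id = find_id TRANSPORTING NO FIELDS.
IF sy-subrc = 0.
  " entry exists
ENDIF.

" Predicate function - no SY-SUBRC, no exception:
IF line_exists( itab[ id = find_id ] ).
  " entry exists
ENDIF.

" REF #( ) with OPTIONAL - yields an unbound reference when not found:
DATA(row_ref) = REF #( itab[ id = find_id ] OPTIONAL ).
IF row_ref IS BOUND.
  " row_ref->* is the found line
ENDIF.
```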
I tested a million failed lookups on a 7.50 HANA system. Here are the results in microseconds:
line_exists( ) : 610,758
READ TABLE ... TRANSPORTING NO FIELDS : 671,368
READ TABLE : 671,115
ASSIGN itab[ ... ] : 707,929
REF #( itab[ ... ] OPTIONAL ) : 888,562
VALUE #( itab[ ... ] OPTIONAL ) : 961,803
TRY ... itab[ ... ] CATCH : 6,325,265
I did not expect the last one at all.
So the take-away here for me is that a TRY-CATCH may be easier to read, but should not be used in performance-sensitive code unless you expect the values to be found most of the time. I’m happy to sacrifice a little performance for readability, but this is a significant impact.
I suspect that this applies to CATCH blocks in general, but that’s another analysis for another day.
For comparison, I also re-ran the same but this time with a lookup value that did exist. Two things happened:
- the exception was no longer an influence (to be expected)
- line_exists( ) became a poor choice in my scenario, because we need to double up the search in order to also read the record:
IF line_exists( itab[ id = find_id ] ).
  row = itab[ id = find_id ].
ENDIF.
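One way to avoid the duplicated search is to read once and then test the result; a sketch using the VALUE variant from earlier (with the caveat that a line whose fields are all initial would be indistinguishable from "not found"):

```abap
row = VALUE #( itab[ id = find_id ] OPTIONAL ).
IF row IS NOT INITIAL.
  " process row - a single lookup instead of line_exists( ) plus a read
ENDIF.
```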
To summarise:
- If you don’t need the data, line_exists( ) is fastest.
- If performance is number 1 priority and you need the data, READ TABLE is fastest.
- For compact and/or readable code, use table expressions.
(Yes, I know, ‘new’ ABAP can be used to make code either more readable or more cryptic, but that’s another discussion.)
- TRY-CATCH with table expressions can be a useful way to structure your code (e.g. to use a single catch handler for multiple expressions), but be mindful of the expected failure rate and how performance-critical the component is. If we’re talking anything less than thousands of lookups in a short space of time, then you can safely ignore the impact.
Update:
After several comments and suggestions, I uploaded my test program to GitHub at abap_itab_perf
It was originally just a quick-n-dirty program for comparing two options, and it grew as a copy/paste of alternatives. Feel free to play with it or even send any updates back via GitHub.
Interesting analysis.
Like you, I suspect the performance difference lies in the exceptions/TRY...CATCH mechanism.
I think overall the VALUE #( ) could be used. It makes the code easier to read; if the data doesn't exist it's fast enough, and in case the data exists you don't need a check plus an assignment like with line_exists( ).
If you compare only line_exists( ) with READ TABLE, I think the READ TABLE should have TRANSPORTING NO FIELDS, to be fair :).
Thanks for your comment. I partly agree, but still find the additional VALUE clumsy. I have no idea why SAP didn't just do a SY-SUBRC like with field symbols, particularly as there is such a performance penalty.
Re. TRANSPORTING NO FIELDS, my initial test for my real work was where I needed the value, I added line_exists as an afterthought for the blog. But it's a good point for a complete comparison, so I've added it and updated the results directly in the blog above, thanks.
Hello Mike,
could you please measure the run-time behavior of this variant?
No exceptions, check sy-subrc, it should be good enough.
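The comment's code snippet did not survive, but judging from the reply below it was presumably the ASSIGN variant; a hedged reconstruction:

```abap
" ASSIGN with a table expression sets SY-SUBRC instead of raising an exception:
ASSIGN itab[ id = find_id ] TO FIELD-SYMBOL(<row>).
IF sy-subrc = 0.
  " <row> points at the found line
ENDIF.
```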
JNN
Thanks for the input, good point! I remember reading this a while ago, and thinking it's bizarre. But we are where we are, and ASSIGN turns out to be a great result! My specific use case was a returning parameter so I didn't even consider an FS.
Hello. This construction does not work.
Hi! Please explain in more detail, as we cannot know what is in your mind. Thanks!
Hi Mike,
Good comparison of internal table expressions. Additionally, we can’t use the BINARY SEARCH option with 7.40 table expressions.
Good point, but actually we don’t need to, as key fields of sorted or hashed tables provide a better alternative to binary search. But it's certainly worth mentioning as not everyone knows all the possibilities.
See https://help.sap.com/http.svc/rc/abapdocu_752_index_htm/7.52/en-US/index.htm?file=abentable_expressions.htm
Personally I think BINARY SEARCH is unreliable and should be avoided if possible. I don't like the idea of relying on prior code sorting the table and that nobody has APPENDED a row since then. If you need to search you need a key, so there's no reason not to use a SORTED or HASHED table where you cannot go wrong.
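For illustration, the key-based alternatives might be declared like this (row type and key field are assumed, not from the original test program):

```abap
TYPES: BEGIN OF ty_row,
         id    TYPE i,
         value TYPE string,
       END OF ty_row.

" Sorted table: lookups on ID use the table key automatically
DATA itab_sorted TYPE SORTED TABLE OF ty_row WITH UNIQUE KEY id.

" Hashed table: constant-time lookup on the full unique key
DATA itab_hashed TYPE HASHED TABLE OF ty_row WITH UNIQUE KEY id.

" No prior SORT and no BINARY SEARCH addition needed:
DATA(row) = VALUE #( itab_sorted[ id = 42 ] OPTIONAL ).
```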
Isn't that SORT an overhead if you have a large internal table? The value added by binary search isn't fantastic after that time-consuming sort. Like you said, sorted and hashed tables are the best bets.
A SORT can be beneficial in certain scenarios. Consider adding values one by one in random order – every INSERT into a sorted table adds performance cost, so here a one-off SORT at the end is likely to be more efficient.
BUT… the reason I say BINARY SEARCH is unreliable is not for technical reasons but has to do with the developer(s). Example:
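The original example did not survive; a hypothetical reconstruction of the kind of developer mistake meant:

```abap
SORT itab BY id.
" ... later, perhaps in a routine added in a subsequent development iteration:
APPEND VALUE #( id = new_id ) TO itab.   " table may no longer be sorted!
" ... elsewhere in the program:
READ TABLE itab INTO row WITH KEY id = find_id BINARY SEARCH.
" BINARY SEARCH on a no-longer-sorted table can silently miss
" existing rows - SY-SUBRC = 4 even though the row is there.
```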
This example is obviously easy to work out, but complex programs with several development iterations are a different story - especially when the filling, sorting and reading are spaced out in the code.
This mistake is not possible with sorted or hashed tables. For that reason alone I avoid BINARY SEARCH unless there is a major benefit.
Since I started using SORTED and HASHED tables 18 years ago, I've never had a need to use BINARY SEARCH on tables I've defined myself.
It baffles me that people are using these 7.40+ features... yet continue to use BINARY SEARCH instead of SORTED/HASHED.
Hi Mike,
can you add this to your measurements?
Good point, updated in the blog. In my experience references are usually slightly slower than field symbols, and this is no exception.
Nice blog! I always appreciate when someone does all the work for me and comes up with such simple, practical recommendations. 🙂 Well done.
Great Blog Mike, I have started following this blog. Hope you keep updating it with all your findings 🙂
Hah, it only started with the intention to compare two table expressions. Then I added another two and then the good suggestions came in 🙂
Great post, thanks for sharing.
Stating the obvious here (which actually was already said), the big difference there is the EXCEPTION, not the different ways of reading, especially for the first four cases. I reckon it would probably be a good idea to run the tests a few times and compare the averages (not sure if you did this already).
Also this:
should never be used, as per official ABAP programming guidelines.
Cheers.
Hi Mike,
I agree with Custodio, and I know you well enough to say that I shouldn't ask, but... is this the average of multiple executions of the same tests?
Anyway, I'm a bit surprised, as I expected the results between READ TABLE and VALUE with OPTIONAL to be more level (even though it's only a 300 microseconds difference... 0.0003 seconds!). I guess I'm being picky. 🙂
As always, well done and thanks for the info. Cheers!
After better reading of the article I noticed a small big detail, it says you "tested a million failed lookups".
So... after reading (or failed reading) a million times the difference between ( READ TABLE vs VALUE with OPTIONAL ) is only 0.0003 seconds?? Really? Wow!
And as far as I can see, a 0.0003 second difference is negligible. I will still go with the VALUE statement.
P.S.: did you really do a million lookups? 🙂
Yup, a million. Personally I see 0.3s/million an acceptable margin for code clarity and consistency when the data volume is in the thousands or the DB time far exceeds that. The 6 seconds however took me by surprise and is something to keep in mind.
I uploaded the code if you or anyone wants to play with it further - feel free to update back.
Github: abap_itab_perf
Hi Andre,
Of course it goes without saying that duplicate searches are to be avoided, I thought that was too obvious to require explanation 🙂
It's 300,000 microseconds, or 0.3 seconds. So the difference compared to an exception becomes over 6 seconds when running through a million records. (Yes, that silly British-American comma that much of the world uses as a thousands separator. )
You can use line_index( ) with the same table expression and then read the internal table by index if it's not 0. 😀
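A sketch of that suggestion, using the same assumed names as in the blog:

```abap
" line_index( ) returns the row number, or 0 if not found:
DATA(idx) = line_index( itab[ id = find_id ] ).
IF idx > 0.
  row = itab[ idx ].   " index access - no second key search
ENDIF.
```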
It is interesting to see that the assumption of 'ABAP 7.4' elements being better than traditional elements fails sometimes. Thanks for sharing. Developers need to make a choice between readability of the code and performance at many occasions.
Regards,
Kumud
Agree, we (myself included) automatically assume new=faster. But on the other hand, I am not aware of any claims from SAP that the 'new ABAP' statements are faster, it's almost always been about function, readability, new possibilities, etc. Faster to write perhaps 🙂
Thank you for the blog and testing.
So I am staying with good old READ TABLE and line_exists( )
EDIT: read bellow comments, thanks Uwe and Mike.
No, this should definitely not be the conclusion of this blog post.
Sorry, I do not want to give misleading "conclusions".
Should I add "checking existence within an internal table with best performance" then?
The conclusion, as it often is, is "it depends".
There are two things to consider in performance: absolute and relative performance.
Absolute is how much you are processing: how much time does something consume over the workday and how much is it going to affect the user. Remember this was for a million executions, so for a dialog user querying 1000 records, the impact of 0.006 vs. 0.009 seconds is nil.
Likewise for relative performance: If a job does a bunch of DB stuff and runs for half an hour then even 6 seconds is all but meaningless.
So in most scenarios the difference between the options (apart from exceptions, which was the point of the blog) is a perfectly acceptable tradeoff for consistent and readable code.
I had a small question: shouldn't TRY...CATCH be used wherever we use the new table expressions?
Eg:
A good question, but no. line_exists( ), and several of the variants detailed above, are designed precisely so that you don't have to handle the exception.
If you follow the very first link of the blog you'll see the documentation states:
If the specified row is not found, a handleable exception of the class CX_SY_ITAB_LINE_NOT_FOUND is raised in all operand positions, except when
Didn't read that in detail…Thanks Mike...
Very nice article Mike .
I am also working on performance and facing a challenge using multiple line_exists( ) calls for each value. With large data, checking existence every time will not help much compared with SY-SUBRC.
I am working on changing my program and hope it will help.
Very nice post, thank you for that. I was asking myself exactly this question and was thinking about how to test it, then I found your post 🙂
It is a very useful post. I am wondering if we can use line_exists( ) to check multiple conditions on the same field. In that case, how can we check the condition:
if line_exists( itab[ id = find_id1 or id = find_id2 ] )
I know it's syntactically incorrect, but I am trying to understand how we can achieve this.
Good question, line_exists and READ TABLE both can only compare individual fields against a single value.
LOOP AT has a more flexible WHERE clause and can be used for this purpose, but be sure to include an EXIT if you only want a single result.
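A sketch of that approach for the two-value case above (names assumed from the question):

```abap
" LOOP AT ... WHERE accepts a general logical expression, including OR:
LOOP AT itab INTO row WHERE id = find_id1 OR id = find_id2.
  EXIT.   " keep only the first match
ENDLOOP.
IF sy-subrc = 0.
  " row holds the first matching line
ENDIF.
```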