Technical Articles

ABAP for newbies – Importance of BINARY SEARCH (3)

Hi,

This is the third post on the subject; if you are interested, you can take a look at the previous posts in the series.

I’m going to test the performance of different techniques, first from a READ perspective and then from an INSERT/DELETE perspective. I’m using a developer version with no SAP modules installed, so I will use very simple internal tables for the tests.

I’m running SAP on a virtual machine and found the performance not very consistent, so I ran each test 100 times and will report the average time in the results.

The read techniques we are about to see are:

  • BINARY SEARCH – Both tables are STANDARD TABLES; the head table needs to be sorted with a SORT statement, and I read entries with READ TABLE … BINARY SEARCH. This option can read only one entry, so my loop runs from the positions table to the head table.
  • LOOP WHERE SORTED – The positions table is a SORTED TABLE; I access it with a partial key to get the benefit of the WHERE optimization.
  • LOOP HASHED – The head table is HASHED, so I loop over the positions and look up each entry in the head table. The reason is that a hashed table must be accessed with its full key, and going from head to positions I would only have a partial key (the position number would be missing).
  • LOOP WHERE SECONDARY SORTED – The positions table is a STANDARD TABLE, but it has a SECONDARY SORTED KEY that I use for the loop.
  • LOOP SECONDARY HASHED – The head table is a STANDARD TABLE, but it has a SECONDARY HASHED KEY that I use for the lookup; the loop runs from positions to head, as in the hashed table test.
  • PARALLEL CURSOR WITH READ – For this option I do a binary search to find the index of the first position matching the head entry, then loop from that index until I find a different key. Both tables are STANDARD TABLES, and the positions table needs to be sorted with a SORT statement.
  • PARALLEL CURSOR WITHOUT READ – For this option I use STANDARD TABLES; both need to be sorted with a SORT statement. I can then assume the first head entry matches the first position entry, and that the positions index where I find a different key corresponds to the next head entry. So, instead of doing a READ TABLE for each head entry, I just keep the last read index.
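As an illustration, here is a minimal sketch of the PARALLEL CURSOR WITHOUT READ technique. The table and field names (docnum, posnum) are my own assumptions for the example, not the exact code from YJRS_ABAP_003:

```abap
TYPES: BEGIN OF ty_head,
         docnum TYPE i,
       END OF ty_head,
       BEGIN OF ty_pos,
         docnum TYPE i,
         posnum TYPE i,
       END OF ty_pos.

DATA: lt_head TYPE STANDARD TABLE OF ty_head,
      lt_pos  TYPE STANDARD TABLE OF ty_pos,
      lv_idx  TYPE sy-tabix VALUE 1.

" Both tables must be sorted by the shared key for the parallel cursor to work.
SORT lt_head BY docnum.
SORT lt_pos  BY docnum posnum.

LOOP AT lt_head ASSIGNING FIELD-SYMBOL(<ls_head>).
  " No READ TABLE here: continue from the index where the previous
  " head entry stopped, instead of searching for the key again.
  LOOP AT lt_pos ASSIGNING FIELD-SYMBOL(<ls_pos>) FROM lv_idx.
    IF <ls_pos>-docnum <> <ls_head>-docnum.
      lv_idx = sy-tabix.  " first position of the next head entry
      EXIT.
    ENDIF.
    " ... process <ls_pos> here ...
  ENDLOOP.
ENDLOOP.
```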

Some of these tests were already seen in the first post; the new options we are about to test are:

  • Tests for hashed tables. I’m not going to explain in depth why these tables are good, because I don’t know much about the algorithm; they are usually recommended for programs that mostly need to read data, and not so much when the program needs to insert/delete entries to/from the internal table.
  • Tests to check the performance of STANDARD TABLES using secondary keys; we will see if secondary keys perform as well as primary ones.
  • I found some information about parallel cursors and added tests for the better-performing option, parallel cursor without read.

READ TESTS

For the read tests I will not add the sort time to the BINARY SEARCH and PARALLEL CURSOR techniques, because populating sorted and hashed tables has some performance overhead too.

You can see the exact code tested in the GitHub repository linked at the end of the post, program YJRS_ABAP_003.

First test

  • Seed: 1
  • Head entries: 50000
  • Positions per head: 1
  • Executions: 100

Results:

  • Total loops done: 50.000
  • STANDARD TABLES – SORT TIME – 29.922 ms

 

  • BINARY SEARCH – 47.903 ms
  • LOOP WHERE SORTED – 116.885 ms
  • LOOP HASHED – 31.899 ms
  • LOOP WHERE SECONDARY SORTED – 120.095 ms
  • LOOP SECONDARY HASHED – 31.572 ms
  • PARALLEL CURSOR WITH READ – 69.499 ms
  • PARALLEL CURSOR WITHOUT READ – 24.603 ms

For this test PARALLEL CURSOR WITHOUT READ is the best-performing option, and the SORTED table and key are not performing well, not even close to BINARY SEARCH.

Second test

  • Seed: 1
  • Head entries: 50000
  • Positions per head: 5
  • Executions: 100

Results:

  • Total loops done: 250.000
  • STANDARD TABLES – SORT TIME – 72.149 ms

 

  • BINARY SEARCH – 207.195 ms
  • LOOP WHERE SORTED – 152.801 ms
  • LOOP HASHED – 125.800 ms
  • LOOP WHERE SECONDARY SORTED – 161.138 ms
  • LOOP SECONDARY HASHED – 126.957 ms
  • PARALLEL CURSOR WITH READ – 90.607 ms
  • PARALLEL CURSOR WITHOUT READ – 46.742 ms

Once again PARALLEL CURSOR WITHOUT READ performs best; the HASHED table and key are not as close to PARALLEL CURSOR as in the first test, and BINARY SEARCH starts to struggle.

Third test

  • Seed: 1
  • Head entries: 50000
  • Positions per head: Random number between 1 and 10
  • Executions: 100

Results:

  • Total loops done: 275.164
  • STANDARD TABLES – SORT TIME – 40.195 ms

 

  • BINARY SEARCH – 226.745 ms
  • LOOP WHERE SORTED – 157.758 ms
  • LOOP HASHED – 139.410 ms
  • LOOP WHERE SECONDARY SORTED – 165.673 ms
  • LOOP SECONDARY HASHED – 143.887 ms
  • PARALLEL CURSOR WITH READ – 92.912 ms
  • PARALLEL CURSOR WITHOUT READ – 48.403 ms

When the number of positions is randomized, we get results very close to the second test.

Conclusions

Secondary keys seem to perform very close to primary ones.

For my data it seems that if we only have one position per head, parallel cursor without read, hashed tables/keys, and READ TABLE … BINARY SEARCH are all good options, but as the number of positions grows, parallel cursor becomes better and binary search is no longer an option. This conclusion cannot be extrapolated to every case; I recommend performing your own tests if you need to tweak a program. You can download the YJRS_ABAP_003 program from the GitHub repository at the end of the post and modify it to fit your needs.

INSERT/DELETE TESTS

Now we have some techniques for reading, but some of these techniques can have a performance impact if your program needs to insert or delete lots of records. So, to better judge when to apply each one, we are going to test the time differences between STANDARD, SORTED and HASHED tables.

When inserting data into a STANDARD TABLE I will use APPEND, as it is the statement recommended by SAP for this table kind. For SORTED and HASHED TABLES I will use INSERT. A SORTED TABLE could be populated with APPEND, but only if the line you are appending really is the last one in sort order; for HASHED tables APPEND is not possible.
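As a minimal sketch of those insert statements (the table and field definitions are illustrative assumptions, not the exact ones from the test program):

```abap
TYPES: BEGIN OF ty_entry,
         id    TYPE i,
         value TYPE string,
       END OF ty_entry.

DATA: lt_standard TYPE STANDARD TABLE OF ty_entry,
      lt_sorted   TYPE SORTED TABLE OF ty_entry WITH UNIQUE KEY id,
      lt_hashed   TYPE HASHED TABLE OF ty_entry WITH UNIQUE KEY id.

DATA(ls_entry) = VALUE ty_entry( id = 1 value = `first` ).

" STANDARD TABLE: APPEND adds at the end, with no key handling overhead.
APPEND ls_entry TO lt_standard.

" SORTED TABLE: INSERT places the line at its key position.
INSERT ls_entry INTO TABLE lt_sorted.

" HASHED TABLE: INSERT computes the hash; APPEND is not allowed here.
INSERT ls_entry INTO TABLE lt_hashed.
```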

For the STANDARD TABLE DELETE I will use a BINARY SEARCH to find the entry I want to delete, and then delete it by index; the other table kinds are already optimized to find the entry to delete as fast as possible.
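A hedged sketch of that STANDARD TABLE delete, assuming a table kept sorted by an id field (all names here are illustrative):

```abap
TYPES: BEGIN OF ty_entry,
         id    TYPE i,
         value TYPE string,
       END OF ty_entry.

DATA: lt_standard TYPE STANDARD TABLE OF ty_entry,  " kept sorted by id
      lv_id       TYPE i VALUE 42.

" Locate the row with a binary search, then delete it by index,
" so DELETE does not have to search the table linearly.
READ TABLE lt_standard WITH KEY id = lv_id BINARY SEARCH
     TRANSPORTING NO FIELDS.
IF sy-subrc = 0.
  DELETE lt_standard INDEX sy-tabix.
ENDIF.
```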

For the test I will use a table pre-filled with 25000 entries; I will add 10000 more and then delete those 10000 entries. I’m having some issues when deleting from tables with secondary keys (they don’t seem to be doing any optimization), so I will run these tests just 20 times instead of 100 to avoid a timeout.

  • Seed: 1
  • Head entries: 5000
  • Positions per head: 5
  • Total entries: 25000
  • Executions: 20

Results for INSERT one by one:

  • APPEND STANDARD TABLE – 1.133 ms
  • INSERT SORTED TABLE – 3.477 ms
  • INSERT HASHED TABLE – 3.085 ms
  • INSERT SECONDARY SORTED TABLE – 4.285 ms
  • INSERT SECONDARY HASHED TABLE – 3.641 ms

Results for INSERT all entries in one operation:

  • APPEND STANDARD TABLE – 306 ms
  • INSERT SORTED TABLE – 1.121 ms
  • INSERT HASHED TABLE – 2.082 ms
  • INSERT SECONDARY SORTED TABLE – 6.814 ms
  • INSERT SECONDARY HASHED TABLE – 6.229 ms

Results for DELETE:

  • DELETE STANDARD TABLE – 13.009 ms
  • DELETE SORTED TABLE – 30.916 ms
  • DELETE HASHED TABLE – 25.566 ms
  • DELETE SECONDARY SORTED TABLE – 2.074.537 ms
  • DELETE SECONDARY HASHED TABLE – 2.156.181 ms

Conclusions

No surprises here for the best option: STANDARD TABLE performs best, as it has no overhead in this kind of operation. I’m having issues with the SECONDARY keys even when specifying the key to use; maybe I did something wrong, so if someone finds any mistake in the code I will be grateful to know.

In the comments Sandra mentions the lazy update technique used by the implementation of secondary keys. If I have understood the documentation correctly, INSERTs on tables with secondary keys do not trigger the index update immediately, but at the next usage; this allows multiple INSERT calls to be optimized, since the index is not re-sorted on each INSERT.

Sandra also found the reason for the DELETE performance on tables with secondary keys; as specified in the documentation, there is no optimization for these operations:

deleting a row from a standard table using a secondary key cannot be optimized because the index entry to be deleted must be searched linearly.

Final Conclusions

As seen in the examples, READ technique performance varies depending on different factors such as the table data. I can’t give a rule for perfect performance; you must learn as much as possible about the data you will have, and then you can find the best option by doing some testing.

If your program performs INSERT/DELETE operations intensively, you may be forced to use a non-optimal READ technique; instead, you must choose the technique that produces less overhead in those operations.

In some cases you will need to think about memory too; in such cases, avoid hashed tables and secondary keys.

Personal opinion on BINARY SEARCH addition

In my opinion the binary search addition has just one real problem:

  • If you forget to sort your table, it can be hard to detect, making this issue likely to reach the productive system.
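A small illustration of that pitfall (the table and values are made up): on an unsorted table, READ TABLE … BINARY SEARCH can simply miss an existing entry without any error.

```abap
DATA lt_numbers TYPE STANDARD TABLE OF i.

lt_numbers = VALUE #( ( 30 ) ( 10 ) ( 20 ) ).  " NOT sorted

" The binary search assumes ascending order; on this unsorted table
" it can set sy-subrc = 4 even though the value 30 is present.
READ TABLE lt_numbers WITH KEY table_line = 30 BINARY SEARCH
     TRANSPORTING NO FIELDS.
" No dump, no warning: the wrong result can go unnoticed until production.
```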

I also want to share this link, The War on BINARY SEARCH, where you can read a different opinion, so you can make your own choice.

Many people think the binary search addition is useless, so let’s suppose that in our first test case I used the recommended sorted table, but the volume is so big that I’m having performance issues, and let’s suppose memory is struggling too. If you only read the people against binary search, they will tell you to go for a hashed table to improve performance, and if we only think about performance, hashed tables are indeed better than binary search. But you must know that hashed tables need more memory than standard and sorted ones, so if I’m having memory issues, a hashed table will put even more stress on memory. For the first test case, binary search doubles performance without any impact on memory. It was not so difficult to imagine a use case for binary search; you just need to know the pros and cons of each statement you use.

EDIT: As Sandra mentioned in the comments, the loop is not exactly the same in the READ BINARY SEARCH and LOOP WHERE SORTED tests. I didn’t remember that when writing this point, so you can’t expect such a big increase without this difference; in the first post, with the same loop, binary search consistently performed better only by a small margin. If you have a case like this, what is written here applies; remember the title of this point, Personal opinion 🙂

EDIT 2: To compare the same LOOPs I should compare LOOP WHERE SORTED with PARALLEL CURSOR WITH READ, which in my particular case performs better in all 3 tests and provides a performance increase of about 40% in test 1. As explained in Sandra’s comment, sorted tables should always work at least as fast as binary search, and my argument should always be wrong, but as seen in the tests the theory is not working as expected. I tagged this point as personal opinion because it is based not on theory but on the numbers I actually get. I could recommend following every guideline, but I would rather recommend testing if performance really matters; in my personal opinion this is the important point of the post: do not blindly follow guidelines; understand them, and if they are not working well enough for your case, try something different.

You can find full code at this github repository: https://github.com/jordirosa/ABAP4Newbies

The report programs for these examples are YJRS_ABAP_003 and YJRS_ABAP_004.

Regards

EDIT: I changed the read technique names to match the tests for easier understanding of the results, added a mention of the lazy update technique, and updated the personal opinion section, where I was wrongly comparing the BINARY SEARCH loop with the LOOP WHERE SORTED one.

11 Comments
  • There are lots of things to verify in your post, so I have probably skipped a few things.

    READ TESTS

    I’m somewhat confused by these tests because you don’t compare the same things.

    1. Your tests should not be compared all together; there are three distinct groups (see below for the exact list): those which do a LOOP and a nested READ TABLE, those which do a LOOP and a nested LOOP, and the “parallel cursor”. I understand that you wanted to achieve the same kind of grouping via the parameter “positions per head” with the special value “1”, but that’s incorrect. Keep things simple, and you lower the risk of getting wrong results.
    2. Some tests do the outer loop based on the table of heads, and other tests do it on the table of positions. I saw that because you reported some differences between LOOP WHERE SORTED and PARALLEL CURSOR WITH READ, but they should be theoretically identical.
    3. Minor remark: you differentiate results based on slight performance differences, while they should be considered the same. Like you noticed, if you run the same code twice you never get the same time, because the CPU is always doing some things in parallel, and even the measurements themselves are not exact… You always indicate one “champion” (in green), but you would often be better off indicating two champions. Same thing for the “losers” (in red).

    I advise you to revisit the organization of your chapter, because the results may confuse a lot of ABAP beginners.

    Group 1:

    • BINARY SEARCH
    • LOOP HASHED (the word “LOOP” is confusing, or rename BINARY SEARCH by prefixing it with LOOP too!)
    • LOOP SECONDARY HASHED (same remark for “LOOP”)

    Group 2:

    • LOOP WHERE SORTED
    • LOOP WHERE SECONDARY SORTED
    • PARALLEL CURSOR WITH READ (it’s not the algorithm of “parallel cursor”; I’d prefer “BINARY SEARCH & LOOP”)

    Group 3:

    • PARALLEL CURSOR WITHOUT READ (in fact, it’s just “parallel cursor”)

     

    INSERT/DELETE TESTS

     

    Nothing to criticize here, except that I’m really surprised by the SO BAD performance of DELETE using the secondary key (whether it’s sorted or hashed), compared to the one using the primary key. I wonder if it’s related to the handling of the “delayed update”. I’ll try to investigate that.

    Note that you would get a “false good” result if you had used a secondary non-unique sorted key instead of unique, because there would be a “lazy update” (see ABAP documentation), in fact it’s a delayed update of the index which happens at the next read.

     

    BINARY SEARCH

     

    The argument about the lower memory needed by the standard index compared to the hashed index is correct, but in that case just go for the sorted index.

    The argument that BINARY SEARCH is better than using a sorted key for performance is ALWAYS wrong, because it’s the same performance as I explained.
    According to me, the only good argument to use BINARY SEARCH is when the context FORCES you to have a standard table (ALV, etc.). That was the comment from Matthew in part two of your blog series, if I remember well.
    • Hi,

      Thanks for the comment; I will try to review some points.

      I updated the personal opinion section because, when writing the last point, I didn’t remember that I had used different loops for the read and sorted tests.

      I will keep parallel cursor with read as a name because in some places you can find this differentiation, and I think it’s nice to know what they are talking about just from the name.

      The green and red “champions” are based on the average of 100 runs for each particular test, and on thousands of runs while developing the program, which gave the same average results within a very close percentage. For each particular case I think the green option will be the better one on average; times vary slightly for each batch of 100 runs, but in every batch the green option was the better one. I don’t think it’s really important, though, as it applies to my particular case; I just colored it to make it easy to see the best and worst times.

      I will read about lazy update to try to complete the info.

      Thanks again for helping me to improve the post.

  • I understand why it takes so much time for DELETE SECONDARY SORTED TABLE and DELETE SECONDARY HASHED TABLE; it’s not your program, it’s how the kernel works (see ABAP Doc – Using Secondary Keys):

    • The standard scenario for sensibly using secondary table keys relates to a very large internal table that is created in the memory and whose content is changed very infrequently.
    • “deleting a row from a standard table using a secondary key cannot be optimized because the index entry to be deleted must be searched linearly.”
      • One more remark after you have updated your post, concerning the lazy update for “INSERTs on secondary index tables”: more precisely, it concerns only non-unique secondary (sorted) keys. In your tests, you have only unique keys (in my original comment, I was talking about that just for information).

        About the conclusion for READ, “binary search is no longer an option”, I still don’t agree at all 😉

         

        • About the conclusion for READ, “binary search is no longer an option”, I still don’t agree at all 😉

          I was totally sure about that 😀

          If you don’t mind, I would like to know if you disagree with the numbers, if you think I’m not doing the right comparison yet, or if you just disagree with using the addition even when there are cases where we can get such a performance increase.

          • Yes I disagree, there are two reasons why they are incorrect:

            1. Your tests should not be compared all together, there are three distinct groups
              etc.
            2. Some tests do the outer loop based on the table of heads, and other tests do it on the table of positions.
              etc.

            I’m sure BINARY SEARCH and equivalent read using a sorted key are identical. When I saw that the loops were different (reason number 2), I immediately supposed it was the cause of the differences.

          • The conclusion is just about two tests:

            • LOOP WHERE SORTED
              • From head to positions
              • Belongs to group 2 of your first comment
              • Positions table is sorted table
              • LOOP filtering partial key
            • PARALLEL CURSOR WITH READ
              • From head to positions
              • Belongs to group 2 of your first comment
              • Positions table is standard table (sorted with SORT)
              • READ TABLE filtering binary search then loop using index.

            I really don’t see how your points 1 and 2 apply to these test cases. You think that same groups and same loops are the way to do a fair comparison; then the numbers should be right, and manually coding the binary search performs 40% faster for that particular case.

            I’m not trying to be right; I’m interested in learning whether there really is some reason beyond different opinions, but I can’t see it yet. Maybe if you could write the binary search equivalent code you think I should be executing, I would understand.

          • My only concern, if I understand well what you say, you’re saying BINARY SEARCH is better than sorted table/sorted key, and I’m saying there is no difference.

            But using BINARY SEARCH has the drawback that sometimes the SORT isn’t done correctly before, so, to avoid problems, don’t use BINARY SEARCH (except if an existing program forces you to use a standard table); use a sorted table or a secondary sorted key, or even a hashed table/key if possible/worthwhile. Moreover, the option in the debugger to alert if the standard table is correctly sorted (which you have shown in your second blog post) is not a good reason to continue using BINARY SEARCH.

            Now, I hope you don’t think/say that BINARY SEARCH has better performance than a sorted key. Otherwise please tell me what is wrong in my very simple test (I ran it 4 different times, because, as you know, measures are imprecise):

            Code:

            CLASS ltc_main DEFINITION
                  FOR TESTING
                  DURATION SHORT
                  RISK LEVEL HARMLESS.
              PUBLIC SECTION.
                CLASS-METHODS class_constructor.
              PRIVATE SECTION.
                " NF = not found (read -> sy-subrc = 4)
                METHODS read_binary_search_nf FOR TESTING.
                METHODS read_secondary_sorted_nf FOR TESTING.
                METHODS read_sorted_nf FOR TESTING.
                " F = found (read -> sy-subrc = 0)
                METHODS read_binary_search_f FOR TESTING.
                METHODS read_secondary_sorted_f FOR TESTING.
                METHODS read_sorted_f FOR TESTING.
                CLASS-DATA sflight_s TYPE STANDARD TABLE OF sflight.
                CLASS-DATA std TYPE STANDARD TABLE OF sflight
                    WITH NON-UNIQUE SORTED KEY by_carr_conn COMPONENTS carrid connid.
                CLASS-DATA sorted TYPE SORTED TABLE OF sflight
                    WITH NON-UNIQUE KEY carrid connid.
            ENDCLASS.
            CLASS ltc_main IMPLEMENTATION.
              METHOD class_constructor.
                SELECT * FROM sflight INTO TABLE sflight_s.
                std = sflight_s.
                DO 10 TIMES.
                  INSERT LINES OF std INTO TABLE std.
                ENDDO.
                SORT std BY carrid connid.
                " build secondary non-unique index (to avoid "lazy update" effect to be counted in performance measure)
                READ TABLE std INDEX 1 USING KEY by_carr_conn TRANSPORTING NO FIELDS.
                sorted = std.
              ENDMETHOD.
              METHOD read_binary_search_nf.
                DO 10000 TIMES.
                  LOOP AT sflight_s ASSIGNING FIELD-SYMBOL(<line>).
                    DATA(connid) = <line>-connid + 4000.
                    READ TABLE std BINARY SEARCH WITH KEY carrid = <line>-carrid connid = connid ASSIGNING FIELD-SYMBOL(<line2>).
                  ENDLOOP.
                ENDDO.
              ENDMETHOD.
              METHOD read_secondary_sorted_nf.
                DO 10000 TIMES.
                  LOOP AT sflight_s ASSIGNING FIELD-SYMBOL(<line>).
                    DATA(connid) = <line>-connid + 4000.
                    ASSIGN std[ KEY by_carr_conn COMPONENTS carrid = <line>-carrid connid = connid ] TO FIELD-SYMBOL(<line2>).
                  ENDLOOP.
                ENDDO.
              ENDMETHOD.
              METHOD read_sorted_nf.
                DO 10000 TIMES.
                  LOOP AT sflight_s ASSIGNING FIELD-SYMBOL(<line>).
                    DATA(connid) = <line>-connid + 4000.
                    ASSIGN sorted[ carrid = <line>-carrid connid = connid ] TO FIELD-SYMBOL(<line2>).
                  ENDLOOP.
                ENDDO.
              ENDMETHOD.
              METHOD read_binary_search_f.
                DO 10000 TIMES.
                  LOOP AT sflight_s ASSIGNING FIELD-SYMBOL(<line>).
                    READ TABLE std BINARY SEARCH WITH KEY carrid = <line>-carrid connid = <line>-connid ASSIGNING FIELD-SYMBOL(<line2>).
                  ENDLOOP.
                ENDDO.
              ENDMETHOD.
              METHOD read_secondary_sorted_f.
                DO 10000 TIMES.
                  LOOP AT sflight_s ASSIGNING FIELD-SYMBOL(<line>).
                    ASSIGN std[ KEY by_carr_conn COMPONENTS carrid = <line>-carrid connid = <line>-connid ] TO FIELD-SYMBOL(<line2>).
                  ENDLOOP.
                ENDDO.
              ENDMETHOD.
              METHOD read_sorted_f.
                DO 10000 TIMES.
                  LOOP AT sflight_s ASSIGNING FIELD-SYMBOL(<line>).
                    ASSIGN sorted[ carrid = <line>-carrid connid = <line>-connid ] TO FIELD-SYMBOL(<line2>).
                  ENDLOOP.
                ENDDO.
              ENDMETHOD.
            ENDCLASS.
          • Hi,

            I’m saying that sometimes I get better performance by manually coding the binary search and looping by index than by letting the sorted table do it automatically for me, and I’m saying that not as a rule but as an exception, and the number is sometimes big enough not to ignore it as an option.

            I also think the code of my comparison is more accurate for comparing against the sorted table, as the documentation says:

            The starting point for the editing of the table is determined by a binary search using the subconditions that cover the table key in full or in part. From the starting point onwards, the table is only processed for as long as these subconditions are met.

            This is exactly what I’m doing with the code in the test case I called PARALLEL CURSOR WITH READ: I search for the entry with binary search, then I loop until the subconditions are no longer met.

            It’s beyond argument that using binary search by default instead of a sorted table is not the way to go, so I agree with you on what the right guideline is, but I don’t think it is an unbreakable guideline when I get 40% better performance coding it manually.

            I agree with the problem that you can forget to SORT, but you can also forget to check for an empty table before a FOR ALL ENTRIES SELECT, or forget the addition RESPECTING BLANKS when concatenating strings, etc., and I don’t see people stop using those when they need them. If my tests always gave me the same performance, or at least close performance, I would not see binary search as an option, but in this case I got a 40% performance boost by manually coding the same optimization ABAP is supposed to be doing for sorted tables, so for my test case I really think binary search is an option.

            Thanks for the code and for taking the time to help me learn. My English is awful and maybe I’m not explaining my viewpoint well enough: I’m not saying binary search is always better than a sorted table, but binary search can sometimes be a useful option that many people don’t even consider.

          • Well, I tried my best to demonstrate with simple and verifiable code and measures.

            Just a last word which supports the general orientation of SAP not to use standard tables when keys are used: SAP recommends not using standard tables any more with COLLECT, although there’s no performance difference. I think one good reason is that it may lead to errors if not used correctly. Instead, they recommend COLLECT with sorted tables with unique keys, or hashed tables.