Personal Insights
Improving ABAP code to be considerably more performant with just a few tweaks
This blog post is a bit of a sequel to an article I published last year about where transaction SAT (Performance Analysis) can be useful. While last year’s blog post was general in nature, this one is about a specific example where the performance analysis led to code changes and eventually considerably shorter runtimes.
The issue
A few weeks ago I happened upon a long running job in one of our productive environments – and I mean really looooong running, as it took about 27 hours to run to completion, from its start at 5 in the morning of Day 1 to its finish sometime between 9 and 11 on Day 2. One “advantage” of such long running jobs is that you can easily take a peek at where they are whiling away their time by going to transaction SAT after identifying the server the process is running on with the help of SM51. As I’m always curious about what is causing such long runtimes, this is what I immediately did.
Instead of having SAT execute a specific program or transaction I made use of the option “In parallel session”. It also helps with the analysis to check the box to “Determine Names of Internal Tables”:
SAT Start
After clicking “Switch On/Off” the currently active processes are displayed with the interesting one most likely sticking out like a sore thumb by the already accumulated runtime:
Position the cursor on the relevant line and activate the measurement for a minute or two (more usually isn’t needed to get a first impression without creating too large an output):
Initial analysis
After deactivating the measurement again, you’ll be returned to the SAT evaluation. Here is an example of what I saw when I first did this for the long running job and after sorting the Hitlist descending on Gross%:
The number of hits gave an indication of how many items were being processed and the gross percentage shows where most of the time was spent, namely within a couple of loops over internal tables (and yes, I had forgotten to follow my own advice to determine the names of internal tables). The “gross microseconds” looked rather interesting to say the least given how high these were for processing not even 17,000 items. What I really like about SAT is that it gives immediate access to the underlying code.
The program in question extracts a large number of material master items from MARA and related data from several other tables. It offers a choice of including either 48 columns (full) or just 11 columns (reduced) in the output. The reduced option was added as an afterthought, and it turned out that it had unfortunately only been applied to the output but not to the data retrieval and processing logic. A quick initial fix therefore was to use the variant to switch off the optional selection of classification data whenever only the reduced output was requested, since this data is irrelevant for it. As this had accounted for about 40% of the overall runtime, this simple fix brought the runtime down to about 7 hours. Still rather long, but obviously a lot better than 27!
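The classification data was evidently only ever read when a corresponding selection option was set, so conceptually the quick fix boils down to leaving a guard like the following switched off in the variant (p_klass and get_classification_data are invented names for this sketch, the real program uses its own identifiers):
* Sketch only: p_klass stands for whatever selection-screen option
* makes the classification data optional in the real program.
PARAMETERS p_klass AS CHECKBOX.

IF p_klass = abap_true.
  " the part that accounted for roughly 40% of the overall runtime
  PERFORM get_classification_data.
ENDIF.

FORM get_classification_data.
  " ... retrieve the classification data here ...
ENDFORM.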
Code changes
While checking the information provided by SAT, I had identified several places where the existing code could be improved and it didn’t take long to get the go-ahead to apply whatever I could within about 2 days. The remainder of the blog post will focus on some areas of the code where I subsequently applied some fairly simple fixes.
The very first thing I did was to split the processing logic to have one “thread” that just provides what is needed for the reduced field list (even if that means hundreds of thousands of line items all told) and another that gets all the data for the full field list (usually only needed for some items in one go). Doing this had the added benefit of breaking up the code into smaller routines, which will now give a much more detailed view in the call monitor data (SCMON) of which data is even regularly retrieved, i.e. still needed.
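In very condensed form, the split looks something like this (a sketch with invented names; the real routines of course contain the actual SELECTs and processing logic):
* p_reduce stands for the switch between reduced and full output.
IF p_reduce = abap_true.
  " lean path: only the 11 columns of the reduced field list,
  " built to cope with very large item volumes
  PERFORM get_reduced_data.
ELSE.
  " full path: all 48 columns, typically requested for fewer items
  PERFORM get_full_data.
ENDIF.

FORM get_reduced_data.
  " ... SELECTs restricted to what the reduced field list needs ...
ENDFORM.

FORM get_full_data.
  " ... SELECTs for the full field list incl. classification data ...
ENDFORM.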
I also changed some instances where the “FOR ALL ENTRIES” construct could be eliminated in favor of simply including the needed data via a JOIN in the main SELECT. Whether this can safely be done is often easily seen in the code, or an ATC check might provide helpful pointers.
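To illustrate the pattern (MARA and MAKT are just used as a familiar example here, the program’s actual SELECTs involve other tables): instead of a separate SELECT ... FOR ALL ENTRIES for the material description, the description is read in the main SELECT via an inner join:
TYPES: BEGIN OF ty_data,
         matnr TYPE mara-matnr,
         mtart TYPE mara-mtart,
         maktx TYPE makt-maktx,
       END OF ty_data.
DATA t_data TYPE STANDARD TABLE OF ty_data.

* One SELECT with an inner join replaces the main SELECT plus the
* follow-up SELECT ... FOR ALL ENTRIES.
SELECT m~matnr m~mtart k~maktx
       FROM mara AS m
       INNER JOIN makt AS k
               ON k~matnr = m~matnr
       INTO TABLE t_data
       WHERE k~spras = sy-langu.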
The biggies
While applying these types of changes, I also kept an eye on SAT, which showed improvements even in the Dev system with just a bit over 20,000 items in the master data. There however remained one obvious pain point, namely the LOOP over an internal table called T_REPLACED, which made up over 90% of the runtime according to the SAT hit list:
Let’s take a look at the code:
LOOP AT t_tab ASSIGNING <t>.
  ....
  LOOP AT t_replaced INTO replaced_wa
       WHERE matnr = <t>-matnr.
    IF <t>-z_vmat IS INITIAL.
      MOVE: replaced_wa-bismt TO <t>-z_vmat,
            replaced_wa-artkz TO <t>-artkz.
    ELSE.
      MOVE 'X' TO <t>-mehrfach.
      EXIT.
    ENDIF.
  ENDLOOP.
  ....
ENDLOOP.
Looks rather innocent, doesn’t it? However, the first issue is that we have a “loop within a loop” situation, which is not ideal to begin with but is especially bad when you are dealing with hundreds of thousands of items in T_TAB. In addition, table T_REPLACED, which was just defined as a standard table, also contains many entries, which makes the WHERE clause rather problematic performance-wise: as soon as the first match on MATNR is found, the loop will keep running from that index to the very last entry of the internal table, often without finding anything else of interest.
I replaced this code with the somewhat more complex combination of READ TABLE with a “WHILE MATNR doesn’t change” construct, making sure that T_REPLACED was properly sorted:
READ TABLE t_replaced INTO replaced_wa
     WITH KEY matnr = p_matnr
     BINARY SEARCH.
IF sy-subrc EQ 0.
  DATA(tabix_save) = sy-tabix.
  WHILE replaced_wa-matnr EQ p_matnr.
    IF p_z_vmat IS INITIAL.
      p_z_vmat = replaced_wa-bismt.
      p_artkz  = replaced_wa-artkz.
    ELSE.
      p_mehrfach = abap_true.
      EXIT.
    ENDIF.
    ADD 1 TO tabix_save.
    READ TABLE t_replaced INTO replaced_wa
         INDEX tabix_save.
    IF sy-subrc NE 0.
      EXIT.
    ENDIF.
  ENDWHILE.
ENDIF.
And yes, I realized while creating this write-up that I might have achieved this a lot more easily by just changing T_REPLACED to a sorted table. I didn’t touch the data definitions, however, because I only had a limited time budget to apply some quick fixes, and the changes are all tested now and will go into production soon. If you think that I could have left the old loop logic alone and only changed T_REPLACED to a sorted table, please let me know in a comment and I’ll make a note of that to apply with the next change of the program.
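For reference, that alternative would only have needed the declaration changed, roughly like this (the line type is simplified for the sketch and ARTKZ’s real type is assumed):
* With a sorted table keyed on MATNR the original LOOP ... WHERE
* matnr = ... only processes the matching block of rows and then
* stops, so the nested loop is no longer the problem it was.
TYPES: BEGIN OF ty_replaced,
         matnr TYPE mara-matnr,
         bismt TYPE mara-bismt,
         artkz TYPE c LENGTH 1,        " actual type assumed for this sketch
       END OF ty_replaced.
DATA t_replaced TYPE SORTED TABLE OF ty_replaced
                WITH NON-UNIQUE KEY matnr.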
The result
As of this write-up, the program has been successfully tested and produces the same results (a download file of about 79 MB) as it did before my partial re-write. The runtime is also considerably faster than it was to begin with, and even than after the initial quick fix to no longer get the classification data if it’s not needed. Instead of spilling the beans immediately, I’d like to give you a chance to guess how much time the program now needs to create the reduced output, which – to recap – took 27 hours when I noticed the issue. I’ll tell you in a couple of days who came closest (sorry, no prizes)!
Any takers?
That's some good sleuthing! Nested loops are generally the culprits of long running jobs.
Anyway, without knowing the program details and the rationale behind your decision to replace the LOOP with a READ TABLE, I would rather have defined a non-unique secondary key on MATNR for T_REPLACED and used that key to LOOP over T_REPLACED. IMHO, that's the most minimally invasive change one could make.
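Roughly like this, with a simplified line type just to show the idea (not your actual definitions, of course):
* T_REPLACED stays a standard table and only gets a non-unique sorted
* secondary key on MATNR, which the inner loop then uses explicitly.
TYPES: BEGIN OF ty_replaced,
         matnr TYPE mara-matnr,
         bismt TYPE mara-bismt,
         artkz TYPE c LENGTH 1,        " type assumed for this sketch
       END OF ty_replaced.
DATA t_replaced TYPE STANDARD TABLE OF ty_replaced
                WITH NON-UNIQUE KEY matnr
                WITH NON-UNIQUE SORTED KEY by_matnr COMPONENTS matnr.
DATA replaced_wa TYPE ty_replaced.
DATA lv_matnr    TYPE mara-matnr.      " would be <t>-matnr in your outer loop

LOOP AT t_replaced INTO replaced_wa USING KEY by_matnr
     WHERE matnr = lv_matnr.
  " ... the original inner-loop logic stays unchanged ...
ENDLOOP.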
Cheers,
Suhas
Is there really nobody out there who'd like to submit a guess as to how long the updated program now takes to create the output?
I'm really curious!
Cheers
Bärbel
I had a SQL connector which went from 150 down to 5min, so I'll say less than 1 hour, in your case 🙂
Andrea Borgia
Hi Andrea,
thanks for playing! While your answer is correct, it's also not "really close" 🙂!
Cheers
Bärbel
Using HASHED tables, I've brought runtimes of hours down to a few minutes. So I'll guess 3 minutes.
Matthew Billingham
Hi Matt,
you are thus far the closest, but not close enough!
Cheers
Bärbel
27 minutes?
Kai Sicker
Hi Kai,
thanks for your guess, but it's too far off (see my other comments in the thread)!
Cheers
Bärbel
5 minutes.
It would be easier to predict if we knew the total time spent in those 93% (the unnecessarily slow operations on big internal tables).
I find solving performance problems is often (in about 50% of cases) very enjoyable, because you can achieve a 10 to 100 times improvement with small changes (5 minutes to 2 hours of coding).
On the other hand, I find that users rarely complain about slow runtimes. And it does not make sense to optimize the whole codebase. I even ran into trouble when I wanted to target the top 10 SQLM "hotspots". If the program runs in batch, the users feel no pain from long runtimes. And they often do not have time to test.
Edo von Glan
Hi Edo,
thanks for "playing"!
Here is a hint: when we activated a little log functionality for the batch job execution, it showed that the job needed 3 to 4 minutes to process 10,000 material numbers after we had eliminated the retrieval of classification data.
And yes, the users usually don't care much about long runtimes, unless they "need" the data first thing in the morning but don't get it until around 11am. On the other hand, if there's a general push towards "Green IT", eliminating needlessly long runtimes can be quite a low-hanging fruit to pick in that direction.
Cheers
Bärbel
Edo von Glan, Kai Sicker, Matthew Billingham, Andrea Borgia
Hi All!
Thanks for your guesses, of which Matt's came closest but was still off. Now that the updated program has run for the first time in production and I have heard from the user department that the data looks fine, I'll resolve the riddle: believe it or not, the program only needed 15 seconds to create an output file with 930,000+ entries and a size of 79 MB!
Here is a comparison between yesterday's and today's job:
Cheers
Bärbel
WOW, more than 1000 times faster!
Yay! I win my no-prize!
But it's Bärbel who's really won here, I feel!
Hi Bärbel,
thank You for this blog and Your tips about ATC, and wow, that's impressive - from over 24 hours to a few seconds! I also think it doesn't matter whether your report is running in the background and users are not bothered by the long run - when it's running unnecessarily long, it's using resources which could be used for other reports or even freed completely.
But there is one thing I don't get in Your code. In the inner loop, the value of the current outer line is checked. I think in Your rewritten code this value can be found in p_z_vmat. If this value is not initial, then the inner loop shouldn't be executed in the first place (respectively, no READ TABLE when p_z_vmat is not initial). Well, I can imagine that when the internal table is really large, it can take considerable time to find the first line with the right matnr, and that multiple times (outer loop). On the other hand, Your new runtime contradicts this. Or should the field mehrfach be set to true only when z_vmat is not initial and matnr can be found in the replacements? Furthermore, if z_vmat is initial, you overwrite two fields of the outer structure in the inner loop (or while-loop). So after the inner loop is done, you only have the values from the last record with the corresponding matnr. Wouldn't it be enough to take just one row with the corresponding matnr from the replacements?
Greetings
Michał
Hi Michał,
thanks for your reply!
This part is actually logic which I just copied from the old code. The field p_z_vmat is checked to find out if it's still empty, which will be the case for the first "hit" on MATNR. If that's the case, the content of BISMT is moved into that field. Then the next item from T_REPLACED is read and if it's still the same MATNR, P_Z_VMAT will now already be filled and we know that we are dealing with a material which has multiple replacements (and we don't care which they are, we just want an indication that this is the case). So, we set the flag and then EXIT the WHILE-Loop.
Does that explain it?
Cheers
Bärbel
Thank You, yes now it's clear!