The Amityville Programming Horror (3/3)
The Amityville Horror – Part 3
What happens when one of the most widely used custom ABAP reports in your business is so complicated and fragile that you are scared of it and hide under the desk whenever its name is mentioned?
Worse still, there is a big bunch of change requests relating to this horror of a program, and you are the lucky ABAP programmer to whom all those requests have been assigned. If you don’t make the changes you get sacked, and if you do make the changes, the program is so brittle it will react like a vampire in sunlight and explode, no matter how careful you are. Then the most important report in your business is stuffed, it’s all your fault, and you get sacked anyway.
This is the ever popular “rock and a hard place” dilemma – but does it have to be like that? Is there a way out of this nightmare?
Previous Blogs on this subject:-
In those prior blogs I advocated making tiny changes to try and plug the most horrible of the myriad wounds in the program, whilst making your new change which – normally – would have just added to the problem.
In real life I have just finished such an exercise – I am going through the process of what I actually did in these blogs, using monsters to disguise the actual nature of the program.
My changes went into production on the weekend, and I emailed one of the key business users whom I suspected to be the main user of this report – one of hundreds, but, based on my analysis of the usage logs, possibly the one who ran it the most. In any event, I asked her if it ran faster or slower since the weekend and she said:-
I noticed there were additional fields added today, definitely quicker, I said earlier wow what’s going on !!!!!
That is the ideal outcome – the program does more than before, and much faster at that. Moreover, I have only really scratched the surface of the problems in the program, which, as I said earlier, outnumber the stars in the sky.
There cannot be that many, I hear you say. Well, after seventeen years of tinkering there can. I am reminded of a great quote from Albert Einstein, which I might be paraphrasing a bit, but it goes as follows:-
“There are only two things that are infinite, as far as we know – the universe, and human stupidity. And I am not sure about the universe”
Let us return to some concrete examples. As I mentioned at the end of the last blog, I was getting the horrific “more than 5000 records” message when I ran an SQL trace on the program, even for a small selection of data. What you have to do here is aggregate the trace by table and see which tables have the highest number of reads.
In fact even without aggregating the data, just scrolling up and down, the worst culprits jump out at you and slap you in the face with a wet fish, screaming “Here I Am! Here I Am! Doing as many database reads as I can!”.
The first one I noticed was a massive number of reads on VBAK. I found the code that was doing this: looping round a fairly large internal table, doing a SELECT SINGLE each time.
The good news is that GT_CONTRACTS has been built with COLLECT by looping over the main table, so there is only one entry per VBELN. Presumably the logic of not including this in the main join is that not every sales order is created with reference to a contract.
The internal table only has two fields – VBELN and KTEXT inside it. What I did here was replace the looping by a FOR ALL ENTRIES, so there is only one request to the database. There will still be lots of lines in the ST05 trace, but a small fraction of what was there before. That’s a quick fix – going forward I could do a test to see if a LEFT OUTER JOIN in the main SELECT statement would be faster than just looking for contracts you know exist.
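As a sketch of that quick fix – the work area, target table and type names below are my own invention, but the two fields are as described in the text:

```abap
TYPES: BEGIN OF ty_vbak,
         vbeln TYPE vbeln_va,
         ktext TYPE ktext,
       END OF ty_vbak.
DATA: lt_vbak TYPE STANDARD TABLE OF ty_vbak,
      ls_vbak TYPE ty_vbak.
FIELD-SYMBOLS: <ls_contract> LIKE LINE OF gt_contracts.

* Before: one database round trip per contract via SELECT SINGLE.
* After: one request to the database for the whole lot.
IF gt_contracts IS NOT INITIAL.     " always guard the driver table
  SELECT vbeln ktext
    FROM vbak
    INTO TABLE lt_vbak
    FOR ALL ENTRIES IN gt_contracts
    WHERE vbeln = gt_contracts-vbeln.
  SORT lt_vbak BY vbeln.
ENDIF.

LOOP AT gt_contracts ASSIGNING <ls_contract>.
  READ TABLE lt_vbak INTO ls_vbak
    WITH KEY vbeln = <ls_contract>-vbeln BINARY SEARCH.
  IF sy-subrc = 0.
    <ls_contract>-ktext = ls_vbak-ktext.
  ENDIF.
ENDLOOP.
```

The ST05 trace will still show one line per returned record, but only a single database request instead of one per loop pass.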
Next, I noticed a vast number of “identical selects” on KNVP/KNA1. This turned out to be the program looking, in a loop, for the details of a particular partner function – a so-called “project”. The original programmer had the foresight to create a buffer table to prevent the database being read multiple times for the name of a project once it had been found. This is very common: in this case table GT_PROJ is read, and if an entry exists, hooray, that is used; otherwise off we go to the database.
Here is where it gets really funny. The decision was made some years back by one of Baron Frankenstein’s business analysts (The Creature from the Black Lagoon) that some delivery addresses (ship-tos in SAP speak) would have “project” partner functions added, one for each one-off multi-year project to order a large number of monsters to commit evil deeds.
So the request came in to alter this report such that the project number and name were added to every line. All well and good, but no actual projects were created in the SAP system for the next three years. So every time the report was run in that period – which was many times a day, by many people – there were a large number of database reads looking for something that was not there.
As a result, whenever there is no project – which was all of the time for three years, and even now that a few have been created is still the vast majority of the time – the buffering does not work. If you set a Z table to have single-record buffering then the system is clever enough to buffer negative results as well, avoiding identical selects, but in this example the programmer was coding the buffering themselves.
I made two changes – I buffered the negative results, to avoid identical selects, and also altered the internal table to be HASHED to speed up the read on the internal table, even if only by a microsecond. Despite the fact that SORTED and HASHED tables came out donkey’s years ago, they are still not used as widely as one might think.
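A sketch of the reworked buffer logic. The FOUND flag, the partner function ‘ZP’ and all the variable names are my own illustration, not the real code, and a real KNVP read would also need the sales area keys:

```abap
TYPES: BEGIN OF ty_proj,
         kunnr TYPE kunnr,
         name1 TYPE name1,
         found TYPE abap_bool,  " abap_false = we looked, nothing there
       END OF ty_proj.
DATA: gt_proj TYPE HASHED TABLE OF ty_proj WITH UNIQUE KEY kunnr,
      ls_proj TYPE ty_proj.

READ TABLE gt_proj INTO ls_proj WITH TABLE KEY kunnr = lv_kunnr.
IF sy-subrc <> 0.
  " Not buffered yet - go to the database exactly once for this customer
  CLEAR ls_proj.
  ls_proj-kunnr = lv_kunnr.
  SELECT SINGLE kna1~name1
    FROM knvp
    INNER JOIN kna1 ON kna1~kunnr = knvp~kunn2
    INTO ls_proj-name1
    WHERE knvp~kunnr = lv_kunnr
      AND knvp~parvw = 'ZP'.         " hypothetical "project" function
  IF sy-subrc = 0.
    ls_proj-found = abap_true.
  ENDIF.
  INSERT ls_proj INTO TABLE gt_proj. " hit OR miss - both are buffered
ENDIF.

IF ls_proj-found = abap_true.
  " use ls_proj-name1
ENDIF.
```

The key line is the unconditional INSERT: a miss goes into the buffer too, so the next ten thousand lines for the same customer never touch the database again.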
Talking of partner functions: at some point in the past it was also decided by another business analyst (The Thing with Two Heads) that some sales documents could have different “bill-tos” (KUNRE) and “payers” (KUNRG) than the actual customer (KUNNR). Again, all well and good, no doubt a valid business requirement. One customer was set up this way, and thus it remained for many years.
The development department was told to add the bill-to and payer numbers and names to the report. The programmer assigned thought the easiest way to do this was to pick somewhere the main output internal table was looping and add in two database reads – one for the RE function and one for the RG function. As we have seen, they got into trouble with cutting and pasting code, and ended up reading the same partner function twice and putting the same data into two different sets of fields. No-one noticed, because the one customer had the same value for both RG and RE.
After fixing the code so the correct partner data went into the correct field, the next step was to see if the partner function table – VBPA – was being read elsewhere. It turned out that there were two separate reads on VBPA – all at once for PERNR using a FOR ALL ENTRIES based on a list of sales orders, and then one by one in a loop for RG and RE. The obvious solution was to move the RG/RE reads into the initial read on VBPA, thus getting all three partners at once.
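Sketched out, with illustrative table and variable names, and with ‘VE’ standing in for whatever employee partner function the real report uses for its PERNR read:

```abap
* One FOR ALL ENTRIES read now fetches all three partner functions,
* instead of one bulk read for the employee plus two reads per line
* for bill-to and payer.
IF gt_orders IS NOT INITIAL.
  SELECT vbeln posnr parvw kunnr pernr
    FROM vbpa
    INTO TABLE gt_vbpa
    FOR ALL ENTRIES IN gt_orders
    WHERE vbeln = gt_orders-vbeln
      AND parvw IN ('VE', 'RE', 'RG'). " employee, bill-to, payer
ENDIF.

* Sort once, then pick each function out with a READ inside the
* output loop - no more database access per line.
SORT gt_vbpa BY vbeln parvw.
```

One database request replaces two reads per output line, and the internal READs afterwards cost next to nothing.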
Whilst I was doing that I also encountered some very strange code indeed that, in effect, added a few extra random values to the list of partner functions getting read from the database. That had been there forever and no-one had ever questioned it. Since the random values were not valid partner functions, nothing had ever been returned. So I got rid of that code. There was probably a purpose for it originally – a purpose that obviously failed, and someone signed it off as working anyway – but the change was made so very long ago, and documented nowhere, that getting rid of it could have no downside.
Welcome to the Clown Forum
The best bit comes last. Eight years ago yet another of the Baron’s business analysts (The Killer Klown from Outer Space) thought it would be good to have the document clearing status of the invoice in this report, because it would help one person, once a month, with reconciliation. This status refers to whether an invoice has been cleared, partially cleared, or is still outstanding.
This did not seem an unreasonable requirement – after all, you can see that status quite clearly in the document flow. You could say the underlying problem is adding more and more fields to the same report every single year, giving that report more responsibilities, as it were. We don’t like that in program design – the so-called “single responsibility principle” – so why do we like it in applications? The Fiori paradigm seems to go the other way, replacing monolithic applications that do hundreds of things with lots of applications that do one thing each.
In any event, when this was assigned to a programmer they had a problem – the invoice clearing status was not in table VBFA. If it had been, all would have been well. The obvious question was then: if the value is not in the document flow database table, how in the world does it show up in the document flow? So they debugged the document flow.
It turns out the standard SAP function module RV_DOCUMENT_FLOW_INFORMATION is used, and returns an enhanced VBFA table which does indeed contain the invoice clearing status, amongst a lot of other data. So a big chunk of standard SAP code was copied into the custom program, ending with a call to that standard function, and then evaluating the invoice clearing status bit of it, to fill the new field.
There were three problems with this:-
- This report is run many times a day by a great many people. Only one person cares about the new field, and then only once a month, yet the field is derived every time the report is run, and stored as a hidden field.
- The standard SAP module to get the invoice clearing status is called in a loop, once per record, and there are usually a great many records in this report.
- The standard SAP module reads about thirty or forty different SAP tables, ending with BKPF and BSEG, which it uses to see if the invoice is cleared.
So, this was where the vast majority of the database reads were coming from, which is why we were getting the “5000 records already, do you want to continue?” message on even a small selection of data. The vast bulk of the data being brought back was not even getting used – just one field from one of the very large number of tables being read. In a loop.
I had the idea that if we are only interested in the results from the BSEG table, then maybe we should just read that table, and maybe only once, not in a loop. Now, BSEG is a horrible table to read, as it is a cluster table – at least until you get onto HANA – but it could not be worse than before.
Next I searched this huge program to see if BSEG was already getting read, and yes it was – all in one go. However, this only happened if the end user ticked “do you want invoice information?” on the selection screen. I asked the business analyst if the user who needed the clearing status ticked that box, and yes they did.
So, for eight years, until I came along to look at this, anyone who did not tick the “invoice information” box got ten billion un-needed database reads (the standard function module itself ends up reading BSEG), and anyone who did tick the box got the ten billion reads plus another read on BSEG that brought back the invoice clearing information and then ignored it.
So, I moved the logic to populate the invoice clearing field to just after the read on BSEG that was happening if the user pressed the “invoice” tick box, and got rid of the call to the standard SAP function module altogether.
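In outline, and very much hedged – the output structure and field names below are invented – but AUGBL really is the clearing-document field in BSEG, so a filled AUGBL means the item has been cleared:

```abap
* gt_bseg was already filled in one go when the "invoice" box is
* ticked, so the clearing status can be derived from it with no
* extra database reads at all.
FIELD-SYMBOLS: <ls_out> LIKE LINE OF gt_output.
DATA: ls_bseg LIKE LINE OF gt_bseg.

LOOP AT gt_output ASSIGNING <ls_out>.
  READ TABLE gt_bseg INTO ls_bseg
    WITH KEY belnr = <ls_out>-belnr.  " invented key fields
  CHECK sy-subrc = 0.
  IF ls_bseg-augbl IS NOT INITIAL.
    <ls_out>-clear_status = 'C'.      " cleared
  ELSE.
    <ls_out>-clear_status = ' '.      " still open
  ENDIF.
ENDLOOP.
```

The real logic would also need to handle partial clearing across multiple items, but the point stands: data the program already had replaces thirty-odd tables’ worth of reads per line.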
The end result is that the report ran much faster for the one person looking for the clearing information, and much faster yet again for the vast bulk of people who did not tick the “invoice” box.
Regression testing in QA revealed no change in the data returned. All was well – hence the highly complimentary comment from an end user I noted above.
The moral of the story is that once any given program gets over a certain level of complexity, then horrible things can easily get added and no-one is ever the wiser, possibly for many years.
Stop While I am Ahead
I could have kept going – I have no doubt there are thousands more bugs and odd behaviours lurking in this “argument against evolution” of a program – but I felt that was enough to be going on with.
Some people have said that an ALV report like this has no need to be written in an OO manner. When this program started out, no doubt that was true. However, as it got more and more complicated, it started to break down and decay. Now, you could say that would have happened in an OO program as well.
Now, there is nothing magical about OO programs that makes them read the database in a better way. All the horrors about reading data far too many times could well have happened if methods were used rather than FORM routines.
However, a well-written OO program would have had tests. You might think tests are overkill for a read-only report; however, they would have picked up the problem with the bill-to and payer data going into the wrong fields, and it would have been a hell of a lot more obvious that there were two almost identical routines getting invoice information.
Knocking Down the Hammer House of Horror
I have another vital program I have to change every month, each month adding something and making it more complicated. I try to come out of each change with a zero-sum game, removing one bad thing for each new thing I add.
For example, last week I had to make two changes. One was trivial; the other involved changing the logic behind how a certain value was calculated – logic that had been wrong for 17 years, and which I only noticed because someone told me to make another change, one that turned out not to be needed at all.
Firstly, I noticed the routine I was fixing did two totally unrelated things. It calculated the value, and then it went off and looped through the output table, turning certain lines green. Since there was no logical relation between these two tasks, I split them into two routines, each with a name saying what that routine did.
I also noticed the value being calculated was held in a global variable. First I changed the routine so that the calculated value became a parameter. This is a procedural program, so I could not have it as a RETURNING parameter as I would like. Anyway, inside the routine all references to the global were replaced with the parameter. Naturally the global variable is still mentioned when the routine is called, but it has moved one level up the call stack. I find that if you keep doing that, eventually you discover the value does not need to be a global variable at all.
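As a sketch – the names are made up, but the mechanics are as described: the global disappears from inside the routine and only the caller still mentions it:

```abap
* Before: FORM calculate_value referenced global GV_VALUE directly
* and also turned output lines green as a side effect.
FORM calculate_value CHANGING cv_value TYPE netwr.
  " ... every former reference to gv_value now uses cv_value ...
ENDFORM.

FORM turn_lines_green.                " the unrelated task, split out
  " ... loop over the output table setting the colour field ...
ENDFORM.

* The caller still passes the global, but the coupling has moved one
* level up the call stack:
PERFORM calculate_value CHANGING gv_value.
PERFORM turn_lines_green.
```

Repeat the exercise at each level of the stack and, more often than not, the global quietly becomes a local somewhere near the top.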
Lastly, I did the ever-popular SLIN exercise and got rid of a few unused variables, and then used the Code Inspector to detect a FOR ALL ENTRIES with no check whether the driver table was empty.
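That last one matters more than it looks: if the driver table of a FOR ALL ENTRIES is empty, the WHERE clause referring to it is effectively ignored and the SELECT reads the whole database table. The guard is one IF (table and field names here are illustrative):

```abap
IF lt_orders IS NOT INITIAL.     " without this, an empty lt_orders
  SELECT vbeln posnr matnr       " would fetch ALL of VBAP
    FROM vbap
    INTO TABLE lt_items
    FOR ALL ENTRIES IN lt_orders
    WHERE vbeln = lt_orders-vbeln.
ENDIF.
```

Cheap insurance against the day the selection upstream happens to return nothing.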
Once again, nothing too dramatic, and I forced myself to stop there. Most programmers would not do even the smallest amount of tidying up, as it is not what the current scope of work tells them to do, and they are worried about causing an error as a result. The point is that all this tidying up is supposed to make things clearer, thus making errors easier to spot and debug in the future, and making programs easier to change.
You probably have heard the famous quote:-
“The only thing necessary for the triumph of evil is for good men to do nothing.”― Edmund Burke (in a letter addressed to Thomas Mercer).
I could rephrase this as follows:-
“All it takes for the Amityville Horror to triumph is for good programmers to leave obviously bad code the way it is, because they are scared of it”