SLEEPing on the JOB
In my current position, I don’t write very much code. In fact, the only code I usually write is operating-system-level scripts for reporting, and that’s not a large part of my responsibilities. However, since I often look for performance issues in our enterprise systems, I read a lot of code. Usually this code is ABAP, whether written and delivered by SAP, or modified or created entirely in-house. Although I don’t write ABAP, I know a fair bit about what works well and what doesn’t.
The ABAP code that goes to the database is translated into SQL. Though I’m no longer a full-time DBA, I can still spot poorly performing SQL in the various ways it manifests itself. The code that doesn’t go to the database is another story. Before that story, a short digression about reading code.
About 5 years ago, I bought Diomidis Spinellis’ book “Code Reading: The Open Source Perspective”. The author’s goal was to share techniques for examining legacy programs: first understand their logic, then find ways to maintain and improve them. Along the way, developers should pick up clues that lead to better writing.
As the title suggests, the book examines open source code, since that code base is widely available, freely sharable and, perhaps because it is contributed by many authors, sometimes in need of tuning or complete overhaul. Is this true of ABAP code? I’ve heard a few stories.
Dr. Spinellis’ Code Reading has been translated into Chinese, Greek, Japanese, Korean, Polish and Russian. Also look for his sequel, Code Quality, and his newest release (with Georgios Gousios), Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design. “The book’s royalties are donated to the international humanitarian aid organisation Médecins Sans Frontières.”
Meanwhile, back at the ranch (as they used to say in TV and movie serials), a common dilemma in parallel operations is that one process finishes before another one it depends on. If your code depends on a database commit, an operator pressing a key, or a file being copied from one place to another, you need to wait. The required wait might be a fraction of a second or much longer, and it is often unknown, or not even considered, until the code is in production and real users find issues not seen in development.
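When the wait length is unknown, a common pattern is to check for the condition you actually need, sleep briefly between checks, and give up after an upper limit so the program never waits forever. Here is a minimal sketch of that pattern, written in Python rather than ABAP purely for illustration; wait_until, timeout and interval are hypothetical names, not part of any SAP API, and a sensible interval depends on how quickly the awaited event usually arrives.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float,
               interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Between checks the process sleeps for `interval` seconds, so it uses
    almost no CPU while waiting. Returns True if the condition was met,
    False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout  # monotonic: immune to clock changes
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep until the next check, but never past the deadline.
        time.sleep(min(interval, remaining))
```

For example, wait_until(lambda: os.path.exists(path), timeout=300, interval=5) would wait up to five minutes for a file to appear, checking every five seconds (path being whatever file the job expects). The caller decides what to do on a False result, instead of silently carrying on as a fixed delay would.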
So, what is your preferred method of waiting? Are you a pacer? Are you someone who sits and reads a book or magazine in the waiting room? Or are you a snoozer, who sets an alarm or asks for a wakeup call?
In my role as software quality reviewer, I look at system statistics, of which SAP provides an overwhelming quantity. After finishing the first analysis at the database level, with tools such as ST04 SQL cache analysis, I look at performance traces at the application server level (ST05), memory caches (ST02) and CPU workload statistics (ST03). The last of these sometimes tells a funny story, with more work being done on the application server than on the database. A few years ago, I found code that was written to do nothing but wait. It looked a little like this:
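REPORT ZZZDELAY MESSAGE-ID 00.
* Loop limit; how long this "waits" depends entirely on CPU speed
PARAMETERS: C_DELAY(8) TYPE N DEFAULT 5000000.
DATA: WHILE_DELAY(14) TYPE N.
* Busy-wait: count up to C_DELAY, doing no useful work
WHILE WHILE_DELAY LT C_DELAY.
  WHILE_DELAY = WHILE_DELAY + 1.
ENDWHILE.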
What does this code do? Besides waiting, I mean. It counts. From zero up to 5 million, one at a time. Or maybe to 4,999,999, not that it matters. It reminds me of youthful games of hide-and-seek, where the seeker has to delay by saying “One-Mississippi”, “Two-Mississippi”, etc. until the hiders have hidden.
What’s the problem, you say? This code does what is intended: it delays the next program step, as required. I found out about it when we installed newer hardware and the developers told me “our delay loop isn’t waiting as long as before”. How long the loop waits depends almost entirely on how fast the specific hardware can count.
The biggest problem, which you have hopefully seen by now, is that the code consumes CPU while doing no useful work. While it runs, it crowds out useful programs and sits on a CPU, quietly moving bits from one register to another, until it is done.
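To see both problems on any machine, here is a rough Python transliteration of the counting loop (count_delay is a hypothetical name, and Python loop speeds are not comparable to ABAP’s; only the pattern carries over). It reports wall-clock time and the CPU time the process used. The wall time depends on your hardware and interpreter, and the CPU time should track it closely, because the loop never stops working while it “waits”.

```python
import time


def count_delay(limit: int = 5_000_000) -> tuple[float, float]:
    """Busy-wait by counting to `limit`, in the style of the ZZZDELAY loop.

    Returns (wall_seconds, cpu_seconds) spent in the loop.
    """
    wall_start = time.perf_counter()  # elapsed real time
    cpu_start = time.process_time()   # CPU time used by this process
    counter = 0
    while counter < limit:
        counter += 1                  # no useful work, just burning cycles
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = count_delay()
    # Wall time varies from machine to machine; CPU time follows it closely.
    print(f"wall {wall:.3f} s, cpu {cpu:.3f} s")
```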
I’m not sure about developers where you work, but I’m guessing that if you asked one in your shop to fix the above, they might say, “But it isn’t broken. It does what I want.” It might take a few reports to management (see ST03 above) and some patient explanation that correct results don’t always imply well-written code. The point of this blog isn’t to show you better ways to write your code, just to hint that you might consider doing so. Here are a few options for this specific case, none of which I specifically recommend for any purpose (in other words, run your own tests).
- SCN wiki code gallery snippet: To pause the ABAP program execution
- WAIT UP TO …
The code gallery example is fairly self-explanatory, though the wiki page itself needs a little gardening. The two function calls are documented in various places; one of my expert references (a person, not a book) said the FMCT call can be used in CATT scripts, which might not work with enqueues. WAIT UP TO seems almost too obvious a construct, so again, run your own tests.
Another alternative would be to let the operating system manage the delay, using the UNIX “sleep” command or a Windows equivalent. These should not consume CPU while waiting: the operating system suspends the process and wakes it after the requested interval.
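For contrast, here is a sketch of the operating-system approach, again in Python for illustration only. time.sleep asks the OS to suspend the process, in the same spirit as the UNIX sleep command, so the elapsed time matches the request regardless of CPU speed while the CPU time stays near zero. sleep_delay is a hypothetical helper name, and exact CPU figures will vary by platform.

```python
import time


def sleep_delay(seconds: float) -> tuple[float, float]:
    """Wait by asking the OS to suspend this process for `seconds`.

    Returns (wall_seconds, cpu_seconds) spent waiting.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(seconds)  # the OS wakes us up; no CPU is spent counting
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = sleep_delay(2.0)
    # Expect wall close to 2 s and cpu close to 0 s on any hardware.
    print(f"wall {wall:.3f} s, cpu {cpu:.3f} s")
```

The design point is that the requested duration, not the hardware, determines how long the wait lasts, and the CPU is free for other work in the meantime.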
Yet another option, if the delay is well over a second, is a third-party external scheduling tool with features that manage process relationships and dependencies.
As it turns out, not every piece of bad code is only good as a “don’t do it this way” example. Once in a while (or, meanwhile, in the Saturday afternoon movie serial) we can reuse code in other ways. Once I observed that this loop was hardware dependent, I knew it could serve as one way to compare hardware.
The graph shows results from different hardware that I have tested, using the same 5,000,000-iteration ABAP loop program. The years are approximate; which hardware this is doesn’t matter, only the observation that things change.
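The same trick works as a crude benchmark. A fixed-count loop takes roughly limit / rate seconds, where rate is how many iterations the machine runs per second, so a machine that doubles the rate halves the “delay”. Here is a minimal Python sketch (loop_rate is a hypothetical name) that times one fixed loop several times and reports the rate for each run; repeating the runs also shows how much the result varies on the same machine. Python numbers will not match an ABAP version of the loop, so compare machines only with the same program.

```python
import statistics
import time


def loop_rate(limit: int = 5_000_000, runs: int = 3) -> list[float]:
    """Time the same fixed counting loop `runs` times.

    Returns one loop rate (iterations per second) per run.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        counter = 0
        while counter < limit:
            counter += 1
        elapsed = time.perf_counter() - start
        rates.append(limit / elapsed)
    return rates


if __name__ == "__main__":
    rates = loop_rate()
    print(f"best   {max(rates):,.0f} iterations/s")
    print(f"median {statistics.median(rates):,.0f} iterations/s")
    # Spread between runs hints at influences beyond raw CPU speed.
    print(f"spread {(max(rates) - min(rates)) / max(rates):.1%}")
```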
One final note: within the same CPU family, I found different results even at exactly the same clock speed. This unexpected result tells me that more is going on than on-CPU logic alone, with memory paging, I/O or other factors adding their own influence. As they say in the movies, “to be continued”.