Skip to Content
Author's profile photo Jim Spath

Pssst, ABAP Detective, you got a second?

.

.

.

I was working the undercover squad, trying to take out a mob of no-ops that had infiltrated the infrastructure when I got a tip about a Code 16. Though I’m not on the regular ABAP force, I’m called in when the evidence points outside the straight and narrow ABAP world, to the seamy underbelly, I mean, the virtualization layer.  With the site reorganization, my old case notes might be hard to find, so I’ve started tagging them with abap_detective.  Maybe that will help the future Dr. Watsons. Suffice it to say that while I’ve not been reading as much ABAP as in the past, the transaction volumes, spurious behaviors and fatal endings have not escaped my vision.

The report, as often happens, was ambiguous. A job ended with a 16. More than one job, in fact. For those unfamiliar with ABAP policing protocol, a 16 is pretty bad news. You can survive an 8, certainly a 4, and definitely a 2 or a 1. But a 16, that’s a one way trip to the place where they put a tag on your process. The place where the dumps go. It’s not pretty. Sherlock Holmes had the Sign of Four. We have the Sign of Sixteen.

Unfortunately, the early warning signs of a 16 can be misleading. What the dispatchers told me was running the sheet again made the error go away. And the message they showed me from their snitches didn’t mean anything. I should know better than to trust someone else’s interpretation of a messages. In my world, you make your own error messages by running the bad code on your own. What better way to run bad code out of town, than to know the subroutines, log locations, and seedy variants.

Job scheduling

By no small coincidence, I started seeing error messages on another case in another domain that ended up solving the perp of the reports above. First a little background on that case, and then I tie the two together. We have a scheduling tool that interfaces with SAP. Like the voice of the dispatcher that comes on the police radio band and says it’s time for your shift to be over. Anyway, it has a database tier, and an application tier, and various agents and customers. For the past couple years (much longer than I’d like to admit), we would have periodic time drifts. I don’t mean the kind where you’ve worked a double shift, and your cruising down the highway in your Crown Victoria, startling to the horns of oncoming traffic jarring you awake, I’m talking about milliseconds of time added or subtracted per day. Wait, you might be saying, there’s no quantum mechanics or relativity allowed in the ABAP world! Hang on, we’re getting there.

What I observed was that one of the two systems used by this application had the wrong time. Looking at various logs (I keep scrupulous records, as any good ABAP or code detective would), I found the times diverting by a second or so per month.  No big deal, right? And in most cases, if a system were bounced monthly (perhaps to apply bandages to bleeding code injuries), that time difference would probably vanish. But these systems stay running for months at a time, no breaks.  All along, I suspected the OS, where the network time protocol (NTP) was misbehaving. I’ll go more into that social deviance after a short digression about system time synchronization.

Ignoring daylight savings time, or other time zone changes, it’s good practice to use a reference clock to keep your many systems on the proper time. In the UNIX world (and most other operating systems can use this, or a variation of it), a demon runs on the system, checking with one or more other systems what time they have. Assuming you’ve done your homework, you’d have some internal systems that are the primary reference for the remainder, and external systems that those check.  A few reference sites:

Briefly (way too briefly), there are reference machines one can (and should) use to keep your systems in line. Failure to do so will have dire consequences, as we relate shortly. In the meantime, the main clue I had was a periodic time drift, indicating a synchronization failure. I assumed the ntp daemon (service) was corrupt. But it wasn’t.  Before asking for debugging to be turned on (which would reset the system clocks, causing the evidence to once again be hidden for weeks), I read some manual pages (gasp, I know), and ran some diagnostics.

Diagnostics

$ ntpdate -d time01.internal

12 Jun 09:23:43 ntpdate[20709430]: 3.4y

transmit(…)

delay 0.04185, dispersion 0.00038

offset 2.448537

12 Jun 09:23:43 ntpdate[20709430]: step time server 4.3.2.1 offset 2.448537

.

Log 1

The “-d” flag says to show what adjustments would be made, if permitted. This showed a 2.4 second (approximately) gap. While we’re not dealing with atomic clocks here, and the thinnest of blue lines, anything over 1 second is bad.

$ ntpdate -d time02.internal

12 Jun 09:24:55 ntpdate[31981758]: 3.4y

transmit(…)

delay 0.04184, dispersion 0.00043

offset 0.276432

12 Jun 09:24:55 ntpdate[31981758]: adjust time server 4.3.2.5 offset 0.276432

.

Log 2

This says the other time reference site, if we relied on it, is around one-quarter second difference from the system I ran the command on.  In other words, the two reference sites are out of synch with each other.  As Mark Knopfler sang, “one of them must be wrong.”

What about external reference clocks?

$ ntpdate -d 2.north-america.pool.ntp.org   

12 Jun 09:25:31 ntpdate[40042622]: 3.4y

transmit(207.32.191.59)

delay 0.04903, dispersion 0.00076

offset 2.504520

12 Jun 09:25:31 ntpdate[40042622]: step time server 207.32.191.59 offset 2.504520

.

Log 3

Also wrong by over 2 seconds.  With several other reference sites (from the NTP pools), I determined that we had a crooked timekeeper, right in our ranks.

Back to ABAP. What happens when the database and application server don’t agree on what time it is?  Well, this, for one:

Date Time Message text Message class Message no. Message type
06/12/2012 22:51:15 Job started 00 516 S
06/12/2012 22:51:16 Step 001 started (program [], variant [], user ID []) 00 550 S
06/12/2012 22:51:25 ... file is being processed
BA 238 S
06/12/2012 22:55:02 ABAP/4 processor: ZDATE_ILLEGAL_LOCTIME 00 671 A
06/12/2012 22:55:02 Job cancelled 00 518 A

Table 1

I am showing similar errors in text, and in screen shot mode, to help the search engines, and to make the point. Other indications, as hinted above, don’t show what caused the code 16. This shows “illegal local time”.  The typical root causes are incorrect daylight savings time settings (one app server has one rule or setting, the other has another set), or one or more systems just aren’t running a time service.  But we were!  Why were we out of synch?

/wp-content/uploads/2012/06/illegal_time_error_111439.png

Figure 1

What led me to look for the above was an message from years back, that is still in the ABAP systems: “big time diff”.

Dev messages

$ grep -il “big time” dev*

dev_w3.old

dev_w51

dev_w52

dev_w53

dev_w53.old

[…]

M Tue Jun 12 04:13:37 2012

M  *** WARNING => big time diff 2265953896 [1396155766 3662109662]for stamp STAT_READSEQ, correct ???

Log 4

I found these on a few application server traces, but then, I knew where to look.  Often, what is seen in the trace files has a different interpretation in the job log.

Why were we out of synch?

Unfortunately, I can’t reveal my sources, but I’ll share a few lessons learned.

  • Don’t use just 2 reference sites. Use 3.
  • Use the NTP pools as reference.
  • Make sure your reference sites exist.

For further reading, a few SCN forum posts, and a slightly interesting blog, more about daylight savings time than reference. Oh, and SAP Notes.  Better to read them before finding your watch has stopped.

SAPonLinuxNotes Linux reports “big time diff XXX for stamp YYY”
http://scn.sap.com/thread/1198057

ORA-1555 occurred…

*** WARNING => big time diff

http://scn.sap.com/thread/739725

N4S Test Drive under SuSe 10.3 …

*** WARNING => big time diff

http://scn.sap.com/thread/1632847

Workprocesses stops and lock accumulates …

*** WARNING => big time diff

http://scn.sap.com/thread/1579656

BW System crashed due to DB6_ADM_RUNSTATS…

*** WARNING => big time diff

Daylight Saving Time and Slowing Down The Time Daylight Saving Time and Slowing Down The Time

447839 ZDATE_ILLEGAL_LOCTIME
481835 Analyzing the time zone settings
606374 “WARNING: big time diff” in wp trace files on AIX
679615 “big time diff” message in developer trace

Time to punch out the time clock and call this a wrap sheet.

Assigned tags

      7 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Tom Van Doorslaer
      Tom Van Doorslaer

      I have no idea what you're talking about, but it makes a great read. 🙂

      The storytelling made me read your blog whilst imagining the voice of Bruce Willis in "Sin city" telling the story.

      Author's profile photo Dirk Wittenberg
      Dirk Wittenberg

      Hi Jim,

      great blog!

      The idea to write it as a detective story is awesome.

      And thanks for pointing at this topic - wasn't aware of it so far.

      Regards,

      Dirk

      Author's profile photo Uwe Fetzer
      Uwe Fetzer

      Hi Jim,

      Hope the next problem is coming soon 😉 (not really), I love to "listen" to your stories.

      Uwe

      Author's profile photo Former Member
      Former Member

      Hi Jim,

      Great reading.  Looking forward to the next "story".

      Jim

      Author's profile photo Jim Spath
      Jim Spath
      Blog Post Author

      The latest tale from the ABAP Detective is now online - The ABAP Detective Never Signals an Interrupt.

      Author's profile photo Vani Gaur
      Vani Gaur

      Can anyone help me on this topic.

      As I see that one of our btc job got stuck up with log as :

      *** WARNING => big time diff 2863807247 [ 1558679092 127519043 ] for stamp DB_OTHERS, correct ???

      AND there were ORACLE locks also which generated in the system for the job.

      Author's profile photo Manoj Somkuwar
      Manoj Somkuwar

      Hi Vani,

      I know it's been too long, but can you let me know how did you overcome the issue?

      Regards,

      Manoj Somkuwar