Hey, ABAP Detective, you got 10 seconds?
I thought my watch had stopped. Or was running wild. It was happening again. Just a few months after I cracked the case of the slipped time reference site, the one I called “Pssst, ABAP Detective, you got a second?”, I found systems with differing times. My first clue was a message saying there were 7 seconds missing from a job. This time (no pun intended), the fault wasn’t directly seen in an SAP system. I started digging further.
In the last time drift episode, I found clues inside SAP job logs that led me to review what sources of time synchronization were defined. I couldn’t just ask the railroad conductor where he got his timetable from before he blew the train whistle; I needed to discover it for myself. The first pattern was Windows systems having one time, yet UNIX systems having another. When I first checked, the gap was seven seconds; when I looked later, it was up to 10 seconds and climbing. Not a lot in the grand scheme of things, but if some timestamp got wacky the shippers could lose packages, or something.
Here were the lessons from the last drift:
- Don’t use just 2 reference sites. Use 3.
- Use the NTP pools as reference.
- Make sure your reference sites exist.
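The three lessons above can be turned into a quick line-up check. What follows is only a minimal sketch in Python, not anything from the original case file: the function names `resolves` and `audit_references` are made up for illustration, and a name that resolves in DNS only proves the reference site exists on paper, not that it answers NTP requests.

```python
import socket
from typing import Callable, Iterable, List


def resolves(host: str) -> bool:
    """Return True if the host name currently resolves to at least one address."""
    try:
        socket.getaddrinfo(host, 123)  # 123 is the NTP port
        return True
    except socket.gaierror:
        return False


def audit_references(
    hosts: Iterable[str],
    exists: Callable[[str], bool] = resolves,
) -> List[str]:
    """List the problems found in a set of configured time reference sites.

    `exists` is pluggable (an assumption of this sketch) so the check can be
    exercised without touching the network.
    """
    configured = list(hosts)
    problems = []
    if len(configured) < 3:
        problems.append(
            f"only {len(configured)} reference site(s) configured; use at least 3"
        )
    for host in configured:
        if not exists(host):
            problems.append(f"{host} does not resolve")
    return problems
```

Calling `audit_references(["0.us.pool.ntp.org", "1.us.pool.ntp.org", "2.us.pool.ntp.org"])` does a live DNS lookup for each pool name; an empty list means nothing suspicious turned up.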
I could sense these didn’t solve everything. To keep this tale short (and maybe sweet), I found out the Windows systems get their time from Active Directory domain controllers. That’s fine, if those controllers are straight. If they’re crooked, well, the whole continent is drifting out to sea with them, climate change or not. My first new clue came from a little-known Windows utility, since “ntpdate” isn’t part of the standard OS.
Rap sheet on remote reference system
$ w32tm /stripchart /computer:0.us.pool.ntp.org /samples:5 /dataonly
Tracking 0.us.pool.ntp.org [64.95.243.61].
Collecting 5 samples.
The current time is 11/13/2012 4:50:01 PM (local time).
16:50:01, -10.5314154s
16:50:03, -10.5518846s
16:50:05, -10.5496261s
16:50:07, -10.5502981s
16:50:09, -10.5501962s
Rap sheet on domain controller
$ w32tm /stripchart /computer:domcol /samples:5 /dataonly
Tracking domcol [1.2.3.4].
Collecting 5 samples.
The current time is 11/13/2012 4:47:48 PM (local time).
16:47:48, -00.7965628s
16:47:50, -00.7883558s
16:47:52, -00.7879628s
16:47:54, -00.7953808s
16:47:56, -00.7793569s
- MS information on w32tm. “A tool used to diagnose problems occurring with Windows Time”
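Reading five offsets by eye is fine; reading them from fifty servers is not. Here is a small Python sketch that pulls the offsets out of `/dataonly` stripchart text like the rap sheets above and averages them. The function name `stripchart_offsets` is my own invention, and the pattern assumes each sample looks like `HH:MM:SS, <offset>s`, as it does in the output shown here.

```python
import re
from statistics import mean
from typing import List

# Sample text copied from the stripchart run against the pool server.
SAMPLE = """\
Tracking 0.us.pool.ntp.org [64.95.243.61].
Collecting 5 samples.
The current time is 11/13/2012 4:50:01 PM (local time).
16:50:01, -10.5314154s
16:50:03, -10.5518846s
16:50:05, -10.5496261s
16:50:07, -10.5502981s
16:50:09, -10.5501962s
"""

# One sample: a wall-clock time, a comma, then a signed offset ending in "s".
_SAMPLE_PATTERN = re.compile(r"(\d{2}:\d{2}:\d{2}),\s*([+-]?\d+(?:\.\d+)?)s")


def stripchart_offsets(text: str) -> List[float]:
    """Return the offsets (in seconds) found in w32tm /dataonly output."""
    return [float(offset) for _clock, offset in _SAMPLE_PATTERN.findall(text)]


offsets = stripchart_offsets(SAMPLE)
print(f"{len(offsets)} samples, mean offset {mean(offsets):.3f}s")
```

Because the pattern is applied with `findall`, it works whether the samples sit on separate lines or have been pasted together on one.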
To make sure my head was on straight, my pocket watch was intact, and I wasn’t looking at things in a mirror, I also found a distribution of the NTP client for Windows, which included the same “ntpdate” command I had used to run down the culprits in the last chase.
C:\Program Files\NTP\bin>.\ntpdate.exe -d 2.north-america.pool.ntp.org
15 Nov 02:40:24 ntpdate.exe[8040]: ntpdate 4.2.6p5@1.2349-o Jul 30 11:53:32 (UTC +02:00) 2012 (1)
15 Nov 02:40:24 ntpdate.exe[8040]: Raised to realtime priority class
transmit(50.115.174.206)
receive(50.115.174.206)
transmit(64.73.32.135)
transmit(69.50.219.51)
receive(64.73.32.135)
[…]
receive(207.5.137.133)
server 50.115.174.206, port 123
stratum 2, precision -21, leap 00, trust 000
refid [50.115.174.206], delay 0.10367, dispersion 0.00319
transmitted 4, in filter 4
reference time: d44ed3b3.36942e53 Thu, Nov 15 2012 2:36:35.213
originate timestamp: d44ed493.c8d76175 Thu, Nov 15 2012 2:40:19.784
transmit timestamp: d44ed49f.329708c5 Thu, Nov 15 2012 2:40:31.197
filter delay: 0.15057 0.11934 0.10367 0.11923 0.00000 0.00000 0.00000 0.00000
filter offset: -11.4821 -11.4584 -11.4598 -11.4600 0.000000 0.000000 0.000000 0.000000
delay 0.10367, dispersion 0.00319
offset -11.459886
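Those hex strings in the ntpdate output are not gibberish. An NTP timestamp is a 32-bit count of seconds since 1 January 1900 UTC, a dot, then a 32-bit binary fraction of a second. The Python sketch below decodes them; the function name `decode_ntp_timestamp` is my own, and the hex values are copied from the output above.

```python
from datetime import datetime, timedelta, timezone

# NTP counts seconds from the start of 1900 (UTC), not from the Unix epoch.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_ntp_timestamp(hex_stamp: str) -> datetime:
    """Decode 'ssssssss.ffffffff' (hex seconds and hex fraction) to a UTC datetime."""
    seconds_hex, fraction_hex = hex_stamp.split(".")
    seconds = int(seconds_hex, 16)
    # The fraction is in units of 1/2**32 second; round it to microseconds.
    microseconds = round(int(fraction_hex, 16) / 2**32 * 1_000_000)
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


originate = decode_ntp_timestamp("d44ed493.c8d76175")
transmit = decode_ntp_timestamp("d44ed49f.329708c5")

print("originate:", originate.isoformat())
print("transmit: ", transmit.isoformat())
print("gap (s):  ", (transmit - originate).total_seconds())
```

The decoded values line up with the human-readable times ntpdate printed next to them, and the gap between the two stamps comes out at roughly 11.4 seconds, the same size as the reported offset.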
Pretty much the same result, though now, a day or so later, the delta is over 11 seconds. I’m sure some of you are saying, what difference do a few seconds make? In my world, a lot. It’s not keeping me awake at night, but it’s my mission to set the servers straight. And if their controllers are crooked, well, we can vote them out of office, or we can take them downtown.
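How fast is the suspect getting away? A rough estimate falls out of two of the measurements above: about -10.55 seconds on 13 November at 16:50:09 and -11.459886 seconds on 15 November at 02:40:24. The Python sketch below does the arithmetic. It assumes both wall-clock times are in the same time zone and that the two pool servers agree with each other, neither of which the case file proves, and the name `drift_rate` is made up for illustration.

```python
from datetime import datetime
from typing import Tuple


def drift_rate(
    first_time: datetime,
    first_offset: float,
    second_time: datetime,
    second_offset: float,
) -> Tuple[float, float]:
    """Return the drift as (seconds per day, parts per million)."""
    elapsed = (second_time - first_time).total_seconds()
    if elapsed <= 0:
        raise ValueError("second measurement must be later than the first")
    per_second = (second_offset - first_offset) / elapsed
    return per_second * 86_400, per_second * 1_000_000


# Offsets in seconds, taken from the w32tm and ntpdate output above.
per_day, ppm = drift_rate(
    datetime(2012, 11, 13, 16, 50, 9), -10.5501962,
    datetime(2012, 11, 15, 2, 40, 24), -11.459886,
)
print(f"drift: {per_day:.2f} s/day ({ppm:.1f} ppm)")
```

Under those assumptions the clock is losing somewhere around two thirds of a second per day, which is the behavior of a clock that has been cut loose rather than one being disciplined by a reference.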
References
- NTP Client For Windows (“NT”): http://www.meinberg.de/english/sw/ntp.htm
- General NTP (Network Time Protocol) Server and Client information: http://www.eecis.udel.edu/~mills/ntp
- NTP and Windows details: http://www.eecis.udel.edu/~mills/ntp/html/hints/winnt.html
- “Time synchronization may not succeed when you try to synchronize with a non-Windows NTP server in Windows Server 2003” Microsoft Knowledge base: 875424
This is now on the screen of the community pod at sapteched2012 Madrid. We are using the content to show the humor and style that can be creatively employed to illustrate technical content. Love your film noir writing style. Talk about taking a dry topic and making it come to life with humor. cc Thorsten Franz
No - SAP doesn´t have this kind of mechanism (at least I´m not aware of any).
But they do have SAP Note 7417, for conversion between winter time and daylight saving time.
SAP uses the time of the underlying operating system. If you keep that in sync with NTP Server it´s pretty fine, nothing needs to be done.
Or have I got your question wrong?
Nishan Dev S -
"If you keep that in sync with NTP Server it´s pretty fine ... Or [have] I got your question wrong"
Well, this blog wasn't a question to my readers, since I'm aware of the root cause of the symptoms I have observed.
Ponder this: you have two (or more) SAP enterprise systems in different business units, or in different data centers. Each uses a different NTP server (or peer), as the rules generally say to use a time reference that is geographically close. When you compare the times on those two systems, you find they don't agree. What would you do next?
Jim,
My next step would be to synchronize both units with an NTP server and find out which unit is out of sync. I would also check the security log of the unit or server that is out of sync with the NTP server, and verify the server patches or WSUS updates applied to it. All of this can be collected using a utility program provided by Microsoft.
Thanks,
Nishan Dev
Nishan:
Not quite; "verify patches or wsus updates" sounds like business as usual. This is different.
I updated the blog with a few quick drawings (using MS Paint, no less). Figure one shows a simple (and simplified) ERP system, with a central instance and a couple of application servers. The app servers' NTP configuration points at the central instance, which in turn points at an external time reference source. Works as expected.
Figure two shows an expanded ERP system, with a Windows application server, which is configured to get its time from a Windows AD domain controller (DC1). As long as the Domain Controller points at an authoritative time source, all is still good.
Figure three shows what could happen when a new Windows Domain Controller is added to the landscape (let's say the old one went off lease). Because we can't just yank the old one, the new one gets a new name (DC2). If we forget that AS3 was using DC1, it continues to get the correct time until we turn off DC1 and sell it at a flea market (or whatever happens to old servers). Once its chain to functional sources is cut, it's like a ship without an anchor and starts drifting.
Figure four shows two sites. Since they may be in separate countries or continents, their sources might be different, but if the sources are accurate, there is no discrepancy.
Figure five shows what happens if the central instance in site two is not configured to point at an external site. It will be set adrift starting at whatever time was set manually.
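The five figures boil down to one question: does every system's chain of time sources end at a working external reference? Here is a minimal Python sketch of that check. The host names AS1, AS2, AS3, CI, DC1 and DC2 follow the figures; `EXTERNAL`, `trace_chain`, `adrift` and the site-two name CI2 are made up for illustration, and a real landscape would have to be inventoried by hand or by script before any of this applies.

```python
from typing import Dict, List, Optional, Set

EXTERNAL = "external"  # stand-in for an authoritative outside time source


def trace_chain(host: str, upstream: Dict[str, Optional[str]], live: Set[str]) -> List[str]:
    """Follow a host's configured time sources until the chain ends."""
    chain = [host]
    current = host
    while True:
        source = upstream.get(current)
        if source is None:
            return chain  # nothing configured: the chain stops here
        looped = source in chain
        chain.append(source)
        if looped or source == EXTERNAL or source not in live:
            return chain  # loop, real reference, or decommissioned box
        current = source


def adrift(upstream: Dict[str, Optional[str]], live: Set[str]) -> List[str]:
    """Return the live hosts whose chain does not end at an external source."""
    return sorted(
        host for host in live
        if trace_chain(host, upstream, live)[-1] != EXTERNAL
    )


# Figure three: AS3 still points at DC1, which has been switched off.
figure_three = {
    "CI": EXTERNAL, "AS1": "CI", "AS2": "CI",
    "AS3": "DC1", "DC1": EXTERNAL, "DC2": EXTERNAL,
}
print(adrift(figure_three, live={"CI", "AS1", "AS2", "AS3", "DC2"}))
```

The same function covers figure five: a central instance with no configured source, plus everything that points at it, comes back as adrift even though those systems still agree with each other.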
Jim
Jim,
Illustration 1 ---> 3 application servers and 1 central instance; the CI is updated from NTP, so there is no issue with time sync.
Illustration 2 ---> 3 application servers: 2 are networked to the CI, the CI is networked to the domain controller, and AS3 is also connected to the CI via DC1. So the whole network is in sync, the DC syncs with the NTP pool, and there is no issue.
Illustration 3 ---> A similar scenario, but the DC is changed. Here we need to note one point: if network and security are driven by the domain controller, you cannot remove that DC from the network without a proper DC migration. If that happens, the whole network will be hampered. Application services might still run, but if the network is broken, how will other things work?
Illustration 4 ---> A similar scenario but with no DC, so the CI gets its updates from NTP and there is no issue. These are standalone scenarios.
Illustration 5 ---> Site 1 will be synced to NTP and Site 2 has nothing, so it will run at the mercy of the administrator. This structure is not well managed.
I hope we are on the same page?
Thanks,
Nishan Dev
Nishan Dev S - (I'm breaking the reply chain since each add-on gets narrower and narrower in this Jive version).
"I hope we are on the same page?"
Good question.
First, in revealing several possible breakages, I ignored the fact that the detective doesn't know what's broken, only that there are breakage symptoms. Not everyone can or should know how these systems are configured.
Second, I oversimplified the environment by showing the domain controllers connected to external sources. It's possible that the domain controllers point at the network routers or switches, which in turn point externally. The same fault occurs on hardware turn-down.
Your comment about "proper DC migration" is unclear to me. It's possible (in my experience) that NTP configurations are defined to point at systems that are later decommissioned. At one point, we tried to mitigate this risk by using DNS aliases, but if, two hardware generations later, the current inhabitants don't know this was the plan, the drift will be back.
A few more references (it was hard to find a network device link that wasn't about Cisco).
http://www.firewall.cx/cisco-technical-knowledgebase/cisco-routers/334-cisco-router-ntp.html
http://www.cisco.com/en/US/docs/ios/12_2/configfun/command/reference/frf012.html#wp1123799
www.h3c.com ...
Jim,
In the past I had the opportunity to discuss a similar kind of issue while preparing a blueprint for a bigger enterprise, working alongside some network architects and domain controller administrators. Pointing to network devices such as routers and switches as the cause of an NTP issue is possible, but then the whole network would have wrong timestamps, and that is not the case we are working on. We are trying to narrow down an issue in which SAP instances are installed in 2 separate data centers and the timestamps of the two data centers are not in sync. Possibly both of us already know the reason.
Proper DC migration - When I said proper DC migration, I meant domain controller migration. If SAP instances are installed on the domain controller and the whole network runs on the domain, then logically all 500 or more systems connected to the network exchange information over the secure domain network. That means all of them have a trust relationship, which cannot be broken by simply unplugging the domain controller. It needs to be properly channeled through a migration activity: the second domain controller needs to be built first and introduced to the primary domain, then ownership of the tree and its nodes is given to the new one, and after that you can disconnect the old domain controller.
Thanks,
Nishan Dev