
I thought my watch had stopped. Or was running wild. It was happening again. Just a few months after I cracked the case of the slipped time reference site in the episode I called “Pssst, ABAP Detective, you got a second?”, I found systems with differing times. My first clue was a message saying there were 7 seconds missing from a job. This time (no pun intended), the fault wasn’t directly seen in an SAP system. I started digging further.

In the last time drift episode, I found clues inside SAP job logs that led me to review which sources of time synchronization were defined. I couldn’t just ask the railroad conductor where he got his timetable before he blew the train whistle; I needed to discover it for myself. The first pattern was Windows systems having one time, yet UNIX systems having another. When I first checked, the gap was seven seconds; when I looked later, it was up to 10 seconds and climbing. Not a lot in the grand scheme of things, but if some timestamp got wacky, the shippers could lose packages, or something.

Here were the lessons from the last drift:

  • Don’t use just 2 reference sites. Use 3.
  • Use the NTP pools as reference.
  • Make sure your reference sites exist.
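The three lessons can be turned into a quick sanity check on any host’s list of time sources. What follows is a minimal Python sketch, not something from the original investigation: it assumes the reference list is available as plain hostnames, treats names ending in pool.ntp.org as NTP pool entries, and uses a DNS lookup as a rough stand-in for “the reference site exists” (a name can resolve and still not answer NTP queries). The helper names check_reference_sites and resolves are hypothetical.

```python
import socket
from typing import Callable, Iterable, List

MIN_REFERENCES = 3  # lesson 1: use three reference sites, not two


def resolves(host: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(host, 123, proto=socket.IPPROTO_UDP))
    except (socket.gaierror, UnicodeError):
        return False


def check_reference_sites(
    hosts: Iterable[str],
    resolve: Callable[[str], bool] = resolves,
) -> List[str]:
    """Hypothetical helper: list problems with an NTP reference configuration."""
    hosts = list(hosts)
    problems = []
    distinct = len(set(hosts))
    if distinct < MIN_REFERENCES:
        problems.append(
            f"only {distinct} distinct reference(s); use at least {MIN_REFERENCES}"
        )
    # lesson 2: assumes pool entries are recognizable by their domain name
    if not any(h.endswith("pool.ntp.org") for h in hosts):
        problems.append("no NTP pool reference configured")
    # lesson 3: a DNS lookup is only a rough proxy for "the site exists"
    for host in hosts:
        if not resolve(host):
            problems.append(f"{host} does not resolve")
    return problems
```

A fake resolver can be passed in to try it offline: check_reference_sites(["dc1.example.com", "0.us.pool.ntp.org"], resolve=lambda h: h != "dc1.example.com") flags both the short list and the name that no longer resolves.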

I could sense these didn’t solve everything. To keep this tale short (and maybe sweet): I found out the Windows systems get their time from Active Directory domain controllers. That’s fine, if those controllers are straight. If they’re crooked, well, the whole continent was drifting out to sea with them, climate change or not. My first new clue came from a little-known Windows utility, since “ntpdate” isn’t part of the standard OS.
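Without ntpdate on a stock Windows box, the core of what these tools measure can still be sketched in a few lines. This is a minimal SNTP client in Python, offered as an illustration rather than a replacement for w32tm or ntpdate: it assumes outbound UDP to port 123 is allowed, sends a single request, and skips the checks a real client performs (matching the originate timestamp, honoring the leap indicator and “kiss” replies, filtering several samples). The names query_offset and parse_response are hypothetical; the offset and delay formulas are the standard NTP on-wire calculation described in RFC 5905.

```python
import socket
import struct
import time
from typing import Tuple

NTP_UNIX_DELTA = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix seconds."""
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def parse_response(packet: bytes, t1: float, t4: float) -> Tuple[float, float]:
    """Return (offset, delay) in seconds from a 48-byte server reply.

    t1 is the client send time and t4 the client receive time (Unix seconds).
    """
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Bytes 32-39: server receive timestamp (t2); 40-47: server transmit (t3).
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", packet[32:48])
    t2 = ntp_to_unix(rx_s, rx_f)
    t3 = ntp_to_unix(tx_s, tx_f)
    offset = ((t2 - t1) + (t3 - t4)) / 2  # positive: server clock is ahead
    delay = (t4 - t1) - (t3 - t2)  # round trip minus server hold time
    return offset, delay


def query_offset(server: str, timeout: float = 2.0) -> Tuple[float, float]:
    """Send one SNTP client request (version 4, mode 3) and time the reply."""
    request = b"\x23" + bytes(47)  # LI=0, VN=4, Mode=3; remaining fields zero
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        t4 = time.time()
    return parse_response(packet, t1, t4)
```

Calling query_offset("0.us.pool.ntp.org") returns an (offset, delay) pair in seconds; a positive offset means the server’s clock is ahead of the local one.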

Rap sheet on remote reference system

$ w32tm /stripchart /computer:0.us.pool.ntp.org  /samples:5 /dataonly

Tracking 0.us.pool.ntp.org [64.95.243.61].
Collecting 5 samples.
The current time is 11/13/2012 4:50:01 PM (local time).
16:50:01, -10.5314154s
16:50:03, -10.5518846s
16:50:05, -10.5496261s
16:50:07, -10.5502981s
16:50:09, -10.5501962s

Rap sheet on domain controller

$ w32tm /stripchart /computer:domcol /samples:5 /dataonly

Tracking domcol [1.2.3.4].
Collecting 5 samples.
The current time is 11/13/2012 4:47:48 PM (local time).
16:47:48, -00.7965628s
16:47:50, -00.7883558s
16:47:52, -00.7879628s
16:47:54, -00.7953808s
16:47:56, -00.7793569s

  • MS information on w32tm.  “A tool used to diagnose problems occurring with Windows Time”
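The two stripcharts can be read together. Assuming both were run from the same Windows machine a few minutes apart, each offset includes that machine’s own clock error; averaging each run and subtracting one mean from the other cancels the local error and leaves an estimate of how far the domain controller sits from the pool. This is a minimal Python sketch under those assumptions (same host, negligible drift between the two runs, one sign convention for both runs); parse_stripchart is a hypothetical helper that reads the /dataonly sample lines shown above.

```python
import re
from statistics import mean
from typing import List

# Matches /dataonly sample lines such as "16:50:01, -10.5314154s".
SAMPLE = re.compile(r"^\d{1,2}:\d{2}:\d{2},\s*([+-]?\d+(?:\.\d+)?)s\s*$")


def parse_stripchart(text: str) -> List[float]:
    """Return the offsets (seconds) found in w32tm /stripchart /dataonly output."""
    offsets = []
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if match:
            offsets.append(float(match.group(1)))
    return offsets


pool_run = """
16:50:01, -10.5314154s
16:50:03, -10.5518846s
16:50:05, -10.5496261s
16:50:07, -10.5502981s
16:50:09, -10.5501962s
"""

dc_run = """
16:47:48, -00.7965628s
16:47:50, -00.7883558s
16:47:52, -00.7879628s
16:47:54, -00.7953808s
16:47:56, -00.7793569s
"""

pool_mean = mean(parse_stripchart(pool_run))
dc_mean = mean(parse_stripchart(dc_run))
# The local clock's error appears in both means, so their difference
# approximates the gap between the domain controller and the pool.
print(f"pool {pool_mean:+.3f}s, DC {dc_mean:+.3f}s, gap {dc_mean - pool_mean:+.3f}s")
```

With the samples above, the pool averages about -10.547 s and the domain controller about -0.790 s, so under these assumptions the domain controller is roughly 9.76 s away from the pool.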

To make sure my head was on straight, my pocket watch was intact, and I wasn’t looking at things in a mirror, I also found a distribution of the NTP client for Windows, which included the same “ntpdate” command I had used to run down the culprits in the last chase.

C:\Program Files\NTP\bin>.\ntpdate.exe  -d 2.north-america.pool.ntp.org

15 Nov 02:40:24 ntpdate.exe[8040]: ntpdate 4.2.6p5@1.2349-o Jul 30 11:53:32 (UTC+02:00) 2012  (1)
15 Nov 02:40:24 ntpdate.exe[8040]: Raised to realtime priority class
transmit(50.115.174.206)
receive(50.115.174.206)
transmit(64.73.32.135)
transmit(69.50.219.51)
receive(64.73.32.135)
[…]
receive(207.5.137.133)

server 50.115.174.206, port 123
stratum 2, precision -21, leap 00, trust 000
refid [50.115.174.206], delay 0.10367, dispersion 0.00319
transmitted 4, in filter 4
reference time:    d44ed3b3.36942e53  Thu, Nov 15 2012  2:36:35.213
originate timestamp: d44ed493.c8d76175  Thu, Nov 15 2012  2:40:19.784
transmit timestamp:  d44ed49f.329708c5  Thu, Nov 15 2012  2:40:31.197
filter delay:  0.15057  0.11934  0.10367  0.11923
         0.00000  0.00000  0.00000  0.00000
filter offset: -11.4821 -11.4584 -11.4598 -11.4600
         0.000000 0.000000 0.000000 0.000000

delay 0.10367, dispersion 0.00319
offset -11.459886
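Two details in that ntpdate debug output are easy to check by hand. The hexadecimal timestamps are 64-bit NTP values: 32 bits of seconds since 1 January 1900 and 32 bits of binary fraction. And the final offset line matches the filter sample with the smallest round-trip delay (0.10367 s pairs with -11.4598 s), which is consistent with NTP trusting the least-delayed sample most. The Python sketch below assumes the timestamps are in UTC and that the filter delay and filter offset columns pair up position by position; decode_ntp_hex and best_sample are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_ntp_hex(stamp: str) -> datetime:
    """Decode an ntpdate-style 'seconds.fraction' hex timestamp to UTC."""
    sec_hex, frac_hex = stamp.split(".")
    seconds = int(sec_hex, 16)
    fraction = int(frac_hex, 16) / 2**32  # 32-bit binary fraction of a second
    return NTP_EPOCH + timedelta(seconds=seconds,
                                 microseconds=round(fraction * 1_000_000))


def best_sample(delays: List[float], offsets: List[float]) -> Tuple[float, float]:
    """Return the (delay, offset) pair with the smallest round-trip delay."""
    return min(zip(delays, offsets))


for label, stamp in [("reference", "d44ed3b3.36942e53"),
                     ("originate", "d44ed493.c8d76175"),
                     ("transmit", "d44ed49f.329708c5")]:
    print(label, decode_ntp_hex(stamp).isoformat(timespec="milliseconds"))

delays = [0.15057, 0.11934, 0.10367, 0.11923]
offsets = [-11.4821, -11.4584, -11.4598, -11.4600]
print(best_sample(delays, offsets))
```

The reference timestamp decodes to 2012-11-15 02:36:35.213 UTC, the same wall-clock time ntpdate printed beside it, and best_sample returns (0.10367, -11.4598), the pair that lines up with the reported delay and offset.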


Pretty much the same result, though now, a day or so later, the delta is over 11 seconds. I’m sure some of you are asking: what difference do a few seconds make? In my world, a lot. It’s not keeping me awake at night, but it’s my mission to set the servers straight. And if their controllers are crooked, well, we can vote them out of office, or we can take them downtown.
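The climb from about 10.5 seconds to over 11 seconds can be turned into a rough drift rate with simple arithmetic. This is an estimate only, sketched in Python: it pairs the mean of the w32tm pool samples with the final ntpdate offset and assumes both were measured on the same host, that both timestamps are in the same time zone, that the two pool servers agree with each other, and that the two tools share a sign convention, none of which the measurements themselves establish.

```python
from datetime import datetime

# Assumed pairing of the two measurements above (same host, same time zone).
first_seen = datetime(2012, 11, 13, 16, 50, 1)
first_offset = -10.54668408  # mean of the five w32tm pool samples (seconds)
second_seen = datetime(2012, 11, 15, 2, 40, 24)
second_offset = -11.459886  # ntpdate "offset" line (seconds)

elapsed = (second_seen - first_seen).total_seconds()
drift = abs(second_offset - first_offset)

rate_ppm = drift / elapsed * 1e6  # parts per million
per_day = drift / elapsed * 86400  # seconds gained or lost per day

print(f"{drift:.3f} s over {elapsed / 3600:.1f} h -> "
      f"{rate_ppm:.1f} ppm, {per_day:.2f} s/day")
```

Under those assumptions the clock moved about 0.91 s in roughly 34 hours, around 7.5 ppm or about 0.65 s per day.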

References

Crude purple crayon drawings

Figure 1: NTP-ERP-1.png
Figure 2: NTP-ERP-2.png
Figure 3: NTP-ERP-3.png
Figure 4: NTP-ERP-4.png
Figure 5: NTP-ERP-5.png


8 Comments


  1. Marilyn Pratt

    This is now on the screen of the community pod at SAP TechEd 2012 Madrid, using the content to show the humor and style that can be creatively employed to illustrate technical content. Love your film noir writing style. Talk about taking a dry topic and making it come to life with humor! cc Thorsten Franz

  2. Nishan D Singh

    No – SAP doesn’t have this kind of mechanism (at least I’m not aware of any).

    But they do have SAP Note 7417, for conversion between winter time and daylight saving time.

    SAP uses the time of the underlying operating system. If you keep that in sync with NTP Server, it’s pretty fine; nothing needs to be done.

    Or if I got your question wrong?

    1. Jim Spath Post author

      Nishan Dev S

      “If you keep that in sync with NTP Server it’s pretty fine … Or [have] I got your question wrong”

      Well, this blog wasn’t a question to my readers, since I’m aware of the root cause of the symptoms I have observed.

      Ponder this:  you have two (or more) SAP enterprise systems in different business units, or in different data centers. Each uses a different NTP server (or peer), as the rules generally say to use a time reference that is geographically close.  When you compare the times on those two systems, you find they don’t agree.  What would you do next?

      1. Nishan D Singh

        Jim,

        My next step would be to synchronize both units with the NTP server, find out which unit is not in sync, and verify the security log of the unit/server that is not in sync with the NTP server. I would also verify the server patches or WSUS updates applied to them; all of this can be collected using a utility program provided by Microsoft.

        Thanks,

        Nishan Dev

        1. Jim Spath Post author

          Nishan:

            Not quite; “verify patches or wsus updates” sounds like business as usual.  This is different.

            I updated the blog with a few quick drawings (using MS Paint, no less). Figure one shows a simple (and simplified) ERP system, with a central instance and a couple of application servers. The app servers’ NTP configuration points at the central instance, which in turn points at an external time reference source. Works as expected.

            Figure two shows an expanded ERP system, with a Windows application server, which is configured to get its time from a Windows AD domain controller (DC1).  As long as the Domain Controller points at an authoritative time source, all is still good.

            Figure three shows what could happen when a new Windows Domain Controller is added to the landscape (let’s say the old one went off lease). Because we can’t just yank the old one, the new one gets a new name (DC2). If we forget that AS3 was using DC1, it continues to get the correct time until we turn off DC1 and sell it at a flea market (or whatever happens to old servers). Once its chain to functional sources is cut, it’s like a ship without an anchor and starts drifting.

            Figure four shows two sites.  Since they may be in separate countries or continents, their sources might be different, but if the sources are accurate, there is no discrepancy.

            Figure five shows what happens if the central instance in site two is not configured to point at an external site.  It will be set adrift starting at whatever time was set manually.
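The five figures boil down to one question per host: does its chain of time sources still end at an authoritative reference? Below is a minimal Python sketch of that walk, assuming each host’s upstream source is already known (gathered from each host’s NTP or Windows Time configuration). The host names follow the figures, except CI2, a hypothetical name for the site-two central instance in Figure 5; trace_source and the EXTERNAL marker are also hypothetical.

```python
from typing import Dict, Optional, Set

EXTERNAL = "external"  # hypothetical marker for an authoritative outside source


def trace_source(host: str, upstream: Dict[str, Optional[str]],
                 retired: Set[str]) -> str:
    """Follow host -> upstream links and report where the chain ends."""
    seen = set()
    node = host
    while True:
        if node in seen:
            return "loop"  # hosts syncing to each other, anchored to nothing
        seen.add(node)
        source = upstream.get(node)
        if source == EXTERNAL:
            return "anchored"
        if source is None:
            return f"adrift: {node} has no source"  # Figure 5 case
        if source in retired:
            return f"adrift: {node} points at retired {source}"  # Figure 3 case
        node = source


# Figure 3: AS3 still points at DC1 after DC1 has been turned off.
upstream = {
    "AS1": "CI", "AS2": "CI", "CI": EXTERNAL,
    "AS3": "DC1", "DC1": EXTERNAL, "DC2": EXTERNAL,
    "CI2": None,  # Figure 5: site-two central instance with no external source
}
retired = {"DC1"}
for host in sorted(set(upstream) - retired):
    print(host, trace_source(host, upstream, retired))
```

Because the walk follows any number of hops, the same check covers the case raised later in the thread, where domain controllers point at routers or switches that in turn point outside.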

          Jim

          1. Nishan D Singh

            Jim,

            Illustration 1 -> 3 application servers, 1 central instance, and the CI is updated from NTP; no issue with time sync.

            Illustration 2 -> 3 application servers, 2 networked to the CI; the CI is networked to the domain controller, and AS3 is also connected via DC1 to the CI, so the whole network is in sync and the DC syncs with the NTP pool. No issue.

            Illustration 3 -> Similar scenario, but the DC is changed. Here we need to note a point: a) if network and security are driven by the domain controller, you cannot remove that DC from the network without a proper DC migration. If that happens, the whole network will be hampered; application services might still run, but if the network is broken, how will other things work?

            Illustration 4 -> Similar scenario but no DC, so the CI is getting its updates from NTP; no issue. These are standalone scenarios.

            Illustration 5 -> Site 1 will be synced to NTP and Site 2 has nothing; it will run at the mercy of the administrator, so this structure is not well managed.

            I hope we are on the same page?

            Thanks,

            Nishan Dev

    2. Jim Spath Post author

      Nishan Dev S – (I’m breaking the reply chain since each add-on gets narrower and narrower in this Jive version).

      “I hope we are on the same page?”

      Good question.

      First, in revealing several possible breakages, I ignored the fact that the detective doesn’t know what’s broken, only that there are breakage symptoms. Not everyone can or should know how these systems are configured.

      Second, I oversimplified the environment by showing the domain controllers connected to external sources.  It’s possible that the domain controllers point at the network routers or switches, which in turn point externally. The same fault occurs on hardware turn-down.

      Your comment about “proper DC migration” is unclear to me. It’s possible (in my experience) that NTP configurations point at systems that are later decommissioned. At one point, we tried to mitigate this risk by using DNS aliases, but if, two hardware generations later, the current inhabitants don’t know this was the plan, the drift will be back.

      A few more references (it was hard to find a network device link that wasn’t about Cisco).

      http://www.firewall.cx/cisco-technical-knowledgebase/cisco-routers/334-cisco-router-ntp.html

      http://www.cisco.com/en/US/docs/ios/12_2/configfun/command/reference/frf012.html#wp1123799

      http://www.h3c.com

      1. Nishan D Singh

        Jim,

        In my past experience, I did get the opportunity to discuss a similar issue while preparing a blueprint for a larger enterprise, working alongside some network architecture and domain controller specialists. As for pointing to network devices such as routers and switches as the source of the NTP issue: it could be possible, but then the whole network would have wrong timestamps, and that is not the case we are working on. We are trying to narrow down the issue in which there are two separate data centers where SAP instances are installed, and the timestamps in the two data centers are not in sync. Possibly both of us know the reason, too.

        Proper DC migration: when I said proper DC migration, I meant domain controller migration. If SAP instances are installed on the domain controller and the whole network runs on the domain, then logically all of the 500 or more systems connected to the network are exchanging information over the secure domain network. That means all of them have trust relationships that cannot be broken simply by unplugging the domain controller; the migration needs to be properly channeled. The second domain needs to be built up first, then introduced to the primary domain, and then ownership of the tree and its nodes is handed over to the new one; after that, you can disconnect the old domain.

        Thanks,

        Nishan Dev

