[Portions of this blog originally appeared on the ASUG.COM site]
Once the deals were done, the specifications nailed onto the purchase order, and the rough deployment timeline mapped out, we dove into the detailed planning of how to move systems without users noticing, other than “wow, that report came back really fast!” I’ll skip over the initial testing setup and simply mention a few critical factors in our cutover:
- We chose two basic hardware models – the big cabinets with lots of CPUs, and the smaller 8-way machines. Allocating which software applications went where, and how much capacity each received, was a several-month debate. We knew we could tune afterwards, but minimizing any user freeze-out was a basic goal.
- The systems have slightly different CPU speeds, so any given benchmark might run 10-20 percent faster or slower depending on where it lands. Even though this sounds negligible, jobs finishing 10 minutes before, or after, their expected time can be confusing to clock-watching users.
- CPU is more virtual than memory. In this generation, we found that we can dynamically allocate, and the systems can virtually consume, CPU resources much more flexibly than memory. We chose not to try out the latest iterations that share memory among applications. Perhaps next year.
- The latest operating system would be deployed, except for the applications that aren’t certified. This turned out to be more of a headache than expected, as the OS differences are subtle, and not always predictable.
- Downtime on production needed to be minimal, but not to the point where we’d set up multiple-node clusters and move systems as failover operations. We chose to install new operating system disks on the new hardware, unmount the storage from the old systems, and remount it on the new machines. It’s actually quite easy. We did add a few wrinkles, as again, this is a rare opportunity for performance, administration, and disk layout improvements.
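The clock-watching effect from the CPU-speed point above is easy to quantify. Here is a minimal sketch; the job length and speed ratios are illustrative numbers, not measurements from our landscape:

```python
# Illustrative only: how a 10-20% CPU speed difference shifts a batch job's
# finish time. The one-hour baseline is a hypothetical example.

def runtime_minutes(baseline_minutes, speed_ratio):
    """Runtime on a frame whose CPU is speed_ratio times the baseline speed."""
    return baseline_minutes / speed_ratio

baseline = 60.0  # a job that normally takes an hour

# On a frame 20% faster, the job finishes 10 minutes early...
fast = runtime_minutes(baseline, 1.20)  # 50.0 minutes
# ...and on a frame 20% slower, 15 minutes late.
slow = runtime_minutes(baseline, 0.80)  # 75.0 minutes

print(f"fast frame: {fast:.0f} min, slow frame: {slow:.0f} min")
```

For a user expecting output at a fixed time every morning, even that small a shift looks like something went wrong.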
During the cutover weekend, planned nearly a year in advance, Baltimore suffered through a historic snowstorm that dumped 2 feet or more in under 24 hours. Despite the storm warnings, and facing the alternative of keeping two sets of hardware on the books for an indeterminate time, negotiating another set of outages, and losing momentum, the teams moved the largest of our SAP systems. And when I say “the teams,” I’m not really included. Once the planning and configuring is complete, my role is more of an observer, sitting on the bench as the A team executes the plan. And truly, it went as flawlessly as it possibly could. Within 8 hours or so for each system, users were back online working. Previous hardware changes of this magnitude would take 24 to 48, or even 72, hours, leaving the teams exhausted and sloppy by the end.
During the operating system version upgrade that occurred along with the hardware upgrade, we had the opportunity, and the requirement, to evaluate all third-party bolt-ons, or tool sets, to verify compatibility. One example is backup software, which we had been transitioning for some time, and completed on the new hardware. It gives a clean break as well as an incentive point when you can put this kind of stake in the ground (“no old software on the new machine”). It doesn’t always work out that way, for various reasons, but it is a good approach.
Generally, we did not find anything that did not work correctly on the new hardware or OS. We upgraded one freeware tool that provides monitoring system views, as the newer version works better with the IBM virtualization layer (PowerVM).
In the old world, when I sized an SAP landscape, I and the hardware vendors (as well as the SAP QuickSizer tool) used a common unit, the “SAPS,” to determine how much hardware was required. If production needed 5,000 SAPS, Quality 3,000, and Development 2,000, we needed 10,000 SAPS in total. In the new world, we probably still need the 5,000 for production (neglecting other systems for simplicity), but if the other two systems share the production frame, is the total requirement now 8,000, or maybe 7,000, or perhaps even 6,000? That’s a topic for another day.
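The arithmetic behind that question can be sketched as follows. The overlap factor here is a hypothetical parameter for illustration, not a figure we measured:

```python
# Hypothetical sketch of landscape sizing. In the old world each system got
# dedicated hardware, so requirements simply add up. When Quality and
# Development share the production frame, their peaks rarely coincide with
# production's, so some fraction of their demand can be overlapped.
# The overlap factor is an assumption, not a measurement.

def dedicated_total(requirements):
    """Old world: separate boxes, total is a straight sum."""
    return sum(requirements.values())

def shared_total(prod, others, overlap=0.5):
    """New world: production stays fully provisioned; the sharing systems
    are discounted by an assumed overlap factor (0.0 = full sharing credit,
    1.0 = no sharing benefit)."""
    return prod + sum(others.values()) * overlap

landscape = {"production": 5000, "quality": 3000, "development": 2000}

print(dedicated_total(landscape))  # 10000

non_prod = {k: v for k, v in landscape.items() if k != "production"}
print(shared_total(landscape["production"], non_prod, overlap=0.5))  # 7500.0
```

Varying that single assumed factor is what swings the answer between 6,000 and 10,000 – which is exactly why it deserves its own post.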
Putting aside the questions of how much hardware, what kind, and where applications run, deciding how many resources each application gets is yet another question we never faced when systems were physically separate. Our project was made more difficult because we switched CPU generations as well, so predictions of combined loads from the prior generation were rather speculative. We chose to start with a level playing field, limiting increases or decreases in power relative to the entire landscape as much as possible. This wasn’t easy, nor was it completely successful. I’ll talk more about that in the next post.
A comment on the prior post asked about provisioning for quality and development, where teams might be given full-scale copies more easily than ever for testing purposes, where demands increase because of the ease of deployment, and where the shifting sands of underlying silicon may give false impressions about the stability and predictability of the new virtual enterprise. We are still in the early stages, so I’ll have more to say once we’ve moved a few systems around for load balancing, and for other purposes. I’m happy to report that we didn’t massively undersize or oversize our new home. Systems are never big enough, though, and they are never cheap enough.
Not coincidentally, the picture below was taken a few days before our hardware move, as SAP recognized our data center excellence with our second (or is it our third?) “Customer Center of Expertise” certificate. Good work, teams!
p.s. note the change in my company name in my bio.