[Portions of this blog originally appeared on the ASUG.COM site]
Before going further in this chapter, I wanted to thank several people who guided me through this major enterprise project. I’m sure I’ll leave someone out, but if I do, thank you too.
Glenn Miller and Lou Lamprinakos at IBM helped me understand the new virtualization technology, and tried to teach me new phrases, such as "entitlement," that I'm not quite sure I have fully digested.
Irene Hopf and Walter Orb of the IBM SAP Competence Center in Germany not only helped interpret which new hardware features would benefit us, but also gave insight into the whys and wherefores of SAP notes, IBM whitepapers, and peer customer success stories.
Bill Adams of SAP, and his Benchmarking, Performance and Scalability team, for keeping me straight on the latest QuickSizer developments, SAPS ratings, and hardware realities.
Not to mention our Midrange, Database, Storage and Basis teams, who listened to me, asked questions, and generally took my advice. Without their trust and understanding of the vagaries of capacity planning and system sizing, we probably would have spent a lot more time (and money) to get where we are.
Any glitches in the hardware moves were extremely minor, typically related to leaving a one-off script on the old system, and easily fixed by restoring from backup. From what I saw in the first full business day after cutover, everything is working great, and we probably won’t make any capacity changes right away.
Viewing system performance trending over the longer term is my next task. I always look to make sure that the data collection schemes (like saposcol) are running soon after any new systems come online, as having no data makes incident management problematic, and renders any significant tuning decisions into mere guesswork.
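As a minimal sketch of that "make sure collection is running" check (the data shape and freshness window are my own assumptions, not a saposcol API), one way to flag hosts whose monitoring has gone silent after a cutover:

```python
from datetime import datetime, timedelta

def stale_collectors(last_samples: dict, now: datetime,
                     max_age: timedelta = timedelta(minutes=15)) -> list:
    """Return hosts whose newest monitoring sample is older than max_age.

    last_samples maps hostname -> timestamp of the most recent
    saposcol-style record seen for that host. A host with no fresh
    data is exactly the gap that makes later tuning guesswork.
    """
    return sorted(h for h, ts in last_samples.items() if now - ts > max_age)

# Hypothetical hostnames for illustration only.
now = datetime(2010, 3, 1, 9, 0)
samples = {"prd01": datetime(2010, 3, 1, 8, 55),   # fresh
           "dev02": datetime(2010, 3, 1, 7, 0)}    # silent for 2 hours
print(stale_collectors(samples, now))
```

Running this nightly against whatever repository the collector feeds gives an early warning before an incident arrives with no data behind it.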
The Plan Was
Over more than a year, we studied and calibrated our previous hardware, where we had a little virtualization, but had separated production and development. During this time, I collected CPU, memory, and network usage to do “what-if” scenarios. Here are charts on several permutations.
The first image shows several systems superimposed (or super-opposed?), in other words, combined in theory. The "5200" figure represents the combined capacity of the 52 separate CPUs across all of the systems shown (52 CPUs at 100% each). The expectation was that we could cut this in half by virtualization.
The second view is similar, though with a different time resolution. Here, we’ve zoomed in on the peak found above to make sure we understood which systems were the busiest. As you can see, a lot of activity happened just after 9:00 AM.
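The superimposition and peak-hunting above can be sketched in a few lines. This is a simplified model under my own assumptions (each host is an equal-length list of per-interval CPU% samples, where 100 = one fully busy CPU; the hostnames are made up):

```python
def combined_peak(series: dict):
    """Superimpose per-host CPU series and find the combined peak.

    series maps hostname -> list of CPU% samples on a common timeline.
    Returns (peak_index, peak_total, busiest_host_at_peak), i.e. when
    the theoretical combined box is busiest and which system drives it.
    """
    n = len(next(iter(series.values())))
    totals = [sum(s[i] for s in series.values()) for i in range(n)]
    peak_i = max(range(n), key=totals.__getitem__)
    busiest = max(series, key=lambda h: series[h][peak_i])
    return peak_i, totals[peak_i], busiest

# Three intervals, two hypothetical systems: the combined peak lands
# in the last interval, driven by "bw".
print(combined_peak({"erp": [100, 300, 200], "bw": [50, 150, 400]}))
```

Zooming the real charts to a finer time resolution is the same idea: recompute the totals over shorter intervals around the peak and see which system contributes most.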
Here’s a different time period (a different day) with a couple of servers that show a “flat-line,” meaning they had exhausted their hard CPU limits. The goal of virtualization is to design around this constraint, putting such workload where it can get more CPU as needed. The secondary goal, of course, is to not starve other critical business processes of compute power, particularly if the CPU-limited work is not as time-critical.
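Spotting those flat-lines programmatically rather than by eyeballing charts is straightforward. A minimal sketch, assuming you know each partition's hard CPU cap and have its utilization samples (the tolerance and run-length thresholds here are illustrative choices, not SAP or IBM guidance):

```python
def flatline_spans(samples, cap, tolerance=2.0, min_len=3):
    """Return (start, end) index pairs where utilization sits within
    `tolerance` of the hard CPU cap for at least `min_len` consecutive
    samples, i.e. intervals where the partition was likely starved for
    CPU rather than merely busy."""
    spans, start = [], None
    for i, v in enumerate(samples):
        pinned = v >= cap - tolerance
        if pinned and start is None:
            start = i
        elif not pinned and start is not None:
            if i - start >= min_len:
                spans.append((start, i - 1))
            start = None
    if start is not None and len(samples) - start >= min_len:
        spans.append((start, len(samples) - 1))
    return spans

# A 4-CPU (400%) partition pinned at its cap for four intervals.
print(flatline_spans([50, 399, 400, 400, 398, 120, 400], cap=400))
```

Any span this returns is a candidate for more entitlement, or for moving the workload somewhere it can borrow idle cycles.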
The last image from 2009 shows one of the scenarios I proposed, splitting systems into different hardware for load balancing and disaster tolerance.
A few things changed between the time the plan was made and executed. Fortunately, nothing big enough to cause a major headache or a rethink, but certainly a few wrinkles that have us scratching our heads.
Each of these charts is from a week or so in February, after the initial tuning (and data collection verification).
This one seems to confirm the predictions, though you might note we included more systems than the modeling shows.
This is an 8-way box, running at least 8 systems, without hitting half of the installed CPUs.
The last is another larger server, with more than a dozen partitions up and running. Based on the CPU differences, it should show as less loaded than the model predicted.
I’ve skipped over comparisons of dialog response time and batch run time. Most of these show improvement, though not all, and I’m still investigating where those predictions don’t match reality. However, we’ve moved well over a hundred running systems to the virtual world, and expect to provision a lot more capacity in this manner over the next few years.
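Those before-and-after comparisons lend themselves to a simple automated pass before digging into individual outliers. A sketch under my own assumptions (workload names and the 5% regression threshold are hypothetical; timings are averages per workload in milliseconds or seconds, as long as both sides match):

```python
def regressions(before: dict, after: dict, threshold: float = 1.05) -> list:
    """Compare average times per workload between the old and new hardware.

    before/after map workload name -> list of timing samples.
    Returns the workloads whose post-move average exceeds the pre-move
    average by more than `threshold`x -- the ones worth investigating.
    """
    def avg(xs):
        return sum(xs) / len(xs)
    return sorted(name for name in before
                  if name in after and avg(after[name]) > threshold * avg(before[name]))

# Hypothetical numbers: dialog improved, one batch job got slower.
before = {"dialog": [500, 520], "nightly_batch": [3600]}
after = {"dialog": [480, 470], "nightly_batch": [4000]}
print(regressions(before, after))
```

Running this across all moved systems shortlists the handful of jobs where the sizing model and reality disagree, instead of scanning every chart by hand.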