This past weekend, we upgraded our SCM system from 4.1 to 7.0. And when I say “we” I’m not really counting myself as my role is longer term, helping to provide the infrastructure to get it done, and taking a look post-upgrade to make sure the system stays stable and productive. I was able to enjoy nature while the application teams and operations teams executed the upgrade. I’ll show some before and after pictures and discuss the mechanics a bit.
As the title implies, we skipped SCM 5.x. At one point in the planning, we were told that we needed to do two upgrades, first from 4.1 to 5, then from 5 to 7. This would have implied longer downtime, more application team testing, and possibly more hardware. After more research (push back) we found it was feasible to go straight to SCM 7. And that’s about the extent of my knowledge of the versions. You can learn the whole story from the people on our sales and operations team that managed the project, at the ASUG Annual Conference (co-located with SAP Sapphire in Orlando), Session 1412: “How Black & Decker Streamlined Its SAP Supply Chain Management upgrade”. Eliseo Rosiles and Patricia Pichardo will present.
I’m skipping over the long months of logistics planning, the landscape preparation and upgrades, and zooming right into the production changes. As you may know, there are preparation phases that can be done while the system is up. Based on results of testing, we started this in production 2 weeks prior to the upgrade weekend. The first chart below starts on Saturday April 10th, going through and past the upgrade weekend of April 25th and 26th.
In our virtualized system world, a number of applications run on the same hardware; you can see the SCM production DB/CI instance in the beginning of the 2 weeks as brownish spikes, using about 1 CPU fairly constantly for about a week, ending on Friday the 16th. The week after doesn’t show much CPU until the upgrade starts. As this is a 14-way server, there is a lot of headroom in this condensed view. But that can happen when you average over longer terms. Let’s zoom in closer.
The next chart shows the Friday, Saturday and Sunday of the upgrade weekend. There is a flurry of activity just before and after midnight on Friday, then a bit of a lull until around noon Saturday, then several distinct spikes. This is where the majority of upgrade actions occur.
Finally, there are 2 more spikes late Sunday into early Monday. My comment would be that in general, SCM doesn’t use a lot of CPU horsepower ongoing, and it’s nice to have the capacity to provision more when needed for an upgrade or other special event. We’ll look at other parameters, such as memory and network, shortly.
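The point above about averaging over longer terms can be shown with a small sketch. The hourly readings here are made up for illustration (they are not taken from our charts), and `downsample` is a hypothetical helper, not part of any monitoring tool we use. It just shows how short spikes shrink once they are averaged into wider buckets.

```python
from statistics import mean


def downsample(samples, bucket):
    """Average consecutive groups of `bucket` samples, the way a
    long-term chart condenses many readings into one plotted point."""
    return [mean(samples[i:i + bucket]) for i in range(0, len(samples), bucket)]


# Hypothetical data: 24 hourly CPU readings (in CPUs used),
# mostly idle with two short spikes.
hourly = [0.2] * 24
hourly[2] = 6.0
hourly[14] = 8.0

six_hour = downsample(hourly, 6)   # four points per day
daily = downsample(hourly, 24)     # one point per day

print("peak at hourly resolution:", max(hourly))
print("peak in 6-hour averages:  ", round(max(six_hour), 2))
print("peak in daily average:    ", round(max(daily), 2))
```

The wider the bucket, the lower the apparent peak, which is why the two-week view looks so quiet compared to the weekend view.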
App server CPU trend
Post upgrade, I compared several days to see whether the systems were using more or less CPU. We’re still shaking things out, and the first day after such an event is not always business as usual, but I think you can glimpse the trend below.
These are 24-hour CPU charts from one application server, before and after the upgrade. The red line simply says we give half a CPU to start, but on each you can see spikes up to slightly over 1 CPU. We have several application servers, plus Live Cache, and the database/central instance, meaning you’re only seeing part of the picture. Before, we typically had 2 spikes per day, corresponding to batch loads, and fairly low CPU use.
After, we’re seeing spikes at different times (we’ve changed how we do batch work) and the CPU pattern during typical work hours is higher. I could show other charts where the pattern is similar after the upgrade, and which show we have the capacity for larger spikes, but we’ll move on.
Another area I’d typically look at would be workload within SAP (not just CPU and memory). ST03 shows details, including the amount, timing, volume and other aspects of remote function calls. As software has progressed over the years since the monolithic SAP R/3 standalone systems, inter- and intra-system work has grown, in some cases dramatically, and in other cases less so. I wrote a blog a couple of years ago describing how I measure this impact (“Measuring SCM ATP Workload Impact on R/3”). The naive view is that new systems don’t affect the old ones, but I’ve found that new code tends to require more power, on both the new and old sides.
Since we’re still in the post go-live tuning and bug killing (let’s not say fire fighting, okay?) mode, the system workload profiles may change. Or what I’m seeing now may be the trend that will continue. I’m not certain; however, I know that this needs to be looked at, or surprises may happen during upcoming peak business periods. I’ll focus on RFC traffic, but understand that data transfers, system memory and other metrics are also being examined.
The number of RFC steps is a good base metric to start with, as are other steps, like dialog, background, update, etc. Since the averages depend on the number of steps, it’s important to know when they change. At first glance, I saw a huge spike Monday, compared to prior weeks.
Here are the prior weeks, with between 2 and 2.5 million RFC calls per week. That would be roughly 300,000 to 400,000 per day.
After the upgrade, I saw a million the first day, and over 500,000 each day. I’ll review this over the next several weeks. My preliminary take is that the code has changed quite a bit, as our BW 7 system did as well, and that more work is done by forking additional processes. The calls drive increased requirements for memory and network connectivity and, more importantly, add complexity to debugging and tracing work.
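For anyone who wants to check the per-day arithmetic above, here is a quick sketch. It assumes the weekly totals are spread evenly over 7 days (real traffic is surely lumpier), and it treats "over 500,000" and "a million" as round numbers, so the ratios are ballpark only. The function and variable names are mine, for illustration.

```python
def per_day(weekly_calls, days=7):
    """Average daily RFC calls, assuming an even spread across the week."""
    return weekly_calls / days


# Prior weeks: between 2 and 2.5 million RFC calls per week.
prior_low = per_day(2_000_000)
prior_high = per_day(2_500_000)

# After the upgrade: round figures from the text above.
post_first_day = 1_000_000
post_typical = 500_000

print("prior daily range:", round(prior_low), "to", round(prior_high))
print("typical post-upgrade day vs. busiest prior average:",
      round(post_typical / prior_high, 2))
print("first post-upgrade day vs. busiest prior average:",
      round(post_first_day / prior_high, 2))
```

Even measured against the busiest prior week, a 500,000-call day is noticeably above the old daily average, and the first day is well above it.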
One place I always look, during a routine investigation, is SM21, the system log. I like to keep a copy of how things look, so that I can compare before and after, particularly if multiple tuning or configuration changes are in flight.
When I tried to save this transaction as HTML (to get a nice repeatable view), I got an error message that is new to me:
ICON_WD_RADIO_BUTTON_EMPTY missing in BDS; will not be downloaded.
When I viewed the result, I found the red, yellow and green lights were working, but the clear light was empty (missing graphic file). That puzzled me, but not for long. I simply looked at the image property in the browser, did a Google search for “s_wdvrbe.gif”, found a site that shares all those cute little icons with the world, downloaded it to the same directory as:
And we’re done. Maybe I’ll upload the HTML page as an image (later).
See you in Florida?
[Slight edits in 2014 for site refresh]