Former Member

The Perfect Storm, Part 1

Maybe you, too, have fond memories of a board game called Clue. If you’re not familiar with it, each player moves a token around the game board, seeking clues to solve a murder mystery: which suspect did it, in which room, and with which weapon, perhaps the lead pipe, the candlestick, or the rope. Eventually, a player declares something like, “It was Professor Plum in the library with the lead pipe,” and if the guesser is correct, the game is won.

My sisters and I loved that game, and these days I think of it often. Unfortunately, the mystery I have been working on is not so easily solved. What has caused the poor performance of one of our systems in the ERP landscape? The suspects are many: the upgrade back in 2008, which solved some performance issues but seemed to result in new ones; any of the myriad hot fixes and custom patches installed since then; the new database design that came with the upgrade, which resulted in a seemingly uncontrolled proliferation of tables; the Web Dynpro user interface on our SAP portal; the custom integration middleware, coded by a third party, for which we did not have a copy of the code; something in the network connections between the connected SAP systems, the app server, and the database server; or some peculiarity of our use case, our security role design, and our global 24/7 security support organization.

After that 2008 upgrade, system performance gradually deteriorated to the point that, last October, the system essentially stopped running, apparently due to the database growth. We discovered, much to our dismay, that the solution had no archiving functionality; the vendor’s product engineers gave us some guidance on deleting thousands of tables that were never needed in the first place, and eventually we cobbled together our own way to archive records and wrote stored procedures to access them.

We limped along for months, working daily with a crew from the vendor’s technical support group, never really satisfied with the performance despite the increasing time and resources spent on the system’s care and feeding. The straw that broke the camel’s back came in April, when we discovered that the solution had no functionality for adjusting smoothly and continuing seamlessly after a server failover in a high-availability SAP landscape. After a week of downtime, we made the best of a bad situation: we took down the problem application, moved the remaining users over to a development system, and took down the production system completely.

Our plan was to back up the servers, uninstall everything (all the apps, hot fixes, and patches), create a new database, and reinstall everything but the problem application. Fortunately, we have a manual workaround for that application, and we will survive without it. The rebuilt production system is back up and being tested. The next step is to upgrade to a newer release that seems to offer improved performance.

So if I’ve seemed a bit preoccupied or out of touch recently, this is what I’ve been dealing with for the past six months. Somewhere along the way, my system configuration and compliance job devolved into a hybrid of an Intensive Care Unit nurse and an air traffic controller, coordinating the efforts of a large and no doubt weary team, including many on the vendor’s technical support team and product engineers, my server support folks, my DBA, my in-house developer, my SAP security teammates, and my key users.

I wish I could say that we have identified the root cause of all this frustration and inconvenience to our users, but I am afraid that we may never know. We have no plans to reinstall the application that seemed to be most problematic, and with a clean database and a new release for the rest of it, we are hopeful that we are on the road to recovery. The unsolved mystery may nag at me for a long time, but moving on may be our best option. Options for the longer-term resolution are still under consideration, but I certainly hope that we end up with a simpler architecture, with fewer moving parts at risk of failure.

      3 Comments
      Jim Spath
      I think Michael Krigsman would be good to talk to about project disasters like this.  But of course, no one likes bad press (other than realists), so you probably aren't authorized.  He can probably listen anyway.

      Jim

      Former Member
      Situations like this are difficult precisely because it's never clear whether to continue fixing the problem or simply start over and reinstall.

      In this case, your comments suggest that it was impossible to isolate the cause because of the large number of variables.

      With the 20/20 vision of hindsight, can you suggest steps that could have prevented this situation from arising in the first place? I would be very interested in learning from your thoughts and experience.

      Former Member
      Blog Post Author
      Michael,

      I have been wracking my brain over that very question. Honestly, anything that we would have done differently would likely have resulted in trading one set of problems for another.

      For example, eliminating the Web Dynpro user interface that required the custom integration code would have eliminated some of the complexity and the problems, but that UI was actually very well received, with a lot of reporting and intelligence built into it, so we'd have traded one set of issues for another. Our ERP landscape is highly customized; we rarely use tools as they come "out of the box." We could probably do better at process change management and process standardization.

      Part of our challenge was that the executive sponsor of the project retired mid-project, and losing his executive support was a blow to our efforts. With stronger top-down support to champion this initiative across the business, we might have been able to make do with less customization and less complexity, even if it meant more process changes and the attendant training required. Perhaps when we lost our sponsor, we should have raised the white flag until we could muster more executive and financial support.

      Gretchen