
After our Business Warehouse upgrade to 7.0 in July 2009, I found that ST03 was paralyzingly slow to respond when drilling down to where process chain reports had been displayed before the upgrade.  As the symptoms did not appear immediately after the upgrade, but a few days later, I presumed the problem was data-volume related.  There was a lot I didn’t know at the time, though.

I related previous chapters of this saga in blog posts, starting 14-Aug-2009.

After the typical drawn-out struggle with support to identify the code and/or data responsible for the observed symptoms, and an equally unsurprising wait to apply an unpublished SAP note, we began the path to production.  Sometimes fixes for issues such as this are simple enough that they can be applied to production in short order.  Others are complex enough, or the timing is such, that we need to wait.

We had a production copyback near the end of August, part of our monthly production transport cycle.  Major changes are regression tested for a week with the latest data to ensure the most representative testing.  As relayed in the previous chapter, my QA tests showed the proposed code fix repaired the problem.

This past weekend, the transports moved into production.  As it was Labor Day in the U.S., I did not begin verifying the correction until Tuesday.  To my surprise, the first drill down into ST03 took about five minutes before ending in a short dump.

 
Figure 1

This failure mode was new: previous faults had simply been a long delay, much longer than even the five minutes before this dump.  While it looked like the code died in about the same place, I wasn’t sure why; the QA system had a huge volume of data, and should have had exactly the same code.

My next step was to verify that the code fix had gone into production (it had), and then to decide whether to update the SAP ticket, or look around a little more.  We chose the latter.

The BW architect reported:

… read thru the dump and was suspicious of one of the messages… so I reviewed all the data in the cubes and it seems like one of the loads failed over the weekend while I was on vacation.  So the virtual cube had to access about 4 days of data…

 

(“virtual cube” sounds like science fiction to me).

The next update was:

… corrected the failed load so the virtual cube only had to access today’s data and ST03 now runs in about a minute.

 

Later in the day I ran my own verifications.  It would not make sense to simply assume everything worked fine, given the disconnects and wrong numbers on this case so far.

The symptoms were much improved: I was able to see both the full month of August and the partial month of July (the first part now living in the “640” space).

 

Figure 2

 

Figure 2 above shows what I had been looking for: a quick summary of a month’s worth of process chain runs.  This view identifies the longest-running applications, which on a month-by-month basis will show trends such as new contributors and runtime degradation.
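To make the idea concrete, here is a minimal sketch of that kind of month-by-month rollup, done outside of SAP.  The record layout and chain names are invented for illustration; ST03 builds the equivalent view for you from the workload statistics.

```python
from collections import defaultdict
from datetime import datetime

# Invented sample rows: (chain_id, start_timestamp, elapsed_seconds).
runs = [
    ("ZPC_SALES_DAILY", datetime(2009, 8, 3, 2, 0), 5400),
    ("ZPC_SALES_DAILY", datetime(2009, 8, 4, 2, 0), 6100),
    ("ZPC_FIN_CLOSE",   datetime(2009, 8, 4, 3, 30), 900),
]

# Total runtime per chain per month, mirroring the Figure 2 rollup.
totals = defaultdict(int)
for chain, start, elapsed in runs:
    totals[(chain, start.strftime("%Y-%m"))] += elapsed

# Longest-running chains first: compared month over month, these totals
# expose new contributors and runtime degradation.
for (chain, month), secs in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{month}  {chain:<20} {secs / 3600:5.2f} h")
```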

 

Figure 3

 

Drilling into one of the longest-running chains reveals more about the metrics.  You may notice that I sorted this display by start time, rather than by total (or component) time; this gives me a clearer picture of the sequence of events.  As the individual parts don’t add up to anywhere near the whole, there’s a gap in the chronology.  Nearly 4 days, in fact, as you can see once you look past the commas in the timestamp column.

It appears that a step failed and the process chain then lay dormant.  In this case, we’d be looking to our alert and escalation procedures more than to the performance tuning that a long-running chain might otherwise imply.
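For readers without the screen in front of them, here is a minimal sketch of that arithmetic, assuming the step timings have been exported as (name, start, end) tuples.  The step names and timestamps are invented, but they reproduce the symptom: the step durations sum to a small fraction of the chain’s elapsed time, and the difference surfaces as one long dormant interval.

```python
from datetime import datetime, timedelta

# Invented step timings mirroring the symptom in Figure 3.
steps = [
    ("LOAD_STEP",     datetime(2009, 8, 28, 2, 0),  datetime(2009, 8, 28, 2, 45)),
    ("ACTIVATE_STEP", datetime(2009, 8, 28, 2, 45), datetime(2009, 8, 28, 3, 0)),
    # ...a step failed here, then nothing until the chain was restarted...
    ("RESTART_STEP",  datetime(2009, 9, 1, 1, 0),   datetime(2009, 9, 1, 2, 10)),
]

steps.sort(key=lambda s: s[1])  # sort by start time, as in the display
busy = sum((end - start for _, start, end in steps), timedelta())
elapsed = steps[-1][2] - steps[0][1]
print(f"steps add up to {busy}; the whole chain spans {elapsed}")

# Flag any dormant interval between consecutive steps.
for (_, _, prev_end), (name, start, _) in zip(steps, steps[1:]):
    gap = start - prev_end
    if gap > timedelta(hours=1):  # arbitrary threshold for illustration
        print(f"gap of {gap} before {name}")
```

Sorting by start time is what makes the gap visible in the first place; sorted by duration, the dormant stretch hides between unrelated rows.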

Are we done?  For this blog series, I expect so.  I can follow up via the comments.  For this issue, no, we’re not done.  Here’s where we are in stabilizing the workload statistics objects in production:

… loads that fail and the data grows too large for the virtual cubes to work. Currently, I tried to reprocess the failed load and seems that it short dumps too. (compute overflow)  I found a note to apply, but that won’t be till next monthly move.  I’ll have to hand hold this load everyday now unless

Next, I’ll respond on the ticket about the issue with the 14-day deletion not working, and put it back into SAP’s hands.

 

Part of the “positive call closure” survey will be links to this blog series.


2 Comments


  1. John Skrabak
    I share your frustration with many of the “operational/administrative” tools that are provided.  I would love to hear from SAP about the extent to which they use ST03 internally to monitor process chains.  Do they “eat their own cooking”?

    Prior to working with SAP BW, I managed a data warehouse for a large bank.  The vendor of the tools we used also managed data warehouses for smaller banks and credit unions as a service bureau offering, using the same toolset.  So their service bureau support staff could talk directly with their developers, present a problem, and show how it manifested itself.  It didn’t take weeks of back and forth to get these kinds of problems resolved.

    1. Kenneth Murray
      I really appreciate candid replies like this when I read them!  That is the exact question I am always asking when I run into the many tool shortcomings across BW.
