[Updated 03-Feb-2010 with “Positive Call Closure” grid – see end of blog]
In early November 2009, I posted a blog called “You have the CCMS that goes PING?” In that story, I related an issue we have had for a long time in the EarlyWatch reports generated by Solution Manager. The symptoms were that half of our production systems showed availability between 99 and 100 percent, while the other half showed up time between 49 and 50%. Clearly, something was fundamentally wrong.
SAP support’s initial response was to tell us we should be using CCMSPING to track availability, and that the “old” way was being desupported. The previous blog explains in detail how we deployed that tool. Unfortunately, while it was a great learning experience, the CCMSPING sweeps had zero effect on the EarlyWatch Report generated subsequently.
Given that it took a couple weeks for us to configure, test and deploy the fix recommended by SAP, not all of the ticket processing time is SAP support unresponsiveness. I opened the ticket on 27-Oct-2009, and by 30-Nov-2009, we knew CCMSPING was not the answer.
We tried to escalate the ticket, using our Enterprise Support status as a certified customer center. That’s supposed to mean we know what we’re doing, and should go straight to the HOV lane, bypassing the “what patches have you applied?” triage station. On 01-Dec-2009 the ticket went from “medium” to “high” priority, which appears to be the way to have someone more helpful look at it. The truth is the problem is minor, as we weren’t depending on the EWA report for any business service level agreements. It was just wrong data, and I wanted it fixed.
There was some back and forth during December, but with other project deadlines and a holiday period where many support personnel were unavailable, I didn’t push it. The ticket sat with SAP for several weeks (although they will probably argue that I didn’t write up the business justification for raising the ticket to high priority).
I pushed the question up through our technical support channel again in early January, which prompted a quick reply via phone from Ireland, with SAP support apologizing for the delay and promising to have a developer look at the issue. In more than a decade of dealing with SAP technical problems, I have found that the phrase “a developer will look at it” is the equivalent of hitting the lottery. Overnight, I received an update (“Customer Action Required”).
The update message said to look at table MONI_V01, as that is where EarlyWatch gets its data. The light dawned, finally. Why didn’t I get that answer within 24 hours the first time I asked?
A quick peek at the table showed rows dating back to 2006:
But the problem systems had this:
So there it was! Every other hour had data, instead of every hour. But why?
I looked at the TCOLL table, after a quick check with Basis to help me remember the name of this table with the “X”s and “O”s in it (like a football playbook, or maybe a voting machine, or even tic-tac-toe). The “O”s are really blanks, but you get the point.
I tried to compare the above 2 tables, then realized that we don’t use them for every SAP internal maintenance process; we use our external enterprise job scheduler.
As the brain cells thawed, I found egg on my face, as I had set these jobs up, probably 10 years ago, and they were running every 2 hours in the systems that were showing 50% up time. I’m sure someone told me at the time that would be fine, as the prior time period would be filled in later. In any case, once we knew the root cause, I quickly put in a change request (not through Solution Manager, but through our enterprise ticketing system) to have this error corrected.
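To illustrate why a two-hour collector schedule lines up with the 49 to 50 percent figures, here is a minimal Python sketch. It assumes availability is reported as the share of hourly slots that have at least one collector row; that formula is my assumption for illustration, not documented EarlyWatch behavior, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def collector_slots(start, total_hours, interval_hours):
    """Hypothetical: hourly slots in which a collector run wrote a row."""
    return {start + timedelta(hours=h)
            for h in range(0, total_hours, interval_hours)}


def availability_percent(start, total_hours, slots_with_data):
    """Assumed formula: percent of hourly slots that contain data."""
    expected = [start + timedelta(hours=h) for h in range(total_hours)]
    covered = sum(1 for slot in expected if slot in slots_with_data)
    return 100.0 * covered / len(expected)


week_start = datetime(2009, 11, 2)  # arbitrary example week
week_hours = 7 * 24

# Collector scheduled every hour versus every two hours.
hourly = availability_percent(
    week_start, week_hours, collector_slots(week_start, week_hours, 1))
two_hourly = availability_percent(
    week_start, week_hours, collector_slots(week_start, week_hours, 2))

print(f"hourly schedule:   {hourly:.1f}%")
print(f"two-hour schedule: {two_hourly:.1f}%")
```

Under that assumption, a system that never went down still reports half its hours as missing when the collector only runs every second hour.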
We won’t see a full week’s report until the next cycle, but based on the behavior change in the MONI_V01 table (or view) I think the fix is in.
I’m just shaking my head that the problem was so simple, in hindsight.
UPDATE: SAP Notes
Fairly recent notes suggested by SAP concerning CCMSPING and service level reporting (SLR):
- 1332677 ST03N-based system availability in SLR no longer supported
- 944496 CCMS Ping and SL Reporting
- 872569 Central Performance History and SL Reporting
Slightly older notes related to EarlyWatch, CCMS, MONI and TCOLL:
- 917558 – EarlyWatch Alert does not contain ST03 statistics as of 7.0
- 762696 – Grey rating for Earlywatch Alert
- 75059 – Mean values in workload monitor decrease
- 60588 – Collector logs are not deleted
The classic note on TCOLL:
- 12103 – Contents of the TCOLL table
“The SAP standard job SAP_COLLECTOR_FOR_PERFMONITOR must be scheduled hourly in each SAP system.”
The modern classic version:
- 1394392 – Contents of the table TCOLL in SAP_BASIS 730
The guts of it all:
- 16083 – Standard jobs, reorganization jobs
“Adhere to the recommendations, as the naming conventions enable us to check quickly and easily whether these jobs have been activated in your system.”
Below is a screen shot (cropped to fit into the size limits) of the Positive Call Closure Survey I took when closing the SAP ticket. As you may know, the survey asks for feedback on the last person who worked your ticket, who is almost always the one who solved the problem. There is no systematic way to comment on the service quality of anyone else who handled the ticket, other than to complain about the time to solve the issue, which I did.
I’ll quote the text comments I added, which we’ve been told is how you should document anything the standard questions leave out. We shall see if this gets any attention. Or we won’t.
The initial response to this ticket was pathetic, in the sense that SAP support answered a question I did not ask.
We spent weeks of time and many hours of internal support setting up CCMSPING, only to learn it had nothing to do with the issue.
I still have no idea if or when CCMSPING service level availability metrics will appear in Early Watch reports.
After we escalated this ticket *twice* it was finally sent to someone who answered the original problem report very simply and easily.