Jim Spath

A Boat called Solution Manager

A couple of months after setting up my latest attempt at Central Performance History (CPH) monitoring agents on a Solution Manager development server, I took a look to see what was running. In the meantime, I still haven’t managed to get the Business Intelligence views working, though that’s a story for another time. The “old school” CPH collection runs with CCMS agents on remote machines, RFC connections, and a few other moving parts.

 

Sadly, though I expected to see data from July through October, there seem to be gaps in the fossil record, er, history. The first chart below shows one of the collections I had set up on the local Solution Manager instance. The volume of records is deceptive: it appears to include daily, hourly, quarter-hourly, and minute-by-minute values, but several of these stalled earlier in October for no apparent reason.
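For anyone who would rather check for stalled collections outside the GUI, here is a minimal Python sketch of the check I was doing by eye. It assumes the record timestamps for one collection have already been exported from CPH somehow (not covered here); the function names, the expected interval per resolution, and the sample dates are all hypothetical and not taken from any SAP API.

```python
from datetime import datetime, timedelta

# Hypothetical expected spacing between records for each CPH resolution.
# These intervals are illustrative assumptions, not values read from SAP.
EXPECTED_INTERVAL = {
    "minute": timedelta(minutes=1),
    "quarter-hour": timedelta(minutes=15),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def find_gaps(timestamps, interval):
    """Return (earlier, later) pairs of consecutive records spaced further apart than interval."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > interval
    ]


def stalled_since(timestamps, interval, now):
    """Return the last record time if nothing arrived within one interval of now, else None.

    timestamps must be non-empty; max() raises ValueError otherwise.
    """
    last = max(timestamps)
    return last if now - last > interval else None


if __name__ == "__main__":
    # Made-up sample: daily values that stop after October 6th.
    daily = [datetime(2010, 10, d) for d in range(1, 7)]
    now = datetime(2010, 10, 20)
    print(find_gaps(daily, EXPECTED_INTERVAL["day"]))
    print(stalled_since(daily, EXPECTED_INTERVAL["day"], now))
```

Running the same two checks per resolution would show which levels stalled and when, which is the pattern in the chart: some levels keep going while others stop.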

 

Further research revealed System Landscape Directory (SLD) changes around October 6th, made mainly to continue purging production Solution Manager data from the development instance, which had been created as a copy. How CCMS agent collections relate to the SLD topology escapes me.

 

 

[Figure: Dialog Response Time – Solution Manager]

After more digging into permissions, security, agent processes, and anything else I could think of, I found issues in the RZ23N area. All of the connections showed up as “Shutdown” except one. I recalled that at least a few of these had been working before. So what caused the dropout? I still have no idea, as the accounts, passwords, and agents seem to be intact.

 

[Figure: Remote Agent Status]

Clicking around randomly, I discovered that triggering a connection test to a remote system switched its status from “Shutdown” to “Online”. That was an unexpected reaction, but since the effect was what I wanted, I decided not to look the gift horse in the mouth. I went on to test the remaining remote systems that had previously been working, and prepared to wait for further progress.

[Figure: Reconnect Remote Systems]

A few minutes later, I reviewed the system status screen (RZ20). Lots of red lights, and no sign of recovery in progress. Given the cyclic nature of these collection and storage processes, it is often necessary to be patient and wait for developments. If all is well, the lights will go green on their own. If not, it’s back to the drawing board.

[Figure: CCMSR Agent Status]

More stumbling around followed, as I decided that installing additional collection schemas might revive the remote agents. I tried redefining an existing method, but that didn’t seem to work. Below is the error message generated when I tried to make the same assignment twice. Since the past data was of no value, I continued.

 

The typo in the message (“Alerady”) is “as built”, one of those pet peeves that suggests a lack of quality control. Or perhaps it just wasn’t written by native English speakers.

 

[Figure: Alerady An Assignment]

As seen below, the remote system history is missing up to the point where the remote connection test revived repository loading. A whole few minutes’ worth of data is an inauspicious start, but it is a start.

[Figure: Remote System Restart]

I added “generic key hit ratio” to several systems where it had not been collected before. Another option might have been to remove an existing collector and add it back. This screen is one of the less friendly in the section, and that’s not saying much.

[Figure: Redo Schemas]

A bit later: the remote system has no data for September, meaning the monthly rollup aggregation worked in July and August, but not since. The daily collection is working as of this writing. Whether it will continue is a good question.

Note that this parameter was working in July, when I stopped looking at it. The only things left intact were the two monthly records. The daily collection started when I added a different parameter.

 

[Figure: Partial Remote Restart]

Here’s the new data element, being collected for one of the remote systems. The hope is that after a few days, a week, or a month, more aggregation levels will appear. Whether these other data would be needed if the BW repository has the same content is yet another question I hope to answer shortly. But I’ve said that before, I think.

[Figure: New Collection]

This RZ20 screen, different from the one above, shows the state of the “minute” collections a couple of hours after the successful restarts.

[Figure: Collection Underway]

 

All in all, I feel like I’m in a storm, far from port.  Life rings are hard to come by.

 

 


Next steps

 

I want to move these out of the sandboxes and up to production, but I am still stalled on the BI component. We had a few revelations yesterday that may break the logjam. We’ll see.

 

 

And one more thing.

 

[Figure: CPH and BI daymare]

1 Comment

Leon Johnson
Jim,
In the past I have seen the same effect as you, with agents displaying an offline status. This only happened on our sandbox Solman, never on the PRD Solman. After some digging, I determined that the only time the agents didn't report to the Solman system was when the sandbox system ran dangerously low on memory. Not sure why that would cause the agents on another system to go offline, but it would report that status. The PRD box was sized correctly, was never low on memory, and never had any agents suspiciously go offline.