Enhancing Solution Manager – second impressions
Last week, I wrote about Enhancing Solution Manager – initial reaction(s), where several previously working Early Watch analyses took a hike. Here are a few clarifications on those changes, and a short tour of the Wily Introscope panels that I found.
[… hit #1 …]
I had found an altered “active user count”. When I looked at my source system, in ST07, nothing looked very different.
Active users hovered between 20 and 30 for specific time slots.
Logged on users was 50 to 100 when I looked.
Re-reading the latest Early Watch report, I found what was different. The number of users logged on per week has not changed much; what changed was the classification among high, medium and low. Prior to Solution Manager Enhancement Package 1, we were recording about 850 low users and around 20 medium users. Now we are seeing about 600 low users and 300 medium users. As you can see below, the total count barely changes.
It’s like driving across the Canadian border and your speedometer changes from MPH to KPH without warning.
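To convince myself that a reclassification alone could produce this kind of shift, I put together a small sketch. I don't know the actual cut-offs Solution Manager uses before or after the enhancement package, so the thresholds here (`medium_from`, `high_from`) and the per-user step counts are made up purely for illustration, and the function names are my own.

```python
from collections import Counter


def classify(steps_per_week, medium_from, high_from):
    """Return 'low', 'medium' or 'high' for one user's weekly dialog steps."""
    if steps_per_week >= high_from:
        return "high"
    if steps_per_week >= medium_from:
        return "medium"
    return "low"


def bucket_counts(weekly_steps, medium_from, high_from):
    """Count users per activity class for a given pair of thresholds."""
    counts = Counter(classify(s, medium_from, high_from) for s in weekly_steps)
    return {name: counts.get(name, 0) for name in ("low", "medium", "high")}


# Hypothetical weekly dialog-step totals for ten users (not from my system).
weekly_steps = [12, 40, 75, 150, 220, 310, 480, 900, 2500, 6000]

# Hypothetical thresholds: only the 'medium' cut-off moves between the two runs.
before = bucket_counts(weekly_steps, medium_from=400, high_from=4800)
after = bucket_counts(weekly_steps, medium_from=100, high_from=4800)

print("before:", before, "total:", sum(before.values()))
print("after: ", after, "total:", sum(after.values()))
```

The same ten users land in different buckets, while the total stays put, which is the pattern I saw in the report.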
[… Hit 2 Dialog Response Time …]
Although response time for dialog screens in BW is not a primary indicator, it is still potentially useful for showing when something is amiss, or for demonstrating the efficacy of a system change. Having transaction response time apparently drop from 8 seconds to 2 seconds is quite disconcerting if you aren’t expecting it, like finding an extra step on the stairs, or not finding one that was there yesterday.
When I looked back at a prior Early Watch report (generated from Solution Manager), I didn’t see the 8,000 milliseconds that the new reports said had occurred, at least looking at the generated charts.
Looking at the report text, however, I found the discrepancy.
|Type||Steps||Total response time|
|DIALOG + RFC||430,203||8183.6|
|Type||Steps||Total response time|
While prior graphs had shown dialog response time correctly, the report text was showing dialog added to RFC times. I probably registered this oddity years ago, and simply added it to the “Ignore this section of Early Watch reports” list. Unfortunately, the new reports are comparing the combined dialog+RFC data from before the patch with dialog alone afterward. That doesn’t make a lot of sense.
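Here is a rough sketch of why mixing the two series looks like a speed-up. It assumes the report figure is a step-weighted average in milliseconds, which is my reading rather than anything documented in the report. The split between dialog and RFC steps is hypothetical, since only the combined row appears in my report, and `combined_average` is a helper I made up.

```python
def combined_average(groups):
    """Step-weighted mean response time (ms) over (steps, avg_ms) pairs."""
    total_steps = sum(steps for steps, _ in groups)
    total_time = sum(steps * avg_ms for steps, avg_ms in groups)
    return total_time / total_steps


# Hypothetical split: (number of steps, average response time in ms).
dialog = (300_000, 2_000.0)
rfc = (100_000, 26_000.0)

# What the old report text would show: dialog and RFC lumped together.
combined = combined_average([dialog, rfc])

# What a dialog-only figure would show for the very same workload.
dialog_only = combined_average([dialog])

print(f"DIALOG + RFC: {combined:.1f} ms")
print(f"DIALOG only:  {dialog_only:.1f} ms")
```

With these invented numbers, nothing about the system changes between the two lines of output; only the set of steps being averaged does.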
Another focus I had was looking at the performance of the Solution Manager environment, as there were reports that it was “slow” compared to before EP1. While I have not gotten to the bottom of this dilemma yet (no smoking processes) I looked around at the OS level to get baseline data on CPU, memory, I/O and process use by the expanded Solution Manager install. I’ll detail more of the agents, daemons and gremlins another day.
This shows top processes with nmon, similar to top, topas and task manager on other operating systems. The highlighted areas are process names. Other than jlaunch using 4% of memory, nothing really jumps out as being a resource hog. Once we get Solution Manager monitoring itself, we’ll know more (we hope).
The remaining images are from the Root Cause Analysis part of Solution Manager. There isn’t much I can report on from these yet, other than to say it is partly working, and partly not working. My typical approach in learning tools like this is to view them when the system is quiet, to get a sense of the background noise, and then either run some code myself, or look at the screens when I know specific testing is occurring. It’s fairly useless to try to start using these when an emergency arises.
I kept seeing the wrong number of systems in the RCA panel (see Image # 3 above). Even logging in the next day, I still saw only 5 systems, yet we have connected more of them. Our main Solution Manager Basis manager showed me the “Refresh” button, on the far right bottom of the screen, next to the timestamp. After I clicked that button I found the latest set of agents. I don’t know about you, but I expect to get the latest news automatically.