I am happy and excited to write my first ever blog on this community. I have taken so much from this great community, resolving many issues by reading its many wonderful blogs. It's time to give something back to help others. I am writing about an experience that CA describes as a metric explosion.
How did it start?
A few days ago our network team reported that network utilization between two servers (one hosting the MII application, the other hosting the Wily EM) was very high and was causing packet loss. The same day we received alerts for high memory utilization on the MII application.
What was the error?
While checking the Wily Introscope Agent logs (location: /usr/sap/<SID>/SMD<XX>/SMDAgent/temp/IntroscopeAgent.<SID>_<Instance>_server<x>), I found the exception below:
[IntroscopeAgent] IntervalHeartbeat.execute threw executing: Remove Metric Data
I noticed that before this exception, the number of current live metrics was in the range of 1000-14000, but after the exception it kept growing and went above one million. (You can search the same file for log lines like the ones below.)
11/09/15 02:19:03 PM GMT [INFO] [IntroscopeAgent] Number of current, live metrics=1012
12/11/15 03:02:28 PM GMT [INFO] [IntroscopeAgent] Number of current, live metrics=1035158.
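If you want to check your own agent log for the same pattern, the search can be scripted. Below is a minimal Python sketch, assuming the log lines have the same layout as the two samples above. The function names and the 100,000 threshold are my own illustrative choices, not anything defined by CA or SAP, so adjust them to what is normal for your system.

```python
import re

# Matches lines like the two samples above. The timestamp is kept as plain
# text because the date order (MM/DD vs DD/MM) is not certain from the log.
LIVE_METRICS_RE = re.compile(
    r"^(?P<ts>.+?) \[INFO\] \[IntroscopeAgent\] "
    r"Number of current, live metrics=(?P<count>\d+)"
)


def parse_live_metrics(lines):
    """Return (timestamp_text, metric_count) for every matching log line."""
    samples = []
    for line in lines:
        match = LIVE_METRICS_RE.match(line.strip())
        if match:
            samples.append((match.group("ts"), int(match.group("count"))))
    return samples


def find_explosions(samples, threshold=100_000):
    """Return the samples whose live metric count exceeds the threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return [(ts, count) for ts, count in samples if count > threshold]


def scan_file(path, threshold=100_000):
    """Read an agent log file and return the samples above the threshold."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_explosions(parse_live_metrics(handle), threshold)
```

Calling scan_file with the path of the IntroscopeAgent log returns the timestamps at which the live metric count was above the threshold, which makes it easier to see when the growth started.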
It was clear to me that this issue was causing the high memory and network utilization.
At the link below (from CA Support), I found more information about the interface mentioned in the exception and about what a metric explosion is:
How was it fixed?
To fix the problem we restarted the SAP Java application, which resolved the high network and memory utilization.
As of now I am not aware of a permanent solution, but I am working on it. I know there is a memory leak issue with ISAgent version 9.1.0.X, but we were using ISAgent 8 when this issue occurred. I will update this blog when I have a permanent resolution.
Thanks and Regards