Solved: SAP BI Platform - Monitoring Watch stuck on Danger state even after metrics all return to OK state

boman_hwang · ‎02-11-2020

Hello everyone,

We are running a clustered SAP BI Platform environment on version 4.2 SP 07 Patch 1. We have set up monitoring and watches with automated email alerts on every server, which has worked fine so far.

A couple of days ago we had an issue on one of the cluster nodes, and the watch for the Central Management Service (CMS) went to the "Danger" alert level because one of the metrics was hitting the danger threshold.

It turned out that an errant process had generated a huge number of audit events for the CMS, more than it could handle at once, so a lot of audit events got stuck in the queue. The metric correctly identified this and fired the alert, so that part worked as expected.

However, once we stopped and killed the errant process, the audit events were processed and the queue emptied. At that point the metric and the watch should have returned to the OK state, or at least that was the behaviour we expected.

But now we have a strange issue: the metric for audit events in the queue is back down to 0 and shows the OK state again, and every other metric for the watch was already in the OK state. Even though every single metric of the CMS watch is in the OK state, the watch itself refuses to return to OK and is still stuck on the Danger state. This means all connected watches stay in their Danger state as well.

We hoped that a server reboot might fix the issue, but the watch is still stuck even after the regular reboot over the weekend. I also copied the complete watch, and the copy actually switches to the correct OK state, so it seems to be an issue with just the original watch, which is the one that was part of the standard installation of the BI platform server.

I guess I could delete the watch and recreate one as a custom watch, but then I would have to edit all the connected watches as well and I am also concerned that this would mean we could get corrupted watches in the future as well.

Is there any way I can debug or reset an individual watch or fix the issue? Any ideas would be greatly appreciated.

Joe_Peters · ‎02-11-2020

We have had this problem since Monitoring was first introduced in BI4.1 (we're now on BI4.2 SP06). I opened two separate incidents with SAP, but they were never able to determine the cause or a solution. I even asked if it was possible to completely reset the Monitoring metadata in hopes of clearing the problem, but was told that was not possible.

I discovered that the only way to reset the status of a watchlist is to disable and then re-enable it. This has worked consistently for me.

I created a Java program object that runs every 10 minutes -- if there is a watchlist in Warning or Danger status, it disables and re-enables it. This works well -- if the watchlist still meets the warning/danger criteria, then it will remain in that status after the program completes, otherwise it will reset to green.
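The disable/re-enable loop described above could be sketched roughly as follows. This is only an illustration of the logic, not the actual program object: the `Watch` interface, the `WatchState` enum and the `FakeWatch` class are hypothetical stand-ins and are not part of the SAP BI Platform SDK. A real program object would have to use whatever calls the SDK provides for reading a watch's state and toggling it. The fake also assumes that re-enabling a watch makes it re-evaluate against its current metrics, which matches the behaviour reported in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical alert levels, named after the states mentioned in this thread. */
enum WatchState { OK, WARNING, DANGER }

/** Hypothetical abstraction over a watch; NOT the SAP BI Platform SDK API. */
interface Watch {
    String name();
    WatchState state();
    void setEnabled(boolean enabled);
}

final class WatchResetter {
    /**
     * Disables and re-enables every watch that is not in the OK state.
     * Returns the names of the watches that were toggled.
     */
    static List<String> resetStuckWatches(List<? extends Watch> watches) {
        List<String> toggled = new ArrayList<>();
        for (Watch w : watches) {
            if (w.state() != WatchState.OK) {
                w.setEnabled(false);
                w.setEnabled(true);
                toggled.add(w.name());
            }
        }
        return toggled;
    }
}

/**
 * In-memory stand-in for testing the loop. Assumption: on re-enable, the
 * watch state is recomputed from what the metrics currently indicate.
 */
final class FakeWatch implements Watch {
    private final String name;
    private final WatchState metricState; // what the metrics currently indicate
    private WatchState state;             // what the watch currently reports

    FakeWatch(String name, WatchState reported, WatchState metricState) {
        this.name = name;
        this.state = reported;
        this.metricState = metricState;
    }

    @Override public String name() { return name; }
    @Override public WatchState state() { return state; }

    @Override public void setEnabled(boolean enabled) {
        if (enabled) {
            state = metricState; // re-evaluate on re-enable
        }
    }
}

final class WatchResetDemo {
    public static void main(String[] args) {
        // One watch stuck on DANGER although its metrics are OK,
        // and one that is legitimately still in WARNING.
        FakeWatch stuck = new FakeWatch("CMS Watch", WatchState.DANGER, WatchState.OK);
        FakeWatch busy = new FakeWatch("Busy Watch", WatchState.WARNING, WatchState.WARNING);

        List<String> toggled = WatchResetter.resetStuckWatches(Arrays.asList(stuck, busy));
        System.out.println("Toggled: " + toggled);
        System.out.println(stuck.name() + " -> " + stuck.state());
        System.out.println(busy.name() + " -> " + busy.state());
    }
}
```

Scheduling something like this every 10 minutes is safe in the sense described above: a watch that still meets its warning/danger criteria simply goes back to that status after the toggle, and only a stale one resets to green.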

I've found the monitoring component to be very buggy. I have all watchlists set with throttling so that an alert will only be sent when the status changes, yet I regularly get a flood of alerts when a threshold is crossed. And I can't even get to the throttling tab in the new Fiori version.
