Billy Warring

Alerts, the good, the bad, the ugly…

Ever work in an environment where everyone is all for monitoring, but as soon as you have things going, everyone creates filters for the alerts and then takes no action on them, even if they specifically had you create them based on their criteria?

I have set up various monitoring systems (SolarWinds, Nagios, Solution Manager), and while they have different ways of obtaining their data, it always comes down to alerts being the worst part of configuring the monitoring system.

So before I continue, I would love to see feedback on issues/solutions or good/bad/ugly situations you have had with alerting.  😀

The Good:

  • Saving time – avoid some of those redundant morning/midday/afternoon checks, like buffers, memory, high CPU, number of users logged in, etc.
  • Transactional issues – know about a problem with your tRFCs/qRFCs/IDocs, typically caused by bad code or a user, where you end up dragged in to resolve it.
  • Knowing that your communications are functioning – RFCs; in some cases a failure here is harder to deal with than if the destination system had just gone down.
  • User specifics – number of users on a particular node/instance by type (Dialog/HTTP/RFC), perhaps system accounts being locked, or even when any account is locked.
  • Integration – alerts can be set to generate a ticket in the ITSM and in turn a ChaRM request to resolve the particular alert.

The Bad:

  • Alerts are misconfigured – a missed zero or one too many; whatever the cause, the incorrect value produces a false positive or, worse yet, no alert, and the system or business process fails.
  • All talk – “Yeah, we should set up alerts for x, y, and z; I’m tired of teams a, b, and c coming down here every time something is broken.”  The result: alert flooding, and either you have to turn it off or they don’t tell you that they set up filters to just delete the notifications.
  • ITSM or third-party ticket system – “I don’t want a ticket created, as it will just add to the email (that I am deleting via rules) and require me to log into a system to close a ticket that was generated due to an issue.”
  • Functional co-workers disabling metrics – co-workers who can log into the host OS disable monitoring agents/collectors because Task Manager showed the CPUs spiked for less than 1 min, and then inform you they will just turn them back on if there is a problem with the system.
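
Two of the items above (the missed zero and the sub-minute CPU spike) are the kind of thing a small guard can catch before anyone gets paged. Here is a minimal sketch in Python, not tied to any of the monitoring tools mentioned; the `Threshold` class, its field names, and the sample values are all made up for illustration. It rejects a limit that falls outside a plausible range and only reports a breach once the value has stayed above the limit for a set hold time.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold definition; field names are illustrative only."""
    metric: str
    limit: float          # a sample above this value counts as a breach
    min_valid: float      # lowest plausible limit for this metric
    max_valid: float      # highest plausible limit for this metric
    hold_seconds: float   # how long a breach must last before alerting


def validate(threshold: Threshold) -> None:
    """Reject limits outside the plausible range (e.g. a missed or extra zero)."""
    if not threshold.min_valid <= threshold.limit <= threshold.max_valid:
        raise ValueError(
            f"{threshold.metric}: limit {threshold.limit} is outside "
            f"[{threshold.min_valid}, {threshold.max_valid}]"
        )


def sustained_breach(
    samples: Iterable[Tuple[float, float]], threshold: Threshold
) -> bool:
    """Return True if values stay above the limit for at least hold_seconds.

    samples holds (timestamp_seconds, value) pairs in time order.
    """
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if value > threshold.limit:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= threshold.hold_seconds:
                return True
        else:
            breach_start = None  # the spike ended, so reset the clock
    return False


# Assumed example: CPU above 90% only matters if it lasts 5 minutes.
cpu = Threshold("cpu_percent", limit=90.0, min_valid=50.0,
                max_valid=100.0, hold_seconds=300.0)
validate(cpu)
short_spike = [(0, 95.0), (30, 97.0), (60, 40.0)]
long_breach = [(0, 95.0), (150, 96.0), (300, 97.0)]
print(sustained_breach(short_spike, cpu), sustained_breach(long_breach, cpu))
```

With a hold time in place, the spike that lasts under a minute never turns into a notification, which removes one excuse for switching the collector off.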

The Ugly:

  • Metrics stop working – Odd, no alerts over the last few days, wonder wha…Holy mother of *beep beep beep*, and you spend the rest of your day hoping to resolve all issues before an end user or functional person catches it.  Then you try to figure out how to re-enable metric collection, as you don’t care to go through that again.  😉
  • Lack of requirements – which could be linked with a lack of enforcement: what should be monitored, and when does value X become a problem (threshold settings)?  Then everyone’s opinions get in the way.
  • Old school settings – I’m all for reliable; what I find ugly are the folks who refuse to listen to, or even look at, newer methods that perform the same function, just because it’s “new”.  sapccms4x/sapccmsr/ccmsping <– I understand you had to work with what you were given, but these are way more complicated than they need to be…frankly, monitoring via SNMP could not have been that difficult, and it’s just as confusing!  😆
  • Sending template descriptions – taking the time to print a template to a PDF (in most cases a large PDF) to send off to people for review, and then waiting to hear back with at least a mention of 1 useful metric…
  • Lack of familiarity – having co-workers who recommend enabling alerting on any SM21 log entry with a status of Red…for those newer to the land of SAP, this would send an email for every lost/disconnected SAP GUI session, and you would quickly disable the alert.
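
The "metrics stop working" case above is easier to catch if silence itself is treated as a condition to check. A minimal sketch in Python, assuming you can get the timestamp of the newest sample per metric from your monitoring tool; the function name, metric names, and numbers are hypothetical.

```python
from typing import Dict, List


def stale_metrics(
    last_seen: Dict[str, float], now: float, max_age_seconds: float
) -> List[str]:
    """Return names of metrics whose newest sample is older than max_age_seconds."""
    return sorted(
        name
        for name, timestamp in last_seen.items()
        if now - timestamp > max_age_seconds
    )


# Assumed example: timestamps are seconds; anything older than 5 minutes is stale.
last_seen = {"cpu_percent": 1000.0, "dialog_users": 400.0}
print(stale_metrics(last_seen, now=1060.0, max_age_seconds=300.0))
```

Run on a schedule from somewhere other than the agent being watched, a check like this flags a dead collector within minutes rather than days.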

Feel free to review/rate some of my other blogs

Feel free to review/rate some of my other doco



      6 Comments
      Lluis Salvador Suarez

      Bookmarked!

      Great post Billy, thanks for sharing it.

      Regards,

      Luis

      Billy Warring
      Blog Post Author

      Hopefully it didn't have too much of a 'rant' taste to it; current events at the job in regards to alerting kicked it off!  😛 😉

      Tom Cenens

      Hi Billy

      I noticed your comment on Raquel Cunha's blog post, so I decided to dig into your content 😉. Luckily I did.

      An entertaining blog post 🙂. I like it. What I miss here is the fact that SAP's documentation falls short. I've seen architecture drawings in course E2E120 Technical Monitoring which are really useful and which I cannot seem to find in any SAP documentation source (other than the course).

      As an administrator it's essential to have the details, to know what is coming from where and how the data is being collected, etc. Having to dig into the system yourself to find out is not the most fun experience.

      The first challenge is setting everything up so that metrics do come in, and come in correctly; then the second challenge indeed starts: keeping it that way 😉.

      I do like the Technical Monitoring scenario in SAP Solution Manager 7.1. There are a lot of possibilities. One of my favourite things right now is the ability to pull in data from Wily Introscope. Before, Wily Introscope was only used to "view" things after a problem took place; now that metrics can be integrated into Technical Monitoring fairly easily, you can actually pull in interesting information and place it into the SAP Solution Manager monitoring.

      We use the interface capabilities only to generate an alert, in the end, in the alerting tool that is used for all systems (also non-SAP). But I'm not on the receiver side here; I'm part of the SAP Solution Manager team 😎 and I ship the alerts to the SAP Basis teams 😏.

      CCMSping is still very valid though by the way. The recommended way to monitor the availability of a managed SAP system is still CCMSping.

      Best regards

      Tom

      Billy Warring
      Blog Post Author

      OOOO a new stalker on the SCN!  😉

      I have actually not taken E2E120 (only E2E100 RCA); I worked from the various technical doco and my prior experience with SolarWinds NPM. I actually had to rebuild it at one point, and it took me probably 3 days or so with all the various network equipment and servers, and that included the backup of the network device configurations...SAP's choice to leverage an agent-based install slowed things down for me, until I found the installation .ini trick, which helped quite a bit...then I found out about the weekly hostctl updates from SAP and had to work out how to keep that up to date on top of all the other Solution Manager functionality.

      Could not agree with you more on having to dig into systems to figure out optimal metric settings, and I will raise you: try doing this with no BPB in Solution Manager, on top of a slow implementation phase.

      Until I was at Admin2013 I didn't understand the relationship between TechMon and Wily; I was happy enough just being able to see some kind of real-time display for various metrics on the systems.  Knowing now that I can leverage alerts from Wily metrics still doesn't change how things are currently in my environment...I can't get functional or even my own team to review the EWAs on a regular basis, so I have a feeling we have larger concerns. 😉

      I also assume that all CCMS tools are still valid, but that doesn't mean I have to like them or attempt to use them 😛 😏 I even scripted the ccmsping/sap4x/sapccmsr installations, as that was a far worse process than the SMD installer!

      Since I did capture your interest in regards to alerting, and I just recently took on the mantle of "TDMS dude" on my team: are there plans for TDMS/SolMan integration? I would love to be notified when something stops a TDMS run vs. having to stare at my monitor for hours on end!!!  😯

      Tom Cenens

      Hi Billy

      Stalker is a big word 😉 but you caught my attention with this blog post 🙂. In other words, I do hope you continue to blog; it's definitely worth it.

      One of the more interesting PDFs on Solution Manager, and a very useful one for Technical Monitoring, is the PDF covering the communication channels & users: http://wiki.scn.sap.com/wiki/download/attachments/168722509/Architecture+User+Overview.pdf

      The latest best practice is using Diagnostics Agents on the fly in order to lower the effort needed to install / maintain agents for each virtual SAP host.

      Still, the architecture for Technical Monitoring is very complex in terms of the number of elements / entities that exist, and the lack of documentation in certain areas can make it very hard to troubleshoot a broken scenario.

      A lot of customers still don't do much with SAP Solution Manager, which is a pity because there are really interesting scenarios that can basically save them money: most of the time they have other software that is used for those purposes, or they have nothing at all that can bring them what the scenario can bring them.

      Yes, CCMS is still supported. Standard support contract customers are not eligible to use Technical Monitoring so they have to stick with CCMS.

      Best practice around TDMS says it's a separate SAP system, but it could in fact be installed / used on SAP Solution Manager. SLT replication for SAP HANA uses the same component (DMIS), and for PoCs it's often installed on SAP Solution Manager. The downside, though, is that SAP Solution Manager is patched regularly, so it's not the ideal system to put such an add-on on. That's why the recommendation is to place it on a separate NetWeaver system. Looking at HANA administration, SAP says "in the end, the customer can choose", so it is allowed to place DMIS on SAP Solution Manager, but it's not considered best practice.

      Best regards

      Tom

      Billy Warring
      Blog Post Author

      @Tom, in regards to that PDF...that sound you may have heard was my jaw hitting the ground!  😯   I could have used this about a year ago to gain a better understanding of how the various areas (or work centers) interact within the landscape.

      I can confirm we are one of those customers for SAP 🙂.  About 2 months ago I was in a meeting about BOBJ performance issues; I pulled up Wily on a projector, and for the next hour every consultant proceeded to run Webi reports to break the system...I talked to the organizer of the meeting and suggested we take a look at EEM as an alternative, as it's going to be a cheaper and more reliable way to repeat Webi queries.  Work is still being done to troubleshoot performance issues with EEM, as some reports are taking almost 15 mins to complete after only a few days...that is, if the scripts even complete correctly.  Ahhh good times good times! 😐

      Good to know about the Standard Support model; I was unaware of this restriction on Solution Manager.

      It's amusing you say that about TDMS and SolMan: the developer who led the class stated that almost all customers leverage SolMan for the Control system and split off the Central to a dedicated NW stack.  I did a few test runs, and the hardware I currently have in the VM works well for SolMan, but TDMS runs on top of that depleted resources and generated some alerts that I did not care to receive 🙂.  I'm also unsure why I was informed that the TDMS work center would show up in the typical SolMan work centers; I kind of expected a little more integration with SolMan...even from the SolMan Project standpoint, I don't understand why you can't reference a TDMS project # and link a dependency to it.  Perhaps in a future release 😕 or my expectations are too high 😉?