Event Management with Process Automation: Automation of Alert Reaction in IT Event Management
Previous blogs describe the Operations Control Center (OCC) as an integral part of SAP’s best practice operational model for IT organizations. As the operations part of “DevOps”, it is responsible for services such as Monitoring, Alerting, Reporting/Analytics, Dashboards/Transparency as well as Root Cause Analysis of hybrid system and solution landscapes. Modern operations/OCCs incorporate intelligent collaboration and procedure automation. Examples of attended (supported) and unattended (automated) event and alert reactions are found here.
Use Case Introduction
The following use case illustrates the benefits of process automation for IT operations, i.e. for the Operations Control Center in your Customer Center of Expertise (CCOE).
Event Reaction is part of the IT scenario Detect-to-Correct, or the IT process Event Management. Normal operational behavior of the system and solution landscape is defined by threshold values for metrics (measures), which, if breached, trigger events/alerts to be processed by human operators on shift. Events/alerts are typically addressed manually with the help of so-called standard operating procedures (SOPs) or embedded, executable procedures, e.g. the SAP Solution Manager Guided Procedures, which can be executed (and governed) within SAP Solution Manager itself and even semi-automated.
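The threshold principle can be illustrated with a minimal sketch in plain Python. It is not SAP Solution Manager or SAP Process Automation code. The names `Threshold`, `Alert` and `evaluate`, the two severity levels, and the "higher is worse" comparison are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold definition for one metric (higher value = worse)."""
    metric: str
    warning: float
    critical: float


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert raised when a threshold is breached."""
    metric: str
    value: float
    severity: str


def evaluate(value: float, threshold: Threshold) -> Optional[Alert]:
    """Return an Alert if the measured value breaches a threshold, else None."""
    if value >= threshold.critical:
        return Alert(threshold.metric, value, "critical")
    if value >= threshold.warning:
        return Alert(threshold.metric, value, "warning")
    return None  # normal operational behavior, no event is triggered
```

A value below the warning threshold yields no alert. Anything at or above a threshold yields an alert that an operator, or an automated reaction, then has to process.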
With SAP Process Automation, the OCC can automate these alert reactions and turn them into automated event reactions (AERs). This automation is particularly critical for ensuring the availability of system and solution landscape components, especially managing system components, without which any IT operations team or individual is blind. The following example scenarios are therefore briefly outlined (and highly recommended):
Scenario 1 – Automatically restart a Broadcom Introscope Enterprise Manager (an essential system component of an SAP customer’s managing system landscape) if it is identified as offline
- Open Broadcom Introscope Administration UI to check if Introscope Enterprise Managers are offline
- Restart Broadcom Introscope Enterprise Manager
- Send log file to OCC operator/administrator if Broadcom Introscope Enterprise Manager cannot be restarted
Idea: The robot can automatically restart the Introscope Enterprise Manager if it is detected as offline in the Administration UI. If it cannot reconnect, the robot collects the relevant log files, highlights the error messages that possibly led to the disconnection, and notifies the OCC administrator.
Benefit: Introscope Enterprise Manager may occasionally go offline without anyone noticing immediately. Operations teams only notice once monitoring metrics and alerts are missing. Restarting this essential server component as soon as it appears offline reduces gaps in monitoring metrics and prevents the case where production outages are missed because alerting is not functional (with potentially catastrophic consequences).
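The "highlight the error messages" step can be sketched as a simple log filter. This is an illustrative plain-Python example, not the robot's actual script. The function name `highlight_errors` and the keywords `ERROR`, `FATAL` and `Exception` are assumptions, and the real Introscope log format may require different patterns.

```python
import re
from typing import Iterable, List, Tuple

# Assumed keywords that mark a suspicious log line (hypothetical, adjust to the real log format).
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception")


def highlight_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line that matches ERROR_PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]
```

The returned list can be attached to the notification so that the OCC administrator sees the suspicious lines first and only opens the full log file if needed.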
Scenario 2 – Automatically restart SMD Agents (Diagnostics Agents) (an essential system component of an SAP customer’s managing system landscape) if they are identified as offline
- Open SMD Agent Administration UI to check if SMD Agents are offline
- Restart SMD Agent for managed system
- Send log file to OCC operator/administrator if SMD Agent cannot be restarted
Idea: The robot can automatically restart the SMD Agent (Diagnostics Agent) if it is detected as offline in the Agent Administration. If it cannot reconnect, the robot collects the relevant log files, highlights the error messages that possibly led to the disconnection, and notifies the OCC administrator.
Benefit: SMD Agents regularly drop without anyone noticing immediately. Operations teams only notice once monitoring metrics and alerts are missing. Restarting the agents as soon as they appear offline reduces gaps in monitoring metrics and prevents the case where production outages are missed because alerting is not functional (with potentially catastrophic consequences).
Use Case Details
System Architecture and Landscape
SAP Process Automation offers the development environment, the desktop agents, and the runtime environment (on SAP Business Technology Platform). The use cases outlined in this article utilize this standard installation and the standard components of SAP Process Automation.
The implementation is encoded in scripts for the Operobots to execute. The following outlines the script steps.
To illustrate the development environment (scripting environment), the following screen presents the automation steps for scenario 1 in SAP Process Automation Desktop Studio.
Scenario 1: Operobot script steps (Introscope EM restart)
To illustrate the logical design, the following diagram presents the automation steps for scenario 2 as decision and activity tree.
Scenario 2: Operobot script steps (SMD Agent restart)
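The check, restart and escalate logic shared by both scenarios can also be written down as a short sketch. It is plain Python, not a script for SAP Process Automation. The function `react_to_offline_component` and its callable parameters (`is_online`, `restart`, `collect_logs`, `notify`) are hypothetical placeholders for the UI interactions the robot performs, and the configurable number of restart attempts is an assumption.

```python
from typing import Callable, List


def react_to_offline_component(
    name: str,
    is_online: Callable[[], bool],            # hypothetical: status check in the Administration UI
    restart: Callable[[], None],              # hypothetical: trigger the restart
    collect_logs: Callable[[], List[str]],    # hypothetical: gather relevant log lines
    notify: Callable[[str, List[str]], None], # hypothetical: message to the OCC operator/administrator
    max_attempts: int = 1,                    # assumption: one restart attempt by default
) -> str:
    """Run the decision tree and return 'online', 'restarted' or 'escalated'."""
    if is_online():
        return "online"  # nothing to do
    for _ in range(max_attempts):
        restart()
        if is_online():
            return "restarted"  # component is back, no human involvement needed
    # Restart did not help: hand over to a human with the logs attached.
    notify(f"{name} could not be restarted", collect_logs())
    return "escalated"
```

Passing the individual actions in as callables keeps the decision logic independent of how each step is actually carried out, so the same flow covers the Introscope Enterprise Manager and the SMD Agent.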
Save time and effort detecting and resolving the root cause of alerts. The examples above are essential for every customer’s Operations relying at least in part on SAP on-premise managing systems like SAP Solution Manager, SAP Focused Solutions, etc. (i.e. not yet transitioned to SAP software-as-a-service solutions like Cloud ALM for Operations). If essential managing system components are unavailable, the SAP-centric event management process is at least partially unavailable, affecting monitoring and alerting for one or, in the worst case, all SAP systems’ operations.
Increase effectiveness and efficiency of your Operations Control Center. The examples above let your operators stay focused on their actual work and rely on automatic error detection and error resolution, 24/7, implementing a “NoOps” approach.
The objective of Operobots (NoOps) is to address the following challenges and pain points of Operations Control Centers:
- Time-consuming alert reaction procedures
- Time-critical procedures that must be executed immediately and precisely
- Externalization and documentation requirements for procedures
In combination with managed systems and managing systems as sources of data collection, and with the managing system as the creator of events/alerts, Process Automation offers a unified user interface for seamless execution of OCC tasks and activities. This user experience also allows interaction with other operators and operobots to get the job done. This is not limited to Event Management, of course. Incident Management, Problem Management, Change Management, Request Management, etc. as well as analytical use cases can be integrated as well.
In order to automate parts of a process, the process has to be sufficiently documented. Even a “simple” process is not that simple from a robotic process automation point of view. For the implementation, advanced scripting knowledge is recommended, as is the awareness that not all scripting functions might be available in SAP Process Automation. Reuse the offerings and components of the SAP Process Automation Store instead of starting from scratch.
When applying SAP Process Automation in the context of IT operations/Operations Control Center, i.e. technically in the context of SAP Solution Manager, SAP Focused Solutions, etc., it has to be mentioned that SAP Solution Manager UIs and UI components, for example, are comparatively difficult to capture due to the underlying infrastructure/technology and the design of the screens. Patience and perseverance are definitely required. But the results are worth it.