Machine Learning Services for Hybrid Operations: implementation for SAP Solution Manager 7.2 and SAP Cloud Platform / SAP HANA XSA (Part 2)
Dear reader, welcome to the series of articles “Machine Learning Services for Hybrid Operations”. In these publications we focus on the key challenges of well-known threshold-based alerting systems (e.g. the majority of ALM products in the SAP portfolio, including SAP Solution Manager) and propose a Machine Learning based approach to extend existing monitoring and alerting processes. Stay tuned!
In the previous article we presented the concept of a Machine Learning “black box” events post-processor (“Machine Learning Extension”), which addresses two key challenges of threshold-based alerting systems:
- Filtering out redundant alerts caused by recurring variations in metric behavior
- Detecting deviations that are not related to threshold violations
SAP Solution Manager 7.2 is one of the most popular monitoring and alerting platforms from SAP. Its alerts are raised when metric values violate configured thresholds.
In this article SAP Labs CIS CoE St. Petersburg presents its in-house Machine Learning Extension implementation for SAP Solution Manager 7.2.
To leave room for tuning other SAP Solution Manager scenarios and for covering non-SAP systems, we decided to place all machine learning functions on a platform for developing and running microservice-oriented applications, so that the solution is not tied to a single programming language.
As a result, the Machine Learning Extension for SAP Solution Manager 7.2 was implemented as a set of microservices running in SAP Cloud Platform (a cloud platform-as-a-service runtime environment) or in SAP HANA extended application services (an on-premise runtime environment on top of SAP HANA), depending on requirements.
SAP Solution Manager 7.2 system with Machine Learning Extension (simplified):
Machine Learning Extension for SAP Solution Manager 7.2 functions and features:
Solution high level architecture for SAP Cloud Platform option:
Solution high level architecture for SAP HANA XS Advanced option:
Key architecture aspects of Machine Learning Extension solution:
- The solution is a set of microservices written in Python and Node.js, running inside the Cloud Foundry environment of SAP Cloud Platform or inside SAP HANA XSA
- The SAP Solution Manager 7.2 Monitoring and Alerting Infrastructure (MAI) is integrated with the solution deployed to SAP Cloud Platform or SAP HANA XSA
- The anomaly calculation core is based on Machine Learning algorithms and provides an API that calculates real-time anomaly ratings for incoming quantifiable metrics of various kinds. The solution can therefore extend different monitoring scenarios such as Technical Monitoring, Interfaces Monitoring and Business KPIs (resource consumption percentages, number of users, number of records in specific tables, order counts, etc.)
- The Machine Learning model does not require labelling; behavior patterns are modeled individually for every metric of every monitoring object in scope
- The Machine Learning model needs to be trained on at least the past two weeks of data (no human participation is needed)
- Once the Machine Learning model is trained, the solution pulls metrics and alerts data from SAP Solution Manager 7.2 MAI every 5 minutes
- Incoming metric values from MAI are compared with the metric values predicted by the Machine Learning Anomaly Detection service, and anomaly ratings are calculated
- Anomaly measurements are transferred back to SAP Solution Manager MAI, where they are used to evaluate alert reliability and to report anomalies that are not related to threshold violations
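To make the "compare incoming values with predicted values" step more tangible, here is a minimal Python sketch. It is not the model used in the solution (this article does not describe the algorithm); as a stand-in it assumes a simple seasonal baseline: for every 5-minute slot of the week it stores the mean and standard deviation seen during training, and the anomaly rating is the distance of a new value from that mean, in standard deviations. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

SLOT_MINUTES = 5  # matches the 5-minute collection interval mentioned above


def slot_of(ts: datetime) -> int:
    """Map a timestamp to its 5-minute slot within the week (0..2015)."""
    minutes = ts.weekday() * 24 * 60 + ts.hour * 60 + ts.minute
    return minutes // SLOT_MINUTES


def train_baseline(history):
    """Build a per-slot (mean, std) baseline from (timestamp, value) pairs.

    No labels are needed: the baseline only describes what is usual
    for this one metric at this time of the week.
    """
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[slot_of(ts)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def anomaly_rating(baseline, ts, value, min_std=1e-6):
    """Return the distance of `value` from the predicted value, in std units.

    Raises KeyError if the slot was never seen during training.
    `min_std` (an assumption) avoids division by zero for constant metrics.
    """
    predicted, std = baseline[slot_of(ts)]
    return abs(value - predicted) / max(std, min_std)
```

With two weeks of history, each slot holds two training values, which is why a real model would need something more robust than this per-slot statistic; the sketch only shows the shape of the calculation.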
The alert reliability evaluation process for SAP Solution Manager 7.2 with the Machine Learning Extension is seamless from the perspective of monitoring experts and IT operators. The algorithm is as follows:
- A threshold-based alert is created in the SAP Solution Manager 7.2 Alert Inbox
- The alert data is sent to the SCP/XSA extension to evaluate whether an anomaly has been detected
- If the SCP/XSA extension confirms an active anomaly, the alert is considered reliable and can be processed as usual
- If the SCP/XSA extension does not confirm an active anomaly, the alert is considered non-reliable and can be automatically suppressed with the classification “Suppressed by AI” and the confirmation category “No anomaly detected”. Operators on the email/SMS distribution list will not receive notifications
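The decision in the last two steps can be sketched in a few lines of Python. This is only an illustration of the logic described above: the `AlertDecision` type and the `evaluate_alert` function are hypothetical names, and only the classification and confirmation category strings are taken from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertDecision:
    """Outcome of the reliability check for one threshold-based alert."""
    reliable: bool
    classification: Optional[str] = None
    confirmation_category: Optional[str] = None
    notify_operators: bool = True


def evaluate_alert(anomaly_active: bool) -> AlertDecision:
    """Decide how to treat an alert, given the extension's anomaly verdict."""
    if anomaly_active:
        # Anomaly confirmed: the alert is reliable and processed as usual.
        return AlertDecision(reliable=True)
    # No anomaly: suppress the alert and skip email/SMS notifications.
    return AlertDecision(
        reliable=False,
        classification="Suppressed by AI",
        confirmation_category="No anomaly detected",
        notify_operators=False,
    )
```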
Example of a non-reliable alert, confirmed in Alert Inbox of SAP Solution Manager 7.2:
Key MAI integration architecture aspects:
- Data exchange is established via OData
- Metric values are read every 5 minutes from the flat aggregates table, not from the Business Warehouse subsystem
- Keep in mind that internal IDs inside MAI can change during monitoring template re-assignment and/or other administrative actions
- The MAI alert redundancy analysis can use the Alert Consumer BAdI interface as its entry point (more details are available at https://support.sap.com/en/alm/solution-manager/expert-portal/alert-consumer-badi-interface.html )
- The alert reliability evaluation must handle the case where several metrics contribute to a single alert. An alert can be trusted if an anomaly was detected for at least one of its metrics (the position of the metric in the metrics list does not matter).
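The multi-metric rule from the last point amounts to an "any" check over the metrics of an alert. Below is a minimal Python sketch under stated assumptions: each metric is represented by its anomaly rating, and a rating at or above a cut-off counts as a detected anomaly. The cut-off value and the function names are hypothetical and are not taken from the solution.

```python
from typing import List, Mapping

# Hypothetical cut-off: ratings at or above this value count as an anomaly.
ANOMALY_THRESHOLD = 3.0


def anomalous_metrics(ratings: Mapping[str, float],
                      threshold: float = ANOMALY_THRESHOLD) -> List[str]:
    """Return the names of the metrics whose rating reaches the cut-off."""
    return [name for name, rating in ratings.items() if rating >= threshold]


def alert_is_reliable(ratings: Mapping[str, float],
                      threshold: float = ANOMALY_THRESHOLD) -> bool:
    """An alert is trusted if at least one contributing metric is anomalous.

    The order of the metrics is irrelevant. An alert without any metrics
    is treated as non-reliable here, which is a choice made for this sketch.
    """
    return any(rating >= threshold for rating in ratings.values())
```

For example, an alert with ratings `{"cpu": 0.4, "memory": 5.2}` would be trusted because of the memory metric alone.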
SAP Focused Insights can be used to visualize system status based purely on anomaly detection, for example:
- Currently active abnormal events per system
- List of currently active long anomalies
- Suppressed alerts per date
- Top 10 longest anomalies
- Top 10 frequent abnormal events
- Average time for anomaly closure per system
- Percentage of abnormality distribution between instances per system
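To clarify what some of these indicators mean, the following Python sketch derives three of them from a plain list of anomaly records. It does not show how SAP Focused Insights computes or reads its data; the `Anomaly` record and all function names are hypothetical and serve only to illustrate the definitions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Anomaly(NamedTuple):
    system: str
    metric: str
    start: datetime
    end: Optional[datetime]  # None while the anomaly is still active


def duration(a: Anomaly, now: datetime) -> timedelta:
    """Length of an anomaly; active anomalies are measured up to `now`."""
    return (a.end or now) - a.start


def top_longest(anomalies, now, n=10):
    """The n longest anomalies, longest first."""
    return sorted(anomalies, key=lambda a: duration(a, now), reverse=True)[:n]


def active_per_system(anomalies):
    """Number of currently active (not yet closed) anomalies per system."""
    counts = defaultdict(int)
    for a in anomalies:
        if a.end is None:
            counts[a.system] += 1
    return dict(counts)


def average_closure_time(anomalies):
    """Average duration of closed anomalies per system."""
    closed = defaultdict(list)
    for a in anomalies:
        if a.end is not None:
            closed[a.system].append(a.end - a.start)
    return {s: sum(d, timedelta()) / len(d) for s, d in closed.items()}
```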
Operation Control Center dashboards, related to anomaly detection (example):
Following the microservices approach, the solution can be extended further with:
- New collectors
- New Machine Learning models
- New services, whether or not they build on the Machine Learning services
- New backend influence scenarios
- Any visualization techniques
Summary for part 2:
It is possible to implement the Machine Learning “black box” events post-processor (“Machine Learning Extension”) concept for SAP Solution Manager 7.2.
Machine Learning Extension for SAP Solution Manager 7.2 features:
- Standalone pre-configured set of services, running in SAP Cloud Platform Cloud Foundry or SAP HANA XSA
- No data labelling necessary
- Microservices architecture
- Secure connectivity standards
Machine Learning Extension for SAP Solution Manager 7.2 enhances the standard monitoring capabilities with the following functions:
- Anomaly detection
- Automatic alert reliability evaluation
- Metrics correlation
- The capability to get data from third-party monitoring tools
If the SCP/XSA Machine Learning Extension does not confirm an active anomaly, the alert is considered non-reliable and can be automatically suppressed. Operators on the email/SMS distribution list will not receive notifications.
The solution can be extended further thanks to its microservices approach.
In the next parts we will provide more details on the algorithms and principles of the SCP/XSA Machine Learning Extension and on the AI story.
About the author
Andrew Kusnetsov (https://people.sap.com/andrew.kusnetsov) is a senior SAP Solution Manager Engineer at SAP Labs CIS CoE St. Petersburg. He has been working in the ALM/IT Operations/DevOps team since 2013 and delivers Hybrid Operations related projects in the EMEA region.
Many thanks to Artem Sharganov (https://people.sap.com/artem.sharganov) for his key contribution to the Smart Monitoring initiative and for his help with this article.
Machine Learning Services for Hybrid Operations series
- Machine Learning Services for Hybrid Operations: motivation and concept (Part 1)
- Machine Learning Services for Hybrid Operations: implementation for SAP Solution Manager 7.2 and SAP Cloud Platform / SAP HANA XSA (Part 2)