Service Availability Management in SAP Focused RUN & SAP Solution Manager
Service Availability Management (SAM) is a core feature of both SAP Solution Manager and SAP Focused RUN. This blog post is a guide for SAP consultants who want to implement SAM for SLA reporting in their SAP landscapes. In my experience, the calculations behind overall availability and SLA compliance are the hardest part to get right. I therefore cover both the underlying concepts and the arithmetic you need to use and operate the feature effectively.
What is Service Availability Management (SAM)?
Service Availability Management in SAP Focused RUN provides availability reporting for business-critical systems, databases and services. It analyzes System Monitoring data, calculates availability and compares the result against the predefined service level targets. Unplanned outages are taken into account automatically.
Key benefits of SAM:
- Availability reporting for business-critical systems, databases and services.
- Automatic detection of outages from System Monitoring.
- Service definitions maintained for individual systems.
- SLA reporting.
- Service management and system governance.
Key Components of SAM:
- Overview of Availability – The Service Reporting page shows the calculated availability of the selected systems, databases and services for the selected reporting periods. It gives you a quick overview of whether the availability service level agreement was met or breached.
- Outage Summary – The Outage Summary view provides an overview of open and confirmed outages and the status of the Service Level Agreement.
- Uptime Reporting – Uptime reporting shows how long a system has been up and running without planned or unplanned disruptions.
- System Downtime Monitor – The System Downtime Monitor gives administrators a snapshot of all systems that are currently unavailable, whether due to planned downtime or an unplanned outage alert.
- Service Availability Definitions – Allows the service management team to create service definitions based on the agreed SLA, contractual availability and accepted maintenance windows.
Configuring Service Availability Management
This section describes how to configure SAM for SAP systems, services, hosts and databases. It can serve as a step-by-step guide for SAP practitioners who want to use this functionality in SAP Focused RUN or SAP Solution Manager.
Step 1: Open the Service Availability Management tile from the Launchpad
Step 2: Expand the left-hand pane and click Service Availability Definition
Step 3: Provide metadata of the Service Definition along with the target availability percentage and maintenance schedule
As a recommendation, set the end of the validity period to the maximum possible value. The validity period of a service definition can be changed at any time, even after the definition has been created and released for active use. Keep in mind, however, that once the validity period has ended, the service definition cannot be reactivated; the consultant then has to create a new service definition for the technical systems. This is why we enter the maximum end date during creation and adjust it later as requirements change.
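The validity rules above can be summarized as a small state model. The sketch below is a hypothetical Python class (`ServiceDefinition`, `change_valid_to`, `deactivate` and `MAX_END` are illustrative names, not an SAP API). It assumes only the rules stated in this post: the end date can be changed while the definition is still valid, an expired definition cannot be reactivated, and a definition is never deleted, only deactivated by shortening its validity.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical stand-in for "the maximum possible value" of the validity end date.
MAX_END = date(9999, 12, 31)

@dataclass
class ServiceDefinition:
    """Hypothetical model of the validity rules of a SAM service definition."""
    name: str
    valid_from: date
    valid_to: date = MAX_END

    def is_active(self, today: date) -> bool:
        return self.valid_from <= today <= self.valid_to

    def has_expired(self, today: date) -> bool:
        return today > self.valid_to

    def change_valid_to(self, new_end: date, today: date) -> None:
        # The end date may be changed at any time while the definition is valid,
        # but an expired definition cannot be reactivated.
        if self.has_expired(today):
            raise ValueError(f"{self.name} has expired; create a new service definition")
        self.valid_to = new_end

    def deactivate(self, today: date) -> None:
        # Definitions are never deleted; ending the validity yesterday deactivates it.
        self.change_valid_to(today - timedelta(days=1), today)

# Usage: created with the far-future end date, deactivated on September 1st.
sd = ServiceDefinition("PRD availability", valid_from=date(2023, 8, 16))
sd.deactivate(date(2023, 9, 1))
```

Starting with the far-future end date keeps `has_expired` false until you deliberately shorten the validity, which is the reason for the recommendation above.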
In the “entity” section, add one or more technical systems or databases. An external service that is registered in the LMDB as a technical system can also be added as an entity within the scope of the service definition.
Step 4: Provide Availability targets and Schedule relevant for SLA determination and System Governance
Here, provide the target SLA threshold for system availability. This value can be taken from the contractual documents in which the target SLA for the services (systems, databases, hosts etc.) is maintained. In our example, the in-scope entity (a technical system) is a production system with a target SLA of 99.50%.
- Reporting period: The reporting period can be monthly or yearly, depending on requirements. We select monthly here to keep availability reporting as granular as possible.
- Pattern: The pattern can be monthly or weekly, depending on the reporting requirements. Again aiming for granularity, we selected “Weekly” in our example.
- Schedule: Select the exact days of the week on which the system guarantees the declared SLA threshold by setting the corresponding check boxes. You can also declare a start time and the number of hours per day during which the system is expected and committed to be available. In our example, we target 24×7 availability for the production system, so all days of the week were selected with a duration of 24 hours per day.
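The schedule determines how many minutes of committed service time a reporting period contains, which is the denominator of the availability calculation. The following is a minimal Python sketch. It assumes a weekly pattern with one daily window per selected weekday, and it ignores holidays and time-zone shifts. `scheduled_minutes` is a hypothetical helper, not part of SAM. The year 2023 is an arbitrary assumption that only fixes the weekdays.

```python
from datetime import date, datetime, time, timedelta

def scheduled_minutes(start: date, end: date, weekdays: set,
                      start_time: time, duration_hours: float) -> float:
    """Committed service minutes between start and end (both inclusive).

    weekdays follows Python's convention: Monday=0 ... Sunday=6.
    Each selected weekday contributes one window of duration_hours starting
    at start_time; windows are clipped at the end of the reporting period.
    """
    period_end = datetime.combine(end + timedelta(days=1), time.min)
    total = timedelta()
    day = start
    while day <= end:
        if day.weekday() in weekdays:
            window_start = datetime.combine(day, start_time)
            window_end = min(window_start + timedelta(hours=duration_hours), period_end)
            total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 60

# 24x7 from August 16 to August 31: 16 days x 1440 minutes
print(scheduled_minutes(date(2023, 8, 16), date(2023, 8, 31), set(range(7)), time(0, 0), 24))
# → 23040.0
```

With a narrower schedule (for example Monday to Friday, 8 hours from 09:00) the same function yields a much smaller denominator, so each outage minute would weigh more heavily in the availability percentage.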
Step 5: Set up Contractual Maintenance
As the name suggests, contractual maintenance is the window agreed with the customer during which maintenance activities on SAP systems, databases, services and servers may take place. This includes OS/DB patching, upgrades and any other operational maintenance. Note that maintenance does not necessarily happen in every declared window; the contract simply allows the administrators to perform maintenance during that time when they need to.
Contractual maintenance can be either SLA-relevant or not SLA-relevant, depending on the actual contract with the customer. Outages reported during that period are not taken into account for SLA calculations or for determining the overall availability of the services.
In our example, the contractual maintenance period is SLA-relevant. The maintenance pattern is weekly, on Sundays, starting at 9:00 PM (Indian Standard Time, or whichever time zone is selected in the service definition) for 1 hour.
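Because outages inside the contractual maintenance window are not counted for the SLA (as stated above), it helps to split an outage into the parts inside and outside the window. The following is a minimal Python sketch. It assumes naive datetimes already expressed in the service definition's time zone and the example window (Sunday 21:00, one hour). `maintenance_windows` and `sla_relevant_minutes` are hypothetical names, and the “SLA relevant” flag itself is not modelled.

```python
from datetime import datetime, time, timedelta

SUNDAY = 6  # datetime.weekday(): Monday=0 ... Sunday=6

def maintenance_windows(start, end, weekday=SUNDAY,
                        window_start=time(21, 0), duration=timedelta(hours=1)):
    """Yield (begin, finish) of each weekly maintenance window overlapping [start, end)."""
    day = (start - duration).date()  # a window may have begun before the outage
    while day <= end.date():
        if day.weekday() == weekday:
            begin = datetime.combine(day, window_start)
            if begin < end and begin + duration > start:
                yield begin, begin + duration
        day += timedelta(days=1)

def sla_relevant_minutes(outage_start, outage_end, **window):
    """Minutes of an outage that fall outside the contractual maintenance windows."""
    excluded = timedelta()
    for begin, finish in maintenance_windows(outage_start, outage_end, **window):
        excluded += min(finish, outage_end) - max(begin, outage_start)
    return ((outage_end - outage_start) - excluded).total_seconds() / 60

# Sunday, August 20, 2023, 20:50-21:10: only the 10 minutes before 21:00 count.
print(sla_relevant_minutes(datetime(2023, 8, 20, 20, 50), datetime(2023, 8, 20, 21, 10)))
# → 10.0
```

The window parameters are keyword arguments so that a different contractual pattern (another weekday, start time or duration) can be passed without changing the overlap logic.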
In other words, any maintenance activity scheduled within the declared window has been agreed upon by the stakeholders under the contract.
Once all the values are provided correctly, the service definition can be saved. The service definition will automatically be activated for the selected entities when the start time of the declared validity period is reached.
The Outages section in the left-hand pane of SAM lists all outages that have been detected or reported for the selected entities. Outages can be planned or unplanned.
|Column|Meaning|
|---|---|
|Open Outages|Total number of outages reported manually or via System Monitoring.|
|Confirmed Outages|Number of outages confirmed by the service manager.|
|Unplanned Confirmed Downtime|Total downtime resulting from confirmed outages.|
|Unplanned Total Downtime|Total downtime resulting from both confirmed and unconfirmed outages.|
|Remaining Downtime Confirmed|(Total downtime allowed by the SLA) – (downtime from confirmed outages)|
|Remaining Downtime All|(Total downtime allowed by the SLA) – (downtime from confirmed and unconfirmed outages)|
|Availability Confirmed|100% – (reduction due to confirmed outages)|
|Availability All|100% – (reduction due to confirmed and unconfirmed outages)|
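The columns of this table can be reproduced from a list of outage records. The following is a minimal Python sketch. `Outage` and `outage_kpis` are hypothetical names, not the SAM data model. As a simplification of my own, only visible (not hidden), unplanned outages are counted, and the durations are assumed to be SLA-relevant minutes already.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    minutes: float            # SLA-relevant downtime in minutes
    confirmed: bool = False   # confirmed by the service manager
    planned: bool = False     # work-mode outage from the IT Calendar
    hidden: bool = False      # hidden, e.g. raised by a false alert

def outage_kpis(outages, scheduled_minutes, sla_percent):
    """Outages-view figures for one reporting period (simplified sketch)."""
    relevant = [o for o in outages if not o.hidden and not o.planned]
    confirmed = [o for o in relevant if o.confirmed]
    allowed = scheduled_minutes * (100 - sla_percent) / 100  # downtime budget
    down_confirmed = sum(o.minutes for o in confirmed)
    down_all = sum(o.minutes for o in relevant)
    return {
        "open_outages": len(relevant),  # all reported outages, as in the table
        "confirmed_outages": len(confirmed),
        "unplanned_confirmed_downtime": down_confirmed,
        "unplanned_total_downtime": down_all,
        "remaining_downtime_confirmed": allowed - down_confirmed,
        "remaining_downtime_all": allowed - down_all,
        "availability_confirmed": 100 - down_confirmed / scheduled_minutes * 100,
        "availability_all": 100 - down_all / scheduled_minutes * 100,
    }

# Usage: one confirmed 10-minute outage and one unconfirmed 5-minute outage.
kpis = outage_kpis([Outage(10, confirmed=True), Outage(5)], 23040, 99.5)
```

Keeping confirmed and all outages as two separate sums mirrors the paired “Confirmed” and “All” columns: the confirmed figures drive the SLA, while the “All” figures show the worst case if every open outage were confirmed.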
SAM uses the IT Calendar as the data source for planned outages. When a work mode is defined for a technical system in the IT Calendar and a service definition exists for the same system in SAM, a corresponding entry is added automatically to the SAM Outages section once the work mode has completed. This outage is reported as “Planned”, and its “Source” field is set to “Workmode”.
Note: Planned downtime/outages do not impact the SLA or the overall availability of a system.
Unplanned outages occur when a technical system is unavailable for some period, typically because of an issue, and that unavailability was not declared in the IT Calendar. The “availability” metrics activated as part of System Monitoring detect the outage and raise an alert immediately. The unplanned outage is added to the SAM Outages automatically once the “availability” metrics assigned to the technical system turn from red back to green. The source of unplanned outages is “MAI” (Monitoring and Alerting Infrastructure).
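The two ways outages enter the list (completed work modes and red-to-green metric transitions) can be sketched as follows. This is a simplified Python model of the behavior described in this post, not SAM's implementation. `OutageRecord` and `collect_outages` are hypothetical names. The metric ratings are reduced to “red” and “green”, and the sketch assumes that monitoring is paused during work modes, so the two sources do not overlap.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

class OutageRecord(NamedTuple):
    start: datetime
    end: datetime
    kind: str    # "Planned" or "Unplanned"
    source: str  # "Workmode" or "MAI"

def collect_outages(samples, work_modes, now):
    """samples: time-sorted (timestamp, "red" | "green") availability ratings.
    work_modes: (start, end) IT Calendar work modes for the same system."""
    records = []
    # Planned outages are added only after the work mode has completed.
    for start, end in work_modes:
        if end <= now:
            records.append(OutageRecord(start, end, "Planned", "Workmode"))
    # Unplanned outages are recorded when the metric turns from red back to green.
    down_since = None
    for ts, rating in samples:
        if rating == "red" and down_since is None:
            down_since = ts
        elif rating == "green" and down_since is not None:
            records.append(OutageRecord(down_since, ts, "Unplanned", "MAI"))
            down_since = None
    # A metric that is still red has not produced a record yet.
    return sorted(records)
```

The open `down_since` at the end of the loop reflects the point made above: an ongoing outage shows up as an alert, but it becomes an entry in the Outages list only after the system is available again.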
Note: An unplanned outage affects the SLA calculation and the remaining downtime until SLA breach only after it has been confirmed. Outages reported because of false alerts can be hidden; outages cannot be deleted, only hidden. The date and time of outages can also be adjusted (individually or in bulk) to match the actual figures and logs.
As soon as the service manager or system administrator confirms an unplanned outage, it starts affecting the overall availability of the technical system.
The most important part of this discussion is how the overall availability of a system is calculated from all confirmed outages. For an SAP practitioner implementing SAM, this is what matters most. Below is the calculation for our example:
Total outage reported = 10 min
Total availability before the outage = 16 × 24 × 60 = 23040 min (100%)

$$\text{New availability} = \frac{\text{Total availability} - \text{Outage}}{\text{Total availability}} \times 100 = \frac{23040 - 10}{23040} \times 100 = 99.956597\% \approx 99.96\%$$
This matches the new availability reported by the system.
[Note: 16 days are used because the service definition is valid from August 16th. For monthly reporting, counting from August 16th leaves 16 days until the end of the month (August 31st, inclusive). Since the service definition requires 24×7 availability, these 16 days were converted into minutes to determine the total availability.]
How much unplanned outage would cause an SLA breach in our case?
Total minutes of downtime before an SLA breach = (remaining minutes of the month) – (99.50% of the remaining minutes of the month):

$$\text{Allowed downtime} = 23040 - (0.995 \times 23040) = 115.2 \text{ min}$$

Since 10 minutes of outage have already been confirmed, 115.2 – 10 = 105.2 minutes of SLA-relevant unplanned downtime remain before the SLA is breached. The system calculates this automatically; see the column “Remaining Downtime Confirmed” in the screenshot above.
[Note: 99.5% is the SLA threshold defined in our service definition.]
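The worked figures above can be checked with a few lines of Python. This is a sketch of the arithmetic only. `sla_figures` is a hypothetical helper, and it assumes that every day of the period is fully scheduled (24×7 by default). For comparison, it also evaluates a full 31-day August, to show how much the partial first month shrinks the downtime budget.

```python
def sla_figures(days, sla_percent, confirmed_minutes, hours_per_day=24):
    """Availability and downtime budget for a period in which every day is scheduled."""
    scheduled = days * hours_per_day * 60                # committed service minutes
    availability = (scheduled - confirmed_minutes) / scheduled * 100
    allowed = scheduled * (100 - sla_percent) / 100      # downtime before SLA breach
    return scheduled, availability, allowed, allowed - confirmed_minutes

for label, days in [("Aug 16-31", 16), ("full August", 31)]:
    scheduled, availability, allowed, remaining = sla_figures(days, 99.5, 10)
    print(f"{label}: {scheduled} min, availability {availability:.6f}%, "
          f"allowed {allowed:.1f} min, remaining {remaining:.1f} min")
```

For the 16-day period this prints 23040 min, 99.956597%, 115.2 min allowed and 105.2 min remaining, matching the calculation above. For a full 31-day August, the same 10 minutes would leave 213.2 of 223.2 allowed minutes (availability 99.977599%).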
Manual Outage Reporting:
If an unplanned or planned outage occurred but Focused RUN did not detect it, the system administrator or service manager can later add the actual outage, with its duration, to the SAM Outages. This keeps a complete record of all outages, including those missed by Focused RUN.
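When a manually reported outage overlaps one that System Monitoring already recorded, adding both could count the same minutes twice. This post does not say whether SAM de-duplicates such overlaps, so the sketch below is only a hypothetical pre-check an administrator might run before adding a manual entry. `merged_downtime` is an illustrative name. The function merges overlapping (start, end) intervals and then sums their duration.

```python
from datetime import datetime, timedelta

def merged_downtime(intervals):
    """Total downtime of possibly overlapping (start, end) intervals."""
    total = timedelta()
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

# Detected 10:00-10:10, manual 10:05-10:20 and 11:00-11:05 → 25 minutes, not 30.
print(merged_downtime([
    (datetime(2023, 8, 21, 10, 0), datetime(2023, 8, 21, 10, 10)),
    (datetime(2023, 8, 21, 10, 5), datetime(2023, 8, 21, 10, 20)),
    (datetime(2023, 8, 21, 11, 0), datetime(2023, 8, 21, 11, 5)),
]))
```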
Additional points to remember:
- Contractual maintenance is NOT the same as a work mode: contractual maintenance does not switch off the monitoring of the technical system.
- If a technical system is associated with databases that have physical-to-virtual database relationships in the LMDB, SAM takes only the virtual database into account. SAM raises an error if you try to select a physical database that is related to a virtual database in the LMDB. This mostly applies to database high-availability (HA) scenarios.
- Availability and outage data from SAM can also be fed into OCC dashboards and used for guided procedure reports.
- Once created, service definitions CANNOT be deleted. A service definition can be deactivated by changing the last date of its validity. Once it is inactive, a new service definition with adjusted figures can be created for the same technical systems.
Hope this helps! Cheers! 🙂