Application autoscaling with Automation Pilot and Alert Notification in SAP BTP Neo Environment
For the sake of readability, we often use “SAP BTP” as a short form of the full name “SAP Business Technology Platform” and “Alert Notification” as a short form of “SAP Alert Notification service for SAP BTP”. Likewise, “Automation Pilot” stands for “SAP Automation Pilot”.
Efficient application management is pivotal for business-critical cloud functionality. Application load usually refers to the amount of work an application has to handle and process. This load often fluctuates: the demand for resources (CPU, memory, etc.) rises and falls unpredictably over time. This is where scaling comes in, in particular dynamically adding or removing active instances to optimize the computational resources in use.
In the SAP cloud world, the Application Autoscaler Service provides a full-fledged and flexible autoscaling option in the SAP BTP, Cloud Foundry environment. SAP BTP Neo, however, has no such service. In this blog post, you will learn how to benefit from the native integration between Alert Notification and Automation Pilot so that your applications are monitored, scaled up, and scaled down automatically based on predefined rules. The goal is to show you how to configure this setup and adjust it to your own needs.
What is the target scenario? SAP Monitoring Service for BTP Neo monitors our Java application, which has recently been reporting high CPU spikes. By default, there are upper limits on the amount of memory and CPU allocated to a single instance. Once a critical threshold is reached, the monitoring service sends an error alert to our Alert Notification instance. Alert Notification, in turn, triggers a recommended action in Automation Pilot, which scales up the application instances and monitors their performance over a predefined period of time. When the application's CPU load proves to be stable for a certain period, Automation Pilot scales the instances back down.
Furthermore, Automation Pilot is configured to push an event to Alert Notification for each status change of its executions. Alert Notification delivers these messages to us via a channel in MS Teams. Throughout this scenario, no manual input from an operator is needed. Sounds good? Let’s see how this setup can be achieved.
Our Java application is deployed and running in an SAP BTP Neo subaccount. The SAP BTP cockpit provides a good overview of the application's latest state, including running instances and process metrics. The monitoring section lists the metrics with their default warning and critical thresholds for each instance.
In this example, we want to be notified and to react accordingly if the CPU load metric of a single application instance reaches a critical level, that is, above 90% CPU usage. Let’s start with the configuration in Alert Notification.
Configurations in Alert Notification
These are the Subscriptions configured for our scenario:
The first subscription delivers the events that Automation Pilot pushes to Alert Notification for each status change of its executions. The matching condition that serves our needs is a resource name equal to “SAP CP Automation Pilot”. This subscription uses an action of type Microsoft Teams. A few quick configurations in Automation Pilot are required before it can produce events. Check the section Enabling Automation Pilot Events for more details.
The second subscription informs us of all CPU events related to our application, called “demo-app”. These are events whose Alert Notification properties are eventType equal to “CPULoad” and resource.resourceName equal to “demo-app”. They are delivered to our MS Teams channel.
The third subscription requires more attention, because through it Automation Pilot acts on the alert by applying a solution we configured in advance. Hence, we configure the following set of matching conditions:
According to the current schema for severity mapping, we are interested in the ERROR value. To narrow down the scope further, the exact name of the application is included as well.
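The following minimal Python sketch makes the three subscriptions concrete by modeling each one as a predicate over an incoming event. It assumes events shaped like the Alert Notification event format, with top-level eventType and severity fields and a nested resource.resourceName. The function names and the sample event are hypothetical. The third predicate models only the two conditions named above, ERROR severity and the application name.

```python
from typing import Any, Dict, Optional

# Hypothetical model of the three subscriptions' matching conditions.
# In practice, these conditions are configured in the Alert Notification UI.

def resource_name(event: Dict[str, Any]) -> Optional[str]:
    """Return resource.resourceName, or None if the event has no resource."""
    return event.get("resource", {}).get("resourceName")


def matches_pilot_status(event: Dict[str, Any]) -> bool:
    # Subscription 1: status-change events pushed by Automation Pilot.
    return resource_name(event) == "SAP CP Automation Pilot"


def matches_cpu_events(event: Dict[str, Any]) -> bool:
    # Subscription 2: all CPU events for "demo-app" (forwarded to MS Teams).
    return event.get("eventType") == "CPULoad" and resource_name(event) == "demo-app"


def matches_autoscaling(event: Dict[str, Any]) -> bool:
    # Subscription 3: only ERROR-severity events for "demo-app" trigger
    # the Automation Pilot action.
    return event.get("severity") == "ERROR" and resource_name(event) == "demo-app"


# Hypothetical critical CPU event for the demo application.
critical_cpu_event = {
    "eventType": "CPULoad",
    "severity": "ERROR",
    "resource": {"resourceName": "demo-app"},
}

print(matches_cpu_events(critical_cpu_event), matches_autoscaling(critical_cpu_event))  # True True
```

With these conditions, a WARNING-severity CPU event for demo-app would still reach MS Teams through the second subscription, but it would not trigger scaling.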
Alert Notification has a built-in integration with Automation Pilot that allows Automation Pilot commands to be triggered as one of the actions in a subscription. To use the integration, you need to create a Service Account in the Automation Pilot tenant. Its credentials and an Event Trigger URL must be provided to Alert Notification when configuring the Automation Pilot action. This is explained in more detail in the next section.
Configurations in Automation Pilot
Automation Pilot comes with provided content that covers a broad range of DevOps tasks for the SAP BTP Neo and Cloud Foundry environments. In general, to automate a given process, a number of commands are chained and executed in an order defined by the user. Following this approach, you can create your own commands to resolve issues and optimize processes within a specific landscape.
The provided Automation Pilot content includes ready-to-use commands that help us monitor and horizontally autoscale our application and prevent failures: GetJavaAppMetrics, AddNeoAppInstances, and RemoveNeoAppInstances. The names are self-explanatory. Nevertheless, each command has a description as well as input and output keys that help to understand its purpose. GetJavaAppMetrics consumes the SAP BTP Monitoring API and retrieves real-time data for the available application metrics, which makes it fundamental to our scenario.
Let’s create a composite command for the autoscaling scenario. In the Automation Pilot UI, navigate to Commands and click the Create button. From the drop-down list, select the catalog in which the command will be created. Provide a name and a meaningful description.
The composite command assembles the above-mentioned commands and runs them in a single execution. Each of these commands has a set of input keys, which must be added to the composite command manually. Values for the required keys must be provided before the execution is triggered. In addition, there are two optional input keys, scaleUpThreshold and scaleDownThreshold. An instance is added when the CPU load rises above scaleUpThreshold and removed when it drops below scaleDownThreshold.
Note: The names and data types of the input keys must exactly match those defined in the commands. Otherwise, the mapping is not possible.
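The rule in the note can be illustrated with a small Python check. It compares the composite command's input keys with those of a command it calls, by both name and data type. The key names and types below are hypothetical placeholders, not the actual input keys of GetJavaAppMetrics or the other commands.

```python
from typing import Dict, List

def unmappable_keys(composite: Dict[str, str], command: Dict[str, str]) -> List[str]:
    """Return the command's input keys that the composite command cannot map,
    either because the key is missing or because its data type differs."""
    return sorted(key for key, key_type in command.items() if composite.get(key) != key_type)


# Hypothetical input key declarations (name -> data type).
command_inputs = {"app": "string", "region": "string", "credentials": "object"}
composite_inputs = {
    "appName": "string",          # different name, so "app" cannot be mapped
    "region": "string",           # same name and type: mapped
    "credentials": "string",      # same name but different type: not mapped
    "scaleUpThreshold": "number", # extra composite key, irrelevant for this command
}

print(unmappable_keys(composite_inputs, command_inputs))  # ['app', 'credentials']
```

The check is deliberately one-directional. Extra keys on the composite command, such as the threshold keys, do not break the mapping for an individual command.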
Now it’s time to construct the execution.
The logic of the execution is as follows. The command starts by getting the current metrics for the application. If a CPU spike above the set threshold is registered, the next step adds a new instance. Then the application's behavior is observed over a certain period of time by running GetJavaAppMetrics once again. If, after its completion, the average CPU load of all running instances is below scaleDownThreshold, one instance is removed. If the high application load continues, all instances are left running and the command execution stops. Keep in mind that this is an exemplary model, and you should modify it according to your needs.
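The flow described above can be sketched in Python. This is only a model of the composite command's logic, and several parts of it are assumptions:

- get_cpu_loads, add_instance, and remove_instance are hypothetical stand-ins for the GetJavaAppMetrics, AddNeoAppInstances, and RemoveNeoAppInstances steps.
- The default scale_down_threshold of 50 is an arbitrary example value.
- Automation Pilot itself expresses this logic through step Conditions and Repetition, not through Python code.

```python
from statistics import mean
from typing import Callable, List

def autoscale(get_cpu_loads: Callable[[], List[float]],
              add_instance: Callable[[], None],
              remove_instance: Callable[[], None],
              scale_up_threshold: float = 90.0,
              scale_down_threshold: float = 50.0,  # hypothetical example value
              max_attempts: int = 60) -> str:
    """Model of the composite command: scale up on high load, then observe
    the application and scale down once the average CPU load has dropped."""
    # Step 1: current CPU load of each running instance.
    if mean(get_cpu_loads()) <= scale_up_threshold:
        return "no action"
    # Step 2: the load is high, so add one instance.
    add_instance()
    # Step 3: observe the application by fetching the metrics again.
    for _ in range(max_attempts):
        if mean(get_cpu_loads()) < scale_down_threshold:
            # Step 4: the load has dropped, so remove the extra instance.
            remove_instance()
            return "scaled up and down"
    # The load stayed high: leave all instances running and stop.
    return "still under high load"


# Simulated metrics: one hot instance, then two instances cooling down.
readings = iter([[95.0], [80.0, 70.0], [40.0, 30.0]])
actions: List[str] = []
result = autoscale(lambda: next(readings),
                   lambda: actions.append("add"),
                   lambda: actions.append("remove"))
print(result, actions)  # scaled up and down ['add', 'remove']
```

The callables are injected so that you can model the flow against simulated metrics before building the real composite command.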
Some helpful tips*
*The example features an application with up to 3 running instances and the given expressions are illustrative
- To filter out very short CPU spikes, add an initial delay to the first step of the execution. This minimizes false positives
- The command GetJavaAppMetrics returns exhaustive information about all metrics reported by the running application instances, stored in an output key called processesMetrics. You can filter these metrics and extract the values you need using dynamic expressions, in particular the jq filters supported by Automation Pilot. They filter the output produced by the previous step and extract the relevant fields in the desired format. In addition to the standard jq functionality, Automation Pilot supports some custom functions, which make dynamic expressions even more flexible. Get more insights here
- To ensure that the execution proceeds as intended, set a Condition on the steps
Example: Step “scaleUp” is executed only if the average CPU load of all running instances is above a certain level, in this case the value defined for input key scaleUpThreshold. This can be achieved with another dynamic expression: $(.getMetrics.output.processesMetrics | map(.metrics[] | select(.name == "CPU Load").value) | add / length) is greater than $(.execution.input.scaleUpThreshold)
Similar Conditions apply to the rest of the steps.
- To ensure that the application's performance is stable after the intervention, add a Repetition to the step that gets the current metrics for the second time:
In this concrete case, the step is repeated while the following compound condition holds: the number of instances has not changed and the average CPU load of all running instances is greater than scaleDownThreshold. If the step completes successfully, we can be sure that the demand for application resources has eased. Otherwise, the execution fails after 60 attempts, the defined number of repetitions. Once we have this confirmation, the execution proceeds with removing the instance that is no longer required.
- For better traceability, you can configure a collection of output keys that the command produces at the end of the execution
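For readers less familiar with jq, the Condition and Repetition logic from these tips can be mirrored in Python. The sketch makes three assumptions:

- processesMetrics is a list with one entry per running instance.
- Each entry holds a metrics list of objects with name and value fields, which is the shape the jq filter implies.
- The sample data is made up.

The average is the sum of all “CPU Load” values divided by their count.

```python
from typing import Any, Dict, List

def average_cpu_load(processes_metrics: List[Dict[str, Any]]) -> float:
    """Average of all "CPU Load" values across the running instances,
    mirroring the jq filter used in the step Conditions."""
    values = [metric["value"]
              for process in processes_metrics
              for metric in process["metrics"]
              if metric["name"] == "CPU Load"]
    return sum(values) / len(values)


def should_scale_up(processes_metrics: List[Dict[str, Any]],
                    scale_up_threshold: float) -> bool:
    # Condition on step "scaleUp".
    return average_cpu_load(processes_metrics) > scale_up_threshold


def repetition_condition(processes_metrics: List[Dict[str, Any]],
                         expected_instances: int,
                         scale_down_threshold: float) -> bool:
    # Compound condition from the Repetition tip: the instance count is
    # unchanged and the load is still above scaleDownThreshold.
    # Assumes one processesMetrics entry per running instance.
    return (len(processes_metrics) == expected_instances
            and average_cpu_load(processes_metrics) > scale_down_threshold)


# Made-up sample output for two running instances.
sample = [
    {"metrics": [{"name": "CPU Load", "value": 96.0},
                 {"name": "Heap Memory", "value": 512.0}]},
    {"metrics": [{"name": "CPU Load", "value": 88.0}]},
]
print(average_cpu_load(sample))  # 92.0
```

Averaging over all collected values, rather than hard-coding a fixed number of instances, keeps the expression valid however many instances are running.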
Let’s go back to the Automation Pilot action configuration in Alert Notification. Once the command is ready, we need to generate the so-called Event Trigger URL, which is used for triggering a command from an external system. How does it work? Alert Notification sends the event payload as the body of the request, and all other properties are configured beforehand. To configure them, navigate to the Executions section in the Automation Pilot UI and click the Build Event Trigger button on the left-hand side. Select the new command.
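The following Python sketch shows what such a trigger call looks like from the caller's side. It builds an HTTP POST to an Event Trigger URL with an event as the JSON body, but does not send it. It assumes that the Service Account uses basic authentication, and the URL, credentials, and event are placeholders. In this scenario, Alert Notification makes this call for you, so the sketch only serves to understand the mechanism or to test a trigger by hand.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

def build_trigger_request(trigger_url: str, username: str, password: str,
                          event: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request that carries the event payload as its JSON body.
    Basic authentication with the Service Account is an assumption."""
    body = json.dumps(event).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        trigger_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


# Placeholder values: use the generated URL and your Service Account credentials.
request = build_trigger_request(
    "https://automation-pilot.example.invalid/trigger/placeholder",
    "service-account-user",
    "service-account-password",
    {"eventType": "CPULoad", "severity": "ERROR", "resource": {"resourceName": "demo-app"}},
)
# urllib normalizes header names, so "Content-Type" is stored as "Content-type".
print(request.get_method(), request.get_header("Content-type"))  # POST application/json
# Sending it would be: urllib.request.urlopen(request)
```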
Notice that you cannot set input values directly. Instead, you have to include inputs that hold keys with the same names and data types as the input keys of the command. Therefore, provide an Automation Pilot input that contains the credentials. It is used every time the command is executed. Because the Trigger Type is Alert Notification, Automation Pilot extracts the remaining properties dynamically from the event payload.
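Conceptually, each execution therefore combines a fixed input (the credentials) with values taken from the incoming event. The Python sketch below models only that idea and is not how Automation Pilot implements the extraction. The key names (user, password, appName) and the event path are hypothetical.

```python
from typing import Any, Dict

def resolve_inputs(static_input: Dict[str, Any], event: Dict[str, Any],
                   event_paths: Dict[str, str]) -> Dict[str, Any]:
    """Merge fixed input values with values looked up in the event payload.
    event_paths maps an input key to a dot-separated path in the event."""
    resolved = dict(static_input)  # copy, so the stored input stays unchanged
    for key, path in event_paths.items():
        value: Any = event
        for part in path.split("."):
            value = value[part]
        resolved[key] = value
    return resolved


# Hypothetical stored input holding the OAuth client credentials.
credentials_input = {"user": "oauth-client-id", "password": "oauth-client-secret"}
event = {"eventType": "CPULoad", "resource": {"resourceName": "demo-app"}}
inputs = resolve_inputs(credentials_input, event, {"appName": "resource.resourceName"})
print(inputs["appName"])  # demo-app
```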
Note: For Automation Pilot to access the SAP BTP Neo platform APIs and execute actions against cloud resources, you have to provide an OAuth client with sufficient scopes for application management and access to monitoring data. The required steps are described here.
Paste the automatically generated URL into the Automation Pilot action configuration in Alert Notification and provide the Service Account‘s credentials. See more details here.
Now, if the Monitoring Service reports a high CPU load, a recommended action for application autoscaling is triggered in Automation Pilot in addition to the notification in MS Teams.
Automation Pilot automatically adds all tags from the Alert Notification event to the execution. You can track the execution's current status in Executions. In addition, Automation Pilot pushes an event to Alert Notification for each status change. This is an example of such an event.
This is how you can leverage Alert Notification and Automation Pilot to autoscale your applications in SAP BTP Neo. The benefits of this approach include reliable performance, proper utilization, and lower cost of cloud resources. Most importantly, it removes the need for manual intervention during real-time spikes and drops in application load.
Get more insights from the online course Efficient DevOps with SAP, part of the openSAP training selection. Stay tuned for news about the tools and feel free to provide feedback! It helps us improve Automation Pilot and Alert Notification even more.