Detect Application Crashes in SAP BTP Cloud Foundry with SAP Alert Notification service (Part 1)
For the sake of readability, we often use SAP BTP as a short form of the complete “SAP Business Technology Platform” name and Alert Notification as a short form of the complete “SAP Alert Notification service for SAP BTP” name.
Ensuring 100% availability of our cloud applications is critical for daily business operations. We must be fully aware of any deviations. If a disruption occurs, prompt reaction and recovery are pivotal.
In the context of SAP BTP, monitoring of Cloud Foundry applications is provided natively by the platform. The native Cloud Controller not only provides full-fledged application management but also observes each application’s overall performance and availability. The missing piece of the puzzle is alerting. Following the SAP best practices for SAP BTP solutions, this gap can easily be filled with an SAP cloud offering: SAP Alert Notification service.
In this blog post, we are going to demonstrate how to get notified whenever an application crash occurs within your Cloud Foundry environment. You can choose among the multiple notification channels supported by Alert Notification to react. For this use case, we will take advantage of the integration with PagerDuty for incident management. For every application crash event delivered by Alert Notification, an incident will be triggered automatically in our PagerDuty account. In addition, a notification for such events will be delivered instantly via email to the responsible application administrator. This is just an example, however, and you can use any Alert Notification action of your choice. Let’s start configuring the target scenario!
Before we start, here is what you need to have in advance:
- Alert Notification service, enabled for an SAP BTP Cloud Foundry space
- Application instance, deployed in the same SAP BTP Cloud Foundry space
- (Optional) Configured account in PagerDuty
Configurations in Alert Notification
Alert Notification can match a list of application audit events for Cloud Foundry. In the current implementation, an administrator of the subaccount space that contains the application must add the existing technical user for the respective data center to that space. This user must have the Space Auditor role, so that Alert Notification can obtain information from Cloud Controller in Cloud Foundry.
Check the technical user for your data center here.
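As a minimal sketch of this step, the role can also be assigned with the Cloud Foundry CLI command cf set-space-role. The helper below only builds and prints the command. The user, org, and space names are placeholders, not values from this scenario, and running the command assumes that the cf CLI is installed and that you are logged in with sufficient permissions.

```python
import shlex


def space_auditor_command(user: str, org: str, space: str) -> list[str]:
    """Build the cf CLI call that grants the Space Auditor role in one space."""
    return ["cf", "set-space-role", user, org, space, "SpaceAuditor"]


if __name__ == "__main__":
    # Placeholder values: use the technical user for your data center
    # and your own org and space names.
    cmd = space_auditor_command("technical-user@example.com", "my-org", "my-space")
    print(shlex.join(cmd))
    # To execute it instead of printing it:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to review the target org and space before any role is changed.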
If you are not familiar with the Alert Notification terminology, you can glance through the official documentation for more details before moving forward.
In Alert Notification, let’s create a subscription and a matching condition.
There is a dedicated application audit event that is triggered when an application instance has crashed. In Alert Notification, such an event carries the property eventType with the value app.crash. Using the other event properties, you can create additional conditions to filter only certain crashes, e.g., those of particular applications.
Let’s create a condition, as part of our new subscription, with eventType equal to app.crash.
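To make the matching logic concrete, here is a small local sketch of how such a condition selects events. It is not the service's implementation. The Condition class, the EQUALS predicate name, and the sample events are illustrative assumptions; only the eventType property and the app.crash value come from the scenario above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Illustrative stand-in for an Alert Notification condition."""
    property_key: str
    predicate: str        # only "EQUALS" is modeled in this sketch
    property_value: str


def matches(condition: Condition, event: dict) -> bool:
    """Return True if the event satisfies the condition."""
    if condition.predicate != "EQUALS":
        raise ValueError(f"unsupported predicate: {condition.predicate}")
    return event.get(condition.property_key) == condition.property_value


app_crashed = Condition("eventType", "EQUALS", "app.crash")

# Hypothetical events; real events carry more properties.
crash_event = {"eventType": "app.crash", "subject": "my-app crashed"}
other_event = {"eventType": "audit.app.update", "subject": "my-app updated"}

print(matches(app_crashed, crash_event))  # True
print(matches(app_crashed, other_event))  # False
```

A second condition on another event property, for example one that identifies the application, would narrow the subscription to crashes of a particular application.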
Next, create the action for the subscription. In this scenario, it is an action of type PagerDuty. Define a name and a meaningful description for the new action. To complete the configuration, you need to provide a Routing Key. Optionally, define a Field Mapping.
Let’s have a closer look at the configuration in PagerDuty. First, your account needs a PagerDuty service with an integration of type Events API v2. To obtain the routing key, navigate to Services –> Service Directory in your PagerDuty account. Find the service for your scenario and go to Integrations. Locate the routing key under Integration Key and provide the value to Alert Notification.
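If you want to verify the routing key independently of Alert Notification, you can send a test event directly to the PagerDuty Events API v2. The sketch below builds a minimal trigger event and prints it. The key, summary, and source values are placeholders, and the function names are our own. The send step is left commented out because it opens a real incident in your account.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key: str, summary: str, source: str,
                        severity: str = "critical") -> dict:
    """Build a minimal Events API v2 'trigger' event."""
    if severity not in {"critical", "error", "warning", "info"}:
        raise ValueError(f"invalid severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }


def send_event(event: dict) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    event = build_trigger_event(
        routing_key="YOUR_INTEGRATION_KEY",  # placeholder, not a real key
        summary="Test: app.crash for my-app",
        source="my-space",
    )
    print(json.dumps(event, indent=2))
    # Uncomment to open a real incident; PagerDuty answers 202 when the event is accepted.
    # print(send_event(event))
```

The same structure is what you will later see in PagerDuty as the raw message of an incident, which helps when deciding which fields to map.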
In Field Mapping, you can optionally enter a comma-separated list of key-value pairs, where the key is a field name and the value is either a constant or a placeholder that is dynamically replaced with the value from the incoming event. See more details in PagerDuty Action Type.
At the bottom of the incident’s details in PagerDuty, there is an option called “View Message”. There, you can check the raw format of the event received via the PagerDuty Events API. This information is useful if you intend to map the properties of the Alert Notification event to concrete fields in PagerDuty.
Note: This field mapping overrides any default field/fields specified in the event tag.
Add the new action(s) to the subscription, review the summary and complete the subscription’s configuration in Alert Notification.
How will the crash event for our application instance be detected and handled within the Cloud Foundry environment? In general, Cloud Controller stages, starts, and runs the applications. Furthermore, it configures a health check that runs periodically for each application instance. If a previously healthy application instance fails a health check, it is considered unhealthy. As a result, the application instance is stopped and deleted, and a new instance is scheduled. This stoppage and deletion of the application instance is reported back to Cloud Controller as a crash event. See more details about the health checks flow here.
Note: Audit events can be created at any point during the execution of the action they describe. This means the action associated with the event is not guaranteed to have succeeded. This is important to keep in mind for audit events concerning application instance stop, update, restart, etc. The application crash event is triggered when a health check for an instance fails.
And we are done! The goal of this scenario was to illustrate how to catch application crashes in the SAP BTP Cloud Foundry environment using Alert Notification. There is a variety of notification channels and actions that you can apply to stay informed and handle disruptions within the landscape. Above, we have shown just one example, the easy-to-consume integration with PagerDuty. Give it a try!
We intend to extend the use case in future blog posts. The goal is to introduce further options to react proactively and remediate application instance crashes automatically by using state-of-the-art DevOps tools, such as SAP Automation Pilot. Stay tuned!