Architecting solutions on SAP BTP for High Availability
SAP Business Technology Platform (BTP) is an open platform built on a multi-cloud foundation, which lets it run on top of different hyperscalers such as Microsoft Azure, AWS, GCP and Alibaba Cloud. The partnership with these hyperscalers has made it possible for SAP to scale and offer SAP BTP across various regions. This gives a lot of flexibility to customers who want to co-locate SAP BTP with their existing software solutions and leverage its capabilities to either extend or integrate solutions. Once a customer subscribes to SAP BTP, they can create subaccounts for extension/integration scenarios in any of the available regions/IaaS providers. You can explore all the available capabilities of SAP BTP in the SAP Discovery Center.
From a service capability standpoint, a lot of work is in progress to surface the service capabilities of the underlying hyperscaler within SAP BTP. For example, SAP BTP offers PostgreSQL and Object Store services, which in turn leverage the corresponding hyperscaler capabilities. For a customer this is transparent, and they do not need to worry much about how this works in AWS or Azure. The Kyma environment (based on Kubernetes) on SAP BTP exposes a catalog of services from other hyperscalers which customers can subscribe to (separately) and use to architect their solution with best-of-breed services.
Work has also been carried out to offer private connections between apps deployed on SAP BTP and the hyperscalers to enhance performance and security: SAP Private Link.
This blog post came out of a recent customer discussion on how to architect solutions that support critical applications running on SAP BTP for High Availability (HA). Since SAP BTP runs on top of hyperscalers, it can benefit from the proven Multi-Availability Zone concepts which the hyperscalers leverage.
Availability zones (AZs) are single failure domains within a single geographical region and are separate physical locations with independent power, network, and cooling. Multiple AZs exist in one region and are connected with each other through a low-latency network. SAP BTP services run on this Multi-AZ concept of the underlying hyperscalers, thereby offering High Availability. Hence, if there is an outage in one of the AZs, the service/application will self-heal and continue to be served from the other AZs in the region.
The SAP BTP Service Description guide provides more information on the SLA for the services. As of 1-Aug-2021, SAP announced an increased availability of 99.95% for several critical services of SAP BTP. The SLAs are constantly reviewed and there are many activities under way to increase the SLA commitments. Please review the roadmap and documentation to obtain the latest SLAs.
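To put the 99.95% figure into perspective, here is a minimal sketch of the arithmetic. It assumes a 30-day measurement period and, for the combined figure, that the two deployments fail independently of each other. Both are simplifying assumptions of mine and not taken from the Service Description guide; the function names are hypothetical.

```python
from typing import List


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime in minutes that still meets the SLA over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def combined_availability(sla_percents: List[float]) -> float:
    """Availability in percent of redundant deployments.

    Assumes failures are independent, which is an idealisation:
    the setup is only down when every deployment is down at once.
    """
    unavailability = 1.0
    for percent in sla_percents:
        unavailability *= 1 - percent / 100
    return (1 - unavailability) * 100


if __name__ == "__main__":
    # One deployment at 99.95% over an assumed 30-day period.
    print(f"{allowed_downtime_minutes(99.95):.1f} minutes")
    # Two independent deployments, each at 99.95%.
    print(f"{combined_availability([99.95, 99.95]):.6f} %")
```

Under these assumptions a single deployment may be unavailable for roughly 21.6 minutes in a 30-day period. The combined figure is a theoretical upper bound only, since it ignores the availability of the routing layer in front of the deployments and any shared dependencies.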
Another important topic which gets discussed a lot with customers is maintenance windows. The Maintenance Window documentation for SAP Cloud services lists both the regular maintenance windows and the major upgrade windows. During the regular maintenance windows, changes/bug fixes are rolled out which do not impact any of the services (zero-downtime maintenance, ZDM). However, SAP reserves 4 windows (one each quarter) to perform major upgrades. These could include major changes related to network/security/DB and would be communicated to customers in advance.
Now that we have covered some of the basic concepts, let’s look at some of the options which are available to architect solutions which are highly available. Please note that these are just possible scenarios which you can test out before productionizing them.
The ability to create SAP BTP accounts spread across the globe on different IaaS providers gives us more flexibility to architect such solutions. Whether you are developing a Fiori app as an extension or an interface using Integration Suite, you can easily move these artifacts between BTP accounts across different providers.
For illustration purposes, I am using a large organization as a customer who has invested a lot with Microsoft and is leveraging Microsoft Azure for SAP and 3rd party workloads. This customer has operations across EU and US.
In the Solution Diagram below, I am leveraging multiple subaccounts for staged development: DEV/TST/PRD (in Azure US) and another PRD (in Azure EU). All the development and testing happens in the subaccounts marked as DEV/TST, and the changes are pushed across the landscape using the Transport Management service. As you can see, there are two productive subaccounts across two regions. Please refer to the documentation on best practices for deploying applications. The roadmap is constantly updated for the Transport Management & CI/CD services. Please check whether the artefacts which you are looking to transport are supported.
Scenario 1: Cross-region failover with distribution of load
This scenario assumes you have end users scattered across the EU and US and would like to load balance the Fiori Launchpad exposed on SAP BTP. The Fiori Launchpad is deployed on both SAP BTP accounts, in the EU and the US, and the same custom domain has been configured for both Launchpads. For illustration purposes, I have used Azure Traffic Manager, which is a DNS-based traffic load balancer. The Traffic Manager distributes the traffic between these two launchpads and routes the traffic to one instance of the launchpad when the other one goes down (for example, during a maintenance window which results in service disruption). There are different routing rules which you can configure to provide the best experience for your end users. There is also best practice documentation which explains some methods for identifying a failover and the use of rule-based solutions like Akamai ION.
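The routing decision described in this scenario can be sketched in a few lines. This is an illustrative model only and not how Azure Traffic Manager is implemented or configured: the Endpoint class, the hostnames and the probe callback are hypothetical names I introduced, and a real setup would rely on the health probes and routing methods of the DNS load balancer itself.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str      # hypothetical label for a launchpad deployment
    target: str    # hypothetical hostname, not a real SAP BTP URL
    priority: int  # lower value = preferred endpoint
    weight: int    # relative share of traffic for weighted routing


Probe = Callable[[Endpoint], bool]  # returns True if the endpoint is healthy


def healthy_endpoints(endpoints: List[Endpoint], probe: Probe) -> List[Endpoint]:
    """Keep only the endpoints that currently pass the health probe."""
    return [e for e in endpoints if probe(e)]


def pick_by_priority(endpoints: List[Endpoint], probe: Probe) -> Optional[Endpoint]:
    """Failover routing: always use the healthy endpoint with the lowest priority value."""
    candidates = healthy_endpoints(endpoints, probe)
    return min(candidates, key=lambda e: e.priority) if candidates else None


def pick_by_weight(endpoints: List[Endpoint], probe: Probe,
                   rng: Optional[random.Random] = None) -> Optional[Endpoint]:
    """Load distribution: choose among healthy endpoints in proportion to their weight."""
    candidates = healthy_endpoints(endpoints, probe)
    if not candidates:
        return None
    rng = rng or random.Random()
    return rng.choices(candidates, weights=[e.weight for e in candidates], k=1)[0]


if __name__ == "__main__":
    launchpads = [
        Endpoint("us", "launchpad-us.example.com", priority=1, weight=50),
        Endpoint("eu", "launchpad-eu.example.com", priority=2, weight=50),
    ]
    # Simulate the US launchpad being down, e.g. during a major upgrade window.
    us_is_down = lambda e: e.name != "us"
    print(pick_by_priority(launchpads, us_is_down).target)
    print(pick_by_weight(launchpads, us_is_down).target)
```

With the US endpoint marked unhealthy, both functions return the EU launchpad. Once it is healthy again, priority routing goes back to the US endpoint while weighted routing splits the traffic between the two.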
Scenario 2: In-country failover
In one of my recent engagements, I was dealing with a customer that operates in a highly regulated industry and was looking for ways to architect a solution on SAP BTP for one of their mission-critical apps. Due to tight regulations, the solution had to be based on providers within the country. In Australia, SAP BTP is available on two providers, AWS and Azure (as in many other regions). This provides an option to architect a highly available solution across two providers. For the end customer this is transparent, as they are directed to a Fiori launchpad depending on load and availability.
Another interesting thing to note in the above solution diagram is that the maintenance windows for SAP BTP on each of these providers fall on different weekends. Hence, it's very unlikely that a major upgrade to an SAP BTP service would affect the same service across all the providers in the country at the same time. Obviously, there are a lot of other aspects to think through with this setup, especially when you are looking to set up Azure ExpressRoute or AWS Direct Connect for connectivity into your on-premise systems.
I hope this blog post gave you some ideas on how you can leverage SAP BTP across different regions/providers to architect highly available solutions. Though I used the example of the Fiori Launchpad, the same approach could be used for other scenarios like Integration, Workflows, Automation etc. I am keen to hear from the community if someone has tried to implement this setup.
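If you want to verify the point about non-overlapping maintenance windows for your own landscape, a small check like the one below could help. It is a sketch under stated assumptions: the provider labels and all dates are made up for illustration and are not the published SAP schedule, so take the real windows from the Maintenance Window documentation.

```python
from datetime import datetime
from itertools import combinations
from typing import Dict, List, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of one maintenance window


def overlaps(a: Window, b: Window) -> bool:
    """True if the two time windows share any period of time."""
    return a[0] < b[1] and b[0] < a[1]


def conflicting_windows(
    schedule: Dict[str, List[Window]]
) -> List[Tuple[str, str, Window, Window]]:
    """List every pair of windows from different providers that overlap."""
    conflicts = []
    for (p1, windows1), (p2, windows2) in combinations(schedule.items(), 2):
        for w1 in windows1:
            for w2 in windows2:
                if overlaps(w1, w2):
                    conflicts.append((p1, p2, w1, w2))
    return conflicts


if __name__ == "__main__":
    # Hypothetical dates: two providers upgraded on different weekends.
    schedule = {
        "aws-au": [(datetime(2021, 11, 6, 0, 0), datetime(2021, 11, 6, 4, 0))],
        "azure-au": [(datetime(2021, 11, 13, 0, 0), datetime(2021, 11, 13, 4, 0))],
    }
    print(conflicting_windows(schedule))
```

An empty result means that no two providers are scheduled for a major upgrade at the same time, which is the property this scenario relies on.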
Hi Murali Shanmugham,
Thanks for your blog, this is a very interesting topic.
But the scenarios above always describe a solution that does not require data storage on BTP and uses an on-premise system instead, right?
In the example above, an application is running on Azure CF Europe and Azure CF US East. If this application uses a HANA on BTP as a database, it must also be made available on both regions and the data must be replicated.
Yes, the scenario which I have described above doesn't use a persistence service. If you were to architect this with SAP HANA Cloud in the mix, you could leverage the replication capabilities as documented here. I see in the roadmap, High availability across multi-AZ for HANA Cloud is also planned for Q4/2021. Hope this helps.
Thanks for the article! I am not clear about one thing: Is the High Availability setup supposed to run 24/7, which would double operational costs, or only on days when the "Major Upgrades" take place?
It's up to you how you architect this. The metrics for SAP BTP services are based on usage. For example, the Launchpad service is charged by the number of users accessing the site. Hence, if you have both sites running in parallel and you distribute the traffic among them, you are still charged for the total number of users accessing the Launchpad sites (irrespective of using one or two BTP subaccounts). However, as pointed out by Simon above, if you leverage services like SAP HANA Cloud you might need to replicate data, and that might add to the operational cost. Hence, you would need to decide whether this type of architecture is important to support apps which may be critical for the business.
This covers a very complicated case of hosting with multiple Cloud providers. SAP BTP still needs to provide a simple solution to set up and route between apps within the environment for cases like Blue/Green Deployments.
Hello Murali Shanmugham,
Interesting blog on a complex topic.
The remark you made about the maintenance windows differing depending on which Hyperscaler the BTP services are deployed on caught my attention. Do you refer to the maintenance windows of the BTP services or of a layer below?
I assume the maintenance and upgrade windows published by SAP in the Maintenance and Major Upgrade Windows Change Log.pdf apply to all Hyperscalers? Or is it only to BTP services deployed in SAP datacenters?
Great blog. I am able to understand the architecture and options of high availability for SAP BTP.