Your SAP on Azure – Part 4 – High Availability for SAP HANA using System Replication
In my last post, I guided you through provisioning a highly available SAP NetWeaver system and explained the key steps for creating a Windows Failover Cluster for the ASCS instance. That way, even if one of the servers is down, your SAP system remains accessible. In today's post, I would like to present a solution for protecting the HANA database server.
The SAP HANA database offers two solutions designed for High Availability:
a) Host Auto-failover: you deploy an additional host for the current HANA database and configure it to work in standby mode. If the active node fails, the standby host automatically takes over its operations. This solution requires shared storage, which, as we already know, is a bit of a problem on Azure.
b) System replication: you install a separate HANA system and configure replication of data changes. On its own, system replication doesn't provide High Availability, because the HANA database doesn't perform an automatic failover. But you can use the features of SUSE Linux to enhance the base solution!
In every post, I try to present different features of Azure. When we were building the High Availability solution for SAP NetWeaver, I showed you how to configure all required Azure components, like the Availability Set or Load Balancer, from scratch. This time we will take a shortcut. If you read my previous blog about ARM Templates, you already know that Microsoft Azure offers Quickstart Templates, which simplify the deployment of SAP environments.
There are different types of templates, designed for different purposes. In this blog, we will use sap-3-tier-marketplace-image-multi-sid-db, which creates only the components required by the database. As an alternative, you can use sap-3-tier-marketplace-image-converged, which deploys the entire environment (DB + ASCS + APP Server) in one step.
I prefer to work with Visual Studio, but you can deploy the template right from your browser or with the use of PowerShell.
In Visual Studio, open a new Azure Resource Group project.
Now it's enough to click Deploy and fill in the required parameters:
A few moments later I received a nice message saying the deployment went fine and no errors were reported:
Successfully deployed template 'azuredeploy.json' to resource group 'HANA_HA'.
Let's see how it looks in the Azure portal.
Two VMs have been deployed successfully. The chosen template supports highly available scenarios, so both VMs were placed into an Availability Set. The Load Balancer is initialized with a backend pool and load balancing rules, so after a quick look we can start building the solution.
HIGH AVAILABILITY CLUSTER IN SUSE LINUX ENTERPRISE SERVER
When our VMs are ready, we log in and start configuring the cluster. First, we need to install additional packages on both servers:
You can do it from the command line or with the use of YaST.
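The exact package list is not shown here, so the following is only a sketch under assumptions: it installs `sle-ha-release`, `fence-agents` (which provides the Azure fencing agent used later) and `SAPHanaSR` (the HANA resource agents used later). These package names are my assumption and can differ between SLES releases. By default the function only prints the command, so you can review it first.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumed package names; verify them against your SLES for SAP release.
install_ha_packages() {
  run sudo zypper --non-interactive install sle-ha-release fence-agents SAPHanaSR
}
```

Call `install_ha_packages` on both nodes to review the command, then repeat with `DRY_RUN=0 install_ha_packages` to actually install.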
We initialize the cluster on the hha-db-0 host with the command:
On the second host, we execute the following command to join it to the cluster:
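The two commands are not shown here. On SLES 12 the bootstrap scripts are typically `ha-cluster-init` and `ha-cluster-join`, so the sketch below assumes those. Both are interactive and ask for details such as the network address to bind to. The functions print the commands by default.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# On the first node (hha-db-0): create the cluster.
init_cluster() {
  run sudo ha-cluster-init
}

# On the second node: join the cluster; $1 is the name of the first node.
join_cluster() {
  run sudo ha-cluster-join -c "$1"
}
```

For example, `init_cluster` on hha-db-0, followed by `join_cluster hha-db-0` on the second host.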
The next step is to modify the corosync configuration to define the two nodes of the cluster. This step has to be executed on both servers:
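As an illustration of what defining the two nodes can look like, here is a small helper that prints a `nodelist` section for /etc/corosync/corosync.conf. The IP addresses in the usage example are hypothetical; use the private addresses of your two VMs.

```shell
# Print a corosync nodelist section with one node entry per address given.
render_nodelist() {
  echo "nodelist {"
  for addr in "$@"; do
    printf '  node {\n    ring0_addr: %s\n  }\n' "$addr"
  done
  echo "}"
}
```

Running `render_nodelist 10.0.0.5 10.0.0.6` prints a section with two `node` entries that you can merge into corosync.conf on both servers.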
To apply the new settings, we need to restart the corosync service.
SAP HANA INSTALLATION
We can now proceed with the SAP HANA installation on both hosts.
The ARM template we have used creates a data disk that is attached to our VM. Its size depends on the SAP System Size we have selected during the deployment.
After the new partition is created, we need to download the HANA packages and start the installation:
If you’re using the ARM templates, please use 03 as the instance number. Otherwise, you need to manually modify the load balancer rules in Azure.
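To see why the instance number matters, it helps to look at the ports derived from it. The sketch below assumes the usual HANA convention of 3<NN>13 (system database) and 3<NN>15 (first tenant) for SQL access, and a health probe port of 625<NN>, which is consistent with the 62503 listener used in the cluster configuration later in this post.

```shell
# Derive the ports that depend on the HANA instance number ($1, two digits).
hana_ports() {
  nn="$1"
  echo "probe=625${nn} systemdb_sql=3${nn}13 tenant_sql=3${nn}15"
}
```

With `hana_ports 03` you get the ports the template's load balancer rules are expected to cover; any other instance number shifts all of them.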
While HANA is installing on the first node, we have time to start the database installation on the second node.
I want to use my HANA database together with SAP NetWeaver, so I quickly provision a new virtual machine and install SAP. I am leaving this step out of this blog to make it easier to read, but if you'd like to learn how to provision a highly available SAP NetWeaver system, you can read my previous blog post.
Once SAP NetWeaver is installed, we need to perform a full backup of the HANA databases. This is required to enable system replication.
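The backup can also be triggered from the command line. This is a sketch, assuming a multitenant system where the tenant is named after the SID (HHA) and that you run it as the <sid>adm user; the backup prefixes are hypothetical names. `hdbsql` prompts for the SYSTEM password. The function prints the commands by default.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = instance number, $2 = tenant database name (assumed to equal the SID).
backup_hana() {
  # Full data backup of the system database
  run hdbsql -i "$1" -u SYSTEM -d SYSTEMDB "BACKUP DATA USING FILE ('initial_systemdb')"
  # Full data backup of the tenant, issued from the system database
  run hdbsql -i "$1" -u SYSTEM -d SYSTEMDB "BACKUP DATA FOR $2 USING FILE ('initial_$2')"
}
```

For the system in this post that would be `backup_hana 03 HHA`.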
The setup of system replication is really easy and can be done with a few mouse clicks in SAP HANA Studio. Select the first node and choose Configure System Replication from the context menu.
I recommend reading the full list of important points to consider when working with HANA System Replication and Multitenant Database Containers. For me, the most important fact is that replication can be enabled for the entire database only. It is not possible to enable it for particular tenants. The state of each tenant is also synchronized, which means that tenants that are online on the primary node are also online on the secondary node. The same applies to stopped tenants: they keep the same state on both hosts.
When replication is enabled on the primary node, we could start the configuration of the secondary node, but before we proceed we need to ensure that the SSFS PKI is the same on the primary and secondary node. Log in through SSH and copy the SSFS_<SID>.DAT and SSFS_<SID>.KEY files.
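The file locations can be sketched as follows. The paths follow the standard HANA directory layout; the secondary host name in the usage example is hypothetical, and `copy_ssfs` assumes you run it on the primary node as a user that can read the files and write to the same location on the secondary.

```shell
# Print the full paths of the two SSFS files for a given SID ($1).
ssfs_files() {
  sid="$1"
  base="/usr/sap/${sid}/SYS/global/security/rsecssfs"
  echo "${base}/data/SSFS_${sid}.DAT"
  echo "${base}/key/SSFS_${sid}.KEY"
}

# Copy both files from this (primary) node to the same path on the secondary.
# $1 = SID, $2 = secondary host name
copy_ssfs() {
  for f in $(ssfs_files "$1"); do
    scp "$f" "$2:$f"
  done
}
```

For example, `copy_ssfs HHA hha-db-1` copies SSFS_HHA.DAT and SSFS_HHA.KEY to a second node called hha-db-1.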
Registering a system as a secondary node can be done only when the database is offline.
Choose Register Secondary System from the Configuration and Monitoring:
There are different replication options available. Read about them here and choose the one that meets your requirements.
After a few minutes, the operation is complete. You can monitor the replication status in SAP HANA Studio. The systems are in sync only if the replication status is active for all volumes.
The secondary system appears as operational, but you won't be able to connect to it (you will receive a message that the Database Connection is not available). This is the correct behavior.
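If you prefer the command line over SAP HANA Studio, the same setup can be done with `hdbnsutil` as the <sid>adm user. This is an alternative sketch, not the procedure used above: the site names SITE1 and SITE2 are hypothetical, and `sync` is only one of the available replication modes. The functions print the commands by default.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# On the primary node: enable system replication. $1 = site name
enable_primary() {
  run hdbnsutil -sr_enable --name="$1"
}

# On the secondary node: the database must be offline while registering.
# $1 = primary host, $2 = instance number, $3 = site name
register_secondary() {
  run HDB stop
  run hdbnsutil -sr_register --remoteHost="$1" --remoteInstance="$2" \
    --replicationMode=sync --name="$3"
  run HDB start
}
```

For the system in this post that would be `enable_primary SITE1` on hha-db-0 and `register_secondary hha-db-0 03 SITE2` on the second node. Afterwards, `hdbnsutil -sr_state` shows the replication state on either node.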
HANA System Replication is set up, so let's go back to the SLES cluster configuration. In the next step, we will create the basic cluster configuration by importing default values. You can decide which action is performed when a node stops responding (stonith-action). In our case, the VM will be deallocated.
Create a new file called crm-defaults.txt on the first node and enter the following configuration:
property $id="cib-bootstrap-options" \
  no-quorum-policy="ignore" \
  stonith-enabled="true" \
  stonith-action="off" \
  stonith-timeout="150s"
rsc_defaults $id="rsc-options" \
  resource-stickiness="1000" \
  migration-threshold="5000"
op_defaults $id="op-options" \
  timeout="600"
Now import the new configuration with the following command:
sudo crm configure load update crm-defaults.txt
The defined STONITH device will stop the system in case of failure. Therefore we need to authorize it to perform operations in the Azure subscription.
Go to Azure portal and add a new application in the Azure Active Directory:
The name and Sign-On URL are not important; just choose Web app / API as the Application Type. Now select the new app and choose Keys in the menu. Create a new entry with a name of your choice and select Never Expire in the second column. Remember to copy the Value after saving, because it cannot be retrieved later.
Next, assign the application a role on the virtual machine. The chosen role should be Owner so that the application is allowed to start and stop the VM.
Execute this step for both VMs.
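The role assignment can also be scripted. This is a sketch, assuming the Azure CLI (`az`) is installed and logged in to the right subscription; the application ID and VM names are placeholders you need to supply, and the resource ID format is the standard one for virtual machines.

```shell
# Build the Azure resource ID of a VM.
# $1 = subscription ID, $2 = resource group, $3 = VM name
vm_scope() {
  echo "/subscriptions/$1/resourceGroups/$2/providers/Microsoft.Compute/virtualMachines/$3"
}

# Grant the application the Owner role on a single VM.
# $1 = application ID, $2 = subscription ID, $3 = resource group, $4 = VM name
assign_owner() {
  az role assignment create --assignee "$1" --role Owner \
    --scope "$(vm_scope "$2" "$3" "$4")"
}
```

Call `assign_owner` once per VM, for example with the resource group HANA_HA and the VM name hha-db-0, then again for the second VM.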
The following configuration sets up the fencing mechanism. Replace the quoted placeholder values with the proper values from the table below and save the result as crm-fencing.txt.
| Name in the file | Name in Azure | Where to get it |
| Subscription ID | Subscription ID | Subscription blade |
| Resource Group | Resource Group | Virtual Machine blade |
| Tenant ID | Directory ID | Azure Active Directory blade -> Properties |
| Login ID | Application ID | Azure Active Directory blade -> App Registration |
| Password | Key Value | Can be retrieved only during key creation |
primitive rsc_st_azure_1 stonith:fence_azure_arm \
  params subscriptionId="subscription ID" resourceGroup="resource group" tenantId="tenant ID" login="login ID" passwd="password"
primitive rsc_st_azure_2 stonith:fence_azure_arm \
  params subscriptionId="subscription ID" resourceGroup="resource group" tenantId="tenant ID" login="login ID" passwd="password"
colocation col_st_azure -2000: rsc_st_azure_1:Started rsc_st_azure_2:Started
Load the configuration with the following command:
sudo crm configure load update crm-fencing.txt
Two more scripts delivered by Microsoft are required to create the SAP HANA resources:
SAP HANA Topology is a resource agent that monitors and analyzes the HANA landscape and communicates the status between the two nodes. You can check the description of each parameter by running the man ocf_suse_SAPHanaTopology command.
primitive rsc_SAPHanaTopology_HHA_HDB03 ocf:suse:SAPHanaTopology \
  operations $id="rsc_sap2_HHA_HDB03-operations" \
  op monitor interval="10" timeout="600" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="300" \
  params SID="HHA" InstanceNumber="03"
clone cln_SAPHanaTopology_HHA_HDB03 rsc_SAPHanaTopology_HHA_HDB03 \
  meta is-managed="true" clone-node-max="1" target-role="Started" interleave="true"
This file defines the resources in the cluster together with the Virtual IP which is assigned to the Azure Load Balancer. Adjust the system ID and instance number to match your system.
primitive rsc_SAPHana_HHA_HDB03 ocf:suse:SAPHana \
  operations $id="rsc_sap_HHA_HDB03-operations" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Master" timeout="700" \
  op monitor interval="61" role="Slave" timeout="700" \
  params SID="HHA" InstanceNumber="03" PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false"
ms msl_SAPHana_HHA_HDB03 rsc_SAPHana_HHA_HDB03 \
  meta is-managed="true" notify="true" clone-max="2" clone-node-max="1" \
  target-role="Started" interleave="true"
primitive rsc_ip_HHA_HDB03 ocf:heartbeat:IPaddr2 \
  meta target-role="Started" is-managed="true" \
  operations $id="rsc_ip_HHA_HDB03-operations" \
  op monitor interval="10s" timeout="20s" \
  params ip="10.0.0.4"
primitive rsc_nc_HHA_HDB03 anything \
  params binfile="/usr/bin/nc" cmdline_options="-l -k 62503" \
  op monitor timeout=20s interval=10 depth=0
group g_ip_HHA_HDB03 rsc_ip_HHA_HDB03 rsc_nc_HHA_HDB03
colocation col_saphana_ip_HHA_HDB03 2000: g_ip_HHA_HDB03:Started \
  msl_SAPHana_HHA_HDB03:Master
order ord_SAPHana_HHA_HDB03 2000: cln_SAPHanaTopology_HHA_HDB03 \
  msl_SAPHana_HHA_HDB03
There are various tools that assist us with cluster monitoring.
This tool shows us information about SLES Cluster, including resources and status of each node.
Displays information about the current status of SAP HANA System Replication. We are interested in sync_state column. When the replication is working fine the values should be PRIM for the primary node and SOK for the secondary.
SAP HANA Studio
General information about the system replication status. We need to ensure the replication status is ACTIVE for all volumes.
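The names of the two command-line tools are missing above. The descriptions match `crm_mon` (cluster and resource status) and `SAPHanaSR-showAttr` (replication attributes including sync_state), so the sketch below assumes those. The helper only looks for the PRIM and SOK values mentioned above anywhere in the output; it does not parse the columns.

```shell
# Reads SAPHanaSR-showAttr output on stdin.
# Succeeds only if both a PRIM and an SOK value appear as whole words.
replication_ok() {
  out=$(cat)
  echo "$out" | grep -qw PRIM && echo "$out" | grep -qw SOK
}
```

A quick check could then be `sudo SAPHanaSR-showAttr | replication_ok && echo "replication in sync"`, with `sudo crm_mon -r` for the overall cluster view.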
It's time to verify our solution. In a production environment, proper testing of the HA solution is crucial. For the purpose of this blog, we will simulate a loss of connectivity.
Expected results:
- The HANA operations are automatically switched to the secondary node
- The first node will shut down
- SAP NetWeaver will continue to work
Actual results:
- The takeover took place and operations continued on the secondary node
- The primary node was stopped and deallocated
- I don't have a good way to show you that NetWeaver was still running, so you will have to believe me. There was a delay of a few seconds, but operations continued without any problems!
Thanks for reading my blog! I hope you didn't run into any issues while configuring SAP HANA System Replication with automatic failover. See you soon. The next blog will describe how to create your backup environment in Microsoft Azure.
Hi Bartosz Jarkowski
Thanks for the wonderful blog series. I would be interested to know how end-to-end client recovery works in Azure. Are there any specific tools, like IP overlay or DNS routing?
that's an excellent question, unfortunately without a single answer 🙂
There are two articles in Azure documentation that touch this topic:
are any settings of .ini files replicated from source to target?
please check following posts:
Is a scenario with one master and two slaves possible? I would like to set up synchronous master-slave replication between two machines on the same LAN, plus a second slave for the same master at a remote WAN site using asynchronous replication.
Yes, this is a quite common scenario.
Have a look here:
Regarding the management of transaction logs in a replication scenario, I am missing some information. I have a studio environment with two appliances. Data backups are performed periodically and automatic log backup is configured. When will the transaction logs in /hana/log be deleted on the primary? When will they be deleted from /hana/log on the secondary? I have seen that /hana/log keeps growing. As a test, I ran "alter system reclaim log"; on the primary the directory was emptied, but on the secondary it was not.
enable_full_sync = false
operation_mode = logreplay
enable_log_retention = auto
logshipping_max_retention_size = 102400
logshipping_timeout = 20
Hello Bartosz Jarkowski ,
In a two-node Linux cluster on MSFT for (A)SCS, we powered off the primary cluster node while the (A)SCS services were running on it. No failover to the secondary node happened, and as a result no services were running on the secondary node.
Could you please confirm whether both cluster VMs need to be up and running for the two-node cluster to work properly?
if you power off the primary cluster node the ASCS services should failover to the secondary node. Please note that this blog is about SAP HANA high availability, not the Central Services instance.
Hello Bartosz ,
First of all thank you for this wonderful blog .
I am new to Azure and I have one query . Please can you confirm if my below understanding is correct :
"a) To achieve high availability for SAP HANA on azure , HANA system replication option should be chosen . It is difficult to achieve host auto failover with standby node using NFS shared file system
b)Host auto failover with standby node is only possible if we use azure net app files . "
The host auto-failover option is only available for SAP HANA Scale-Out scenarios. You are correct: you need Azure NetApp Files for that.
More info: https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/hana-overview-high-availability-disaster-recovery#ha-and-dr
How will the SAP NetWeaver application connect to the secondary HANA DB? Once the primary is down, we will perform a takeover on the secondary DB and make it the primary node.
How will SAP NetWeaver connect then, and do we have to make any changes on the application server after the takeover?
There is a load balancer that routes the traffic to the correct node. The failover is automatic and you don't have to make any changes to the SAP Application Server configuration.