Dear STONITH, You maintain my INTEGRITY at all times. Yours, Server
As you are all aware, the HANA database can only be installed on Linux servers, and I often see business continuity come up as an important topic of discussion when designing solutions on HANA. Business continuity is a broad term, and it is much bigger than IT, as it covers every aspect of continuing the business in the aftermath of a disaster. In this blog, however, I will specifically talk about High Availability of the HANA database, which refers to the time HANA needs to respond to clients’ requests and the period during which a service is available.
I have already written a three-part blog series (Part 1, Part 2 and Part 3) that shows how to set up High Availability for the HANA database using the SLES High Availability Extension, but what was missing there were the details about Linux server integrity and how it is achieved. In that series, I mentioned a step where we create a STONITH resource on the Linux cluster; what STONITH is, why it is used, and the different STONITH methods will be covered in this blog.
In a DBMS, we maintain consistency of data by following the ACID properties, which make sure that the integrity of the database is preserved at all times (before and after each transaction). That covers the correctness of databases, but how can we maintain the integrity of the servers in a cluster? How can we be sure that we never encounter one of the situations below in a Linux cluster, and so guarantee the correctness of the servers?
- Either your node is down and is not running the cluster resources it is responsible for.
- Your node isn’t down but the cluster resources are no longer synchronized with other nodes in the cluster.
The second situation is known as a “split brain scenario”, and it can result in bad things happening to the cluster resources. Imagine, for example, a database that starts running twice in the cluster, or a file system that is suddenly written to by two independent nodes. So having a split brain in the cluster is bad, and the only way to ensure that no such scenario can occur is to use the STONITH approach.
STONITH (Shoot The Other Node In The Head) is basically a fencing mechanism that powers down the selected server remotely, removing it from the cluster and allowing the other nodes in the cluster to take over. There are different mechanisms to implement STONITH, and the implementation varies based on the deployment – on premise or cloud.
IMPORTANT NOTE: Without a valid STONITH method, the complete cluster is unsupported and will not work properly.
Below are a few methods to implement STONITH:
Disk-based STONITH: external/sbd (On Premise – Best Practice)
Hardware-based STONITH: external/ipmi (On Premise – Second Choice)
Cloud (STONITH for cross zone availability)
Overlay IP-based STONITH: external/ec2 (AWS Cloud)
- ec2 is an I/O fencing agent that can be used with Amazon EC2 instances.
Fencing Agent-based STONITH: fence_azure_arm (Azure Cloud)
- Used to de-allocate virtual machines and to report the power state of virtual machines running in Azure
GCP STONITH: external/gcpstonith (Google Cloud)
- Google Cloud Platform host reset/poweron/poweroff/move
OpenAPI-based STONITH: fence_aliyun (Alibaba)
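As one concrete illustration of a cloud fencing agent, an EC2 fencing resource on AWS can be defined through crmsh roughly as sketched below. The tag and profile values are assumptions: the tag must match the tag attached to your EC2 instances and the profile must match an AWS CLI profile with permission to stop instances.

```shell
# Sketch only: EC2 fencing resource for a SLES Pacemaker cluster on AWS.
# "pacemaker" (instance tag) and "cluster" (AWS CLI profile) are placeholders.
crm configure primitive res_AWS_STONITH stonith:external/ec2 \
  params tag="pacemaker" profile="cluster" pcmk_delay_max="15" \
  op start interval="0" timeout="180" \
  op monitor interval="120" timeout="60"
```

The `pcmk_delay_max` parameter adds a random delay before fencing, which helps prevent both nodes of a two-node cluster from shooting each other simultaneously.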
In this blog I will explain the disk-based STONITH method in detail, as it is the most widely adopted method for Linux clusters running HANA on premise, and give a brief overview of hardware-based STONITH.
As mentioned earlier, we can encounter a “split brain scenario” where the cluster resources are not in sync and each node in the cluster believes it is the only active node. To avoid this, we can configure SBD (STONITH Block Device, sometimes read as “Storage-Based Death”) as the node fencing mechanism that shuts a node down in a split-brain scenario. SBD provides a node fencing mechanism for Pacemaker-based clusters through the exchange of messages via shared block storage.
To use this solution, you need a shared disk, so for obvious reasons it is much more feasible on premise: in the cloud, each availability zone is independent of the others, so a shared disk over a SAN is not possible. Since most on-premise servers already use a SAN for storage access, the availability of shared disk devices across the nodes in the cluster won’t be a problem in an on-premise deployment.
On this shared disk, we create a small partition that is used for SBD. The size of the partition depends on the block size of the disk (for example, 1 MB for standard SCSI disks with a 512-byte block size, or 4 MB for DASD disks with a 4 kB block size).
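Initializing that partition is done with the `sbd` utility. As a sketch, it might look like the following; the device path is a placeholder for your shared LUN:

```shell
# Initialize SBD metadata on the shared partition (this overwrites the partition!)
sbd -d /dev/disk/by-id/scsi-SBD-partition create

# Verify the SBD header and the configured watchdog/msgwait timeouts
sbd -d /dev/disk/by-id/scsi-SBD-partition dump

# Show the message slots allocated to the cluster nodes
sbd -d /dev/disk/by-id/scsi-SBD-partition list
```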
In normal operation, the SBD daemon, which runs on all nodes in the cluster, monitors the shared storage and terminates itself if the disk becomes unreachable. Increased protection is offered through a watchdog: the daemon continuously writes a service pulse to it, and if the daemon stops feeding the watchdog, the hardware enforces a system restart. This protects against failures of the SBD process itself, such as dying or becoming stuck on an I/O error. The Pacemaker software configuration then ensures a safe transition of resources in the cluster when a node is down.
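Putting the pieces together, a minimal SBD setup on SLES might look like the sketch below. The device path, watchdog device and timeout values are illustrative assumptions, not prescriptive values:

```shell
# /etc/sysconfig/sbd -- read by the sbd service on every node:
#   SBD_DEVICE="/dev/disk/by-id/scsi-SBD-partition"
#   SBD_WATCHDOG_DEV="/dev/watchdog"
#   SBD_STARTMODE="always"

# Define the SBD fencing resource in the Pacemaker cluster (run once, on one node)
crm configure primitive stonith-sbd stonith:external/sbd \
  params pcmk_delay_max="15"

# Make sure fencing is globally enabled for the cluster
crm configure property stonith-enabled="true" stonith-timeout="150s"
```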
SBD STONITH is a simple but effective way to ensure the integrity of data and other nodes in a Linux cluster.
IMPORTANT: If you use SBD as the fencing mechanism, you need one or more shared drives. For productive environments, it is recommended to have more than one SBD device.
There is also another method, IPMI (hardware-based STONITH), which can be used as the fencing mechanism for an on-premise cluster. It is traditionally implemented by hardware solutions installed on the rack server or on a management board, and it must be compatible with the IPMI standards. You provide remote management-board access to the STONITH resource agent, which triggers the command and can switch off power to a specific port. What makes these solutions work is that they all allow the cluster to talk to the physical server without involving the operating system (OS), because the management solutions sit on a different network by default.
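As a hedged sketch, an IPMI fencing resource for a node called node1 could be configured as follows; the management-board IP address and the credentials are placeholders for your BMC settings:

```shell
# Hypothetical IPMI fencing resource for node1; ipaddr/userid/passwd are
# placeholders for the management board (BMC) of that physical server.
crm configure primitive stonith-ipmi-node1 stonith:external/ipmi \
  params hostname="node1" ipaddr="192.168.200.11" \
         userid="admin" passwd="secret" interface="lanplus" \
  op monitor interval="60m" timeout="120s"

# A node must never run the fencing resource that is meant to shoot it
crm configure location loc-ipmi-node1 stonith-ipmi-node1 -inf: node1
```

With one such resource per node, each server can be powered off through its management board even when its OS is unresponsive.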
STONITH method On Cloud
Every cloud uses its own STONITH mechanism that is approved by SAP and SUSE, and without STONITH the cluster is unsupported. Unfortunately, none of the available documents provide much clarity on how STONITH actually works under the hood in the cloud.
Below are the deployment guides for HANA System Replication using Pacemaker, on premise and in the cloud.
On Premise: HANA Scale-Up HA with System Replication & Automated Failover using SUSE HAE on SLES 12 SP 3
Amazon Web Services: SUSE Linux Enterprise Server for SAP Applications 12 SP3 for the AWS Cloud – Setup Guide
Azure: Setting up Pacemaker on SUSE Linux Enterprise Server in Azure
- Solution Implementation with Screenshots
Google Cloud: SAP HANA Single-host, High-Availability Cluster on SLES Deployment Guide
Alibaba: SAP HANA Intra-Availability Zone HA Deployment (Based on SLES HAE)
The STONITH method varies from cloud to cloud, and I have yet to explore the solution in detail for some cloud providers. Once I have a clearer picture, I will try to write another blog explaining the STONITH method for each cloud provider.
Kindly leave a comment if you have any reference document or link explaining STONITH in the cloud besides the generic documents mentioned above.
Thanks for your post, it's helpful. We adopted Alibaba Cloud to set up SAP S/4HANA on premise. The post on the STONITH approach is good.
Question: Is there a preference for using one method or the other? I understand that disk-based STONITH requires dedicated machines with a shared disk, but do both options handle failures equally well? I had heard that cloud-based fencing was not as responsive. Any truth to that?