SAP NetWeaver High Availability using SUSE HA Extension in AWS – overview
The SUSE cluster solution has been proven over time and is used by many customers. This blog post provides an overview of the cluster setup in AWS, with insights into the technical steps involved. Note that this blog post is not the official guide for this setup. The technical steps required are well explained in the official SUSE guide.
Note: Before going into the cluster setup for SAP systems, prior knowledge of SAP NetWeaver and HANA installation is a must. Experience with AWS native features is also recommended (EC2 instances, IAM roles, security groups, tags, EFS and EBS storage, availability zones, VPC, route tables etc.). In addition, AWS offers a special feature, the Overlay IP, which is explained below.
How does the AWS Overlay IP work?
AWS networking allows creating routing entries in route tables which direct all traffic for an IP address to an EC2 instance. This concept allows directing the traffic to any instance in a Virtual Private Cloud (VPC), no matter which subnet or which availability zone (AZ) it is in. Changing this routing entry for the subnets in a given VPC allows redirecting traffic when needed. This concept is known as “IP Overlay” routing in AWS.
- Identify two availability zones to be used, AZ1 & AZ2.
- Create subnets in the two AZs which can host the two nodes of a SLES HAE cluster.
- Use a routing table which is attached to the two subnets. Both subnets are a part of the same customer VPC.
- The Overlay IP is chosen outside the CIDR range of the VPC (and therefore of its subnets), but is made routable inside the customer VPC through the route table.
- The Overlay IP is initially mapped to the network interface of node1 in the route table.
- Pacemaker uses “aws-vpc-move-ip” to move the IP from one instance to the other.
Reference: IP failover with Overlay IP Address.
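The routing idea above can be sketched in a few lines of Python. This is only a toy model, not the aws-vpc-move-ip agent: the subnet CIDRs, the Overlay IP and the interface IDs are made-up example values, and the RouteTable class is a hypothetical stand-in for an AWS route table.

```python
import ipaddress

# Hypothetical example values, not taken from the guide.
SUBNET_CIDRS = [ipaddress.ip_network("10.0.1.0/24"),   # subnet in AZ1
                ipaddress.ip_network("10.0.2.0/24")]   # subnet in AZ2
OVERLAY_IP = ipaddress.ip_address("192.168.10.15")


def overlay_destination(overlay_ip, subnet_cidrs):
    """Return the /32 route destination, rejecting IPs that fall inside a subnet."""
    for cidr in subnet_cidrs:
        if overlay_ip in cidr:
            raise ValueError(f"{overlay_ip} lies inside subnet {cidr}")
    return ipaddress.ip_network(f"{overlay_ip}/32")


class RouteTable:
    """Toy route table: destination network -> target network interface."""

    def __init__(self):
        self.routes = {}

    def replace_route(self, destination, target_interface):
        # Creating or overwriting the entry is what "moves" the Overlay IP.
        self.routes[destination] = target_interface

    def target_for(self, ip):
        # Pick the most specific route that contains the address.
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


table = RouteTable()
destination = overlay_destination(OVERLAY_IP, SUBNET_CIDRS)
table.replace_route(destination, "eni-node1")   # initial mapping to node1
before = table.target_for(OVERLAY_IP)
table.replace_route(destination, "eni-node2")   # failover: same IP, new target
after = table.target_for(OVERLAY_IP)
print(before, "->", after)
```

Clients keep using the same address throughout; only the target of the route entry changes.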
The cluster setup for NW 7.40 & HANA described in the guide can be divided into three sections. The important steps of each are shown below.
I. Pre-requisites for cluster setup
The list below summarizes the tasks we need to execute (detailed steps are provided in the official guides). Steps 1, 4, 5, 6 and 7 are specific to AWS.
- Set up the AWS VPC so that it covers at least two availability zones, and create the subnets for each availability zone so that they are reachable through a single route table.
- Deploy the VMs, create the file systems, users etc. and prepare the VMs for the SAP and HANA installations. All basic requirements for SAP NW and HANA need to be met.
  - EFS -> for all shared file systems: /sapmnt/<SID>; /usr/sap/trans; /usr/sap/<SID>; /hana/shared
  - EBS -> data and log volumes of HANA.
- Install the SAP ASCS/ERS/CI instances along with the primary and secondary HANA DB using SWPM.
- Make sure that HANA system replication and enqueue replication are working, and that failover and failback work fine, before proceeding with the cluster setup.
- Create the AWS IAM policy and add it to the EC2 instances as per section 3 in the guide. This policy later enables the SUSE Pacemaker to make the AWS API calls that change entries in the AWS route table.
- Assign the security groups to the EC2 instances and maintain tags for the EC2 instances.
- Create an entry in the AWS route table for the Overlay IP.
  - Ideally three entries are required: one each for ASCS, ERS and the database.
- Configure the AWS profile and assign tags to the EC2 instances as per the guide.
- Check whether the AWS URL is accessible from the EC2 instance. If it is not, the proxy configuration is possibly wrong.
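To give an idea of what the IAM policy for the route table change could look like, here is a minimal sketch that builds such a policy document in Python. It is an illustration only: the region, account ID and route table ID are hypothetical placeholders, and the authoritative policy (which may contain further statements, for example for fencing) is the one in section 3 of the guide.

```python
import json

# Hypothetical placeholders, to be replaced with your own values.
REGION = "eu-central-1"
ACCOUNT_ID = "123456789012"
ROUTE_TABLE_ID = "rtb-0123456789abcdef0"


def overlay_ip_policy(region, account_id, route_table_id):
    """Build an IAM policy document that allows changing routes in one route table."""
    route_table_arn = f"arn:aws:ec2:{region}:{account_id}:route-table/{route_table_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Changing the route entry is restricted to the cluster's route table.
                "Sid": "ReplaceOverlayRoute",
                "Effect": "Allow",
                "Action": "ec2:ReplaceRoute",
                "Resource": route_table_arn,
            },
            {
                # Describe calls do not support resource-level restrictions.
                "Sid": "ReadRouteTables",
                "Effect": "Allow",
                "Action": "ec2:DescribeRouteTables",
                "Resource": "*",
            },
        ],
    }


policy = overlay_ip_policy(REGION, ACCOUNT_ID, ROUTE_TABLE_ID)
print(json.dumps(policy, indent=2))
```

Restricting the write permission to a single route table keeps the cluster nodes from touching routing elsewhere in the account.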
II. Cluster & Cluster resource configuration for NW7.40 & HANA
The “Clusters from Scratch” material is a good starting point for the OS cluster setup. The main steps are:
- Install Pacemaker and configure a basic cluster, i.e. without any resources.
- Configure STONITH.
- Configure the IP resources for ASCS, ERS and HANA.
- Configure the ASCS & ERS instance resources.
- *Optionally, you can configure a co-location constraint.
- Configure the HANA instance resource.
Various options and settings can be applied to the resources configured in the cluster. However, the ones provided in the official SUSE guides are tested and work well in normal scenarios.
It is good practice to group the cluster resources.
Make sure you follow the steps in the order given in the official guides. Once done, take a backup of these initial configuration files.
Sample configuration: Once set up as per the guides mentioned in this blog post, the final configuration looks similar to the picture below.
* In the above scenario the file system resource has not been included, as the shared file systems on EFS can be pre-mounted on multiple hosts at once.
III. Post Steps and Checks.
- Check the status of the cluster.
- Check if proper alert status is reflected in the monitoring for the cluster. (Monitoring can be user defined.)
- For HANA, use the Python utility SystemReplicationStatus.py to make sure replication works fine. Note that all HANA services should be in ‘SYNC’ state here, otherwise the HANA failover to the secondary does not work.
- Ensure that at least one database backup and one file system backup have completed successfully.
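The replication check can also be scripted. The sketch below assumes the exit-code convention commonly documented for the replication status script (10 to 15, where 15 means all services are active); please verify this against your HANA release. The function names are made up for this example, and the command to run is deliberately left to the caller.

```python
import subprocess

# Assumed exit-code convention of the replication status script.
RC_MEANING = {
    10: "no system replication configured",
    11: "error",
    12: "unknown",
    13: "initializing",
    14: "syncing",
    15: "active",
}


def describe_replication_rc(rc):
    """Translate the script's return code into a readable state."""
    return RC_MEANING.get(rc, f"unexpected return code {rc}")


def failover_ready(rc):
    """Treat only the fully active state as safe for a takeover."""
    return rc == 15


def run_status_check(command):
    """Run the status command (an argument list supplied by the caller)."""
    return subprocess.run(command, check=False).returncode


for rc in (15, 14, 11):
    print(rc, describe_replication_rc(rc), failover_ready(rc))
```

A monitoring job could call run_status_check as the <sid>adm user and raise an alert whenever failover_ready returns False.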
Overall architecture
The cluster setup from SUSE in AWS for a typical NW 7.40 & HANA system looks similar to the diagram below.
The end users connect to the message server or the HANA DB using the Overlay IP, which is also maintained in the customer-specific DNS. Since the Overlay IP is outside the customer CIDR range, additional configuration is required to route the traffic via the Overlay IP. Possible options are described in the SUSE guide (section 3.1).
In the event that the ASCS VM goes down or is not accessible, the SUSE Pacemaker fails over the ASCS instance to the ERS VM, which is running in availability zone B.
- The Overlay IP is removed from the ASCS VM and assigned to the ERS VM. At the same time, the AWS route table is changed to reflect the new mapping for the Overlay IP.
- The ASCS instance is started by the SUSE Pacemaker on the ERS VM, where it reads the replicated lock table entries from shared memory and builds the new lock table, so the ongoing transactional data is not lost. Once the ASCS instance is started, we can configure the ERS either to shut down or to continue running on the same VM.
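The lock table takeover can be pictured with a small toy model. This is not how the SAP enqueue server is implemented; the class names and the lock key are invented purely to illustrate why the locks survive the failover.

```python
class EnqueueReplicationServer:
    """Toy ERS: keeps a copy of every lock the ASCS grants."""

    def __init__(self):
        self.replica = {}

    def replicate(self, lock_key, owner):
        self.replica[lock_key] = owner


class EnqueueServer:
    """Toy ASCS enqueue server with an in-memory lock table."""

    def __init__(self, ers, rebuild_from_replica=False):
        self.ers = ers
        # After a failover the new instance starts from the replicated entries.
        self.lock_table = dict(ers.replica) if rebuild_from_replica else {}

    def acquire(self, lock_key, owner):
        if lock_key in self.lock_table:
            return False  # lock is already held
        self.lock_table[lock_key] = owner
        self.ers.replicate(lock_key, owner)
        return True


ers = EnqueueReplicationServer()
ascs_zone_a = EnqueueServer(ers)
ascs_zone_a.acquire("MATERIAL/4711", "user1")   # hypothetical lock key

del ascs_zone_a                                  # the ASCS VM in zone A fails
ascs_zone_b = EnqueueServer(ers, rebuild_from_replica=True)
print(ascs_zone_b.lock_table)
```

Because the new instance starts from the replica, a second user still cannot take the lock that user1 held before the failure.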
In scenarios where the primary HANA DB goes down for any reason, the secondary HANA DB takes over and is automatically promoted to primary by the SUSE Pacemaker. There are two additional resource agents provided by SUSE.
- SAPHana Resource Agent: This resource agent from SUSE supports scale-up scenarios by checking the SAP HANA database instances for whether a takeover needs to happen.
- SAPHanaTopology Resource Agent: This agent runs on all nodes of a SUSE Linux Enterprise High Availability Extension cluster and gathers information about the status and configuration of SAP HANA system replication.
The setup for HANA on the SUSE Pacemaker cluster and the resource configuration are well explained in the guide. SUSE also provides test cases which must be validated as part of the overall setup.
*Co-location constraint: SUSE Pacemaker also allows us to set a co-location constraint for the ASCS<->ERS pair. For example, when the ASCS and ERS are running on the same VM after a failover, the VM in availability zone A may come back after the failure, possibly after an AWS auto-recovery event. What happens in such a scenario?
The cluster pushes the ERS instance out and starts it on the VM running in availability zone A. This is a really interesting feature: once the additional node is available, the cluster makes sure that the enqueue locks are replicated on another VM, thus safeguarding the lock table. The picture below explains this scenario with the co-location constraint set.
Note how the Overlay IPs keep changing and always point to the running/active ASCS and ERS instances.
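The placement behaviour can be sketched as a simple scoring rule. The snippet below is a simplified model of a negative co-location score, not Pacemaker's actual placement algorithm; the score value and the node names are illustrative assumptions.

```python
# Illustrative negative score: ERS should avoid the ASCS node, but may share it.
COLOCATION_SCORE = -5000


def place_ers(ascs_node, online_nodes):
    """Pick the node for the ERS given where the ASCS runs and which nodes are up."""
    if not online_nodes:
        raise ValueError("no node available for the ERS")
    scores = {node: (COLOCATION_SCORE if node == ascs_node else 0)
              for node in online_nodes}
    # Highest score wins; sorting makes ties deterministic.
    return max(sorted(scores), key=scores.get)


# Right after the failover only the VM in zone B is up, so both share it.
print(place_ers("vm-zone-b", ["vm-zone-b"]))
# Once the VM in zone A is back, the ERS is moved away from the ASCS.
print(place_ers("vm-zone-b", ["vm-zone-a", "vm-zone-b"]))
```

Using a finite negative score rather than a hard ban is what lets the ERS stay on the ASCS node while no other node is available.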
Setting up a SUSE cluster on AWS infrastructure has added advantages compared to on-premise. The infrastructure-layer challenges around storage and IP movement are taken care of by AWS itself: EFS offers NFS-based storage which can be mounted across availability zones on more than one node at once, and AWS provides an API-based approach for changing the IP entries in the route tables.
The overall setup time is shorter, and there are pre-defined configurations for SAP NetWeaver and SAP HANA from SUSE which can be consumed directly.