Susanne Janssen

HANA Scale-Out and Technical Deployment – The Case of the Standby Node

SAP HANA allows for near-endless scalability: one SID can be deployed across multiple hosts, each called a HANA node. In HANA terms, a deployment with multiple HANA nodes is considered an SAP HANA scale-out topology. The deployment can be physical or virtualized. The number of nodes can theoretically exceed one hundred, even though I personally have not come across this in production.

Depending on the application, different best practices exist for the number of HANA nodes and their size. Let me describe the three most typical use cases:

  • SAP HANA as a Datamart
  • SAP BW/4HANA based on NetWeaver
  • SAP S/4HANA based on NetWeaver

For HANA as a Datamart, you can deploy any number of nodes of any size, as these systems are usually OLAP-only applications. For BW/4HANA, many vendors have certified up to 16 nodes, but we observe a tendency to reduce the number of nodes and use larger ones, in the 3-6+ TB range.

For S/4HANA, on the other hand, we recommend as few and as large nodes as possible: typically 2, sometimes 3, with 12 to 24 TB each, and more nodes only in exceptional cases. The reason for this recommendation lies in the OLTP nature of the application.

On a technical level, the scale-out architecture is the same for each of these applications: you have one coordinator node and any number of worker nodes, each with its own data volume on storage and inter-node network communication.
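
To make the topology concrete, below is a minimal sketch in Python (host names and volume paths are hypothetical, and the standby role anticipates the option discussed in the next section) that models such a landscape and checks its basic invariants:

```python
from dataclasses import dataclass

@dataclass
class Node:
    host: str         # hypothetical host name
    role: str         # "coordinator", "worker", or "standby"
    data_volume: str  # empty for a standby node, which owns no volume

def validate(landscape: list[Node]) -> None:
    """Check basic scale-out invariants: exactly one coordinator;
    coordinator/worker nodes own a data volume, standby nodes do not."""
    if sum(n.role == "coordinator" for n in landscape) != 1:
        raise ValueError("a scale-out landscape needs exactly one coordinator")
    for n in landscape:
        if n.role == "standby" and n.data_volume:
            raise ValueError(f"{n.host}: a standby node owns no data volume")
        if n.role in ("coordinator", "worker") and not n.data_volume:
            raise ValueError(f"{n.host}: a {n.role} node needs a data volume")

validate([
    Node("hana01", "coordinator", "/hana/data/mnt00001"),
    Node("hana02", "worker", "/hana/data/mnt00002"),
    Node("hana03", "worker", "/hana/data/mnt00003"),
    Node("hana04", "standby", ""),  # the standby node discussed below
])
```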

High availability in scale-out deployments

From a technical architecture perspective, the question is often how to ensure high availability for an SAP HANA scale-out deployment. I assume everyone knows about HANA System Replication and Storage Replication (High Availability for SAP HANA | SAP Help Portal). The majority of customer implementations are based on HANA System Replication, because of, for example, its block-level corruption check and additional features like multi-target replication; storage replication makes sense only in rare cases. An additional solution for SAP HANA scale-out is local high availability (HA) through an empty standby node. This solution is called Host Auto-Failover.

Scale-out deployment with standby node

If a node fails, the standby node attaches to the data volume of the failed node and loads its data, taking over its role. Especially in the early days, there were numerous issues with memory DIMMs and system boards, so guidelines emerged such as: for each batch of 8 nodes, add 1 standby node, or one for every 5 nodes. At the time, however, the nodes were also quite small, with 256 GB or 512 GB and only exceptionally 1 TB. As a basic rule of thumb, it takes about 10 minutes to load 1 TB, depending on the storage throughput capacity.
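
As a back-of-the-envelope illustration, the sketch below turns that rule of thumb into numbers (Python; the throughput value is an assumed example, chosen so that 1 TB takes roughly 10 minutes):

```python
def takeover_load_minutes(node_size_tb: float, throughput_gb_per_s: float) -> float:
    """Estimate how long a standby takeover needs to reload the failed
    node's data volume from storage into memory."""
    return node_size_tb * 1024 / throughput_gb_per_s / 60

# An assumed effective storage read throughput of ~1.7 GB/s reproduces the
# ~10-minutes-per-TB rule of thumb from the text:
for size_tb in (1, 2, 4):
    print(f"{size_tb} TB -> {takeover_load_minutes(size_tb, 1.7):.0f} min")
# 1 TB -> 10 min, 2 TB -> 20 min, 4 TB -> 40 min
```

This simple math is also why large nodes weaken the case for standby-based HA: the bigger the node, the longer the reload, whereas an HSR takeover to a secondary with preloaded tables is largely independent of node size.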

At the time, HANA System Replication had also barely been introduced, and “self-healing” VM features (like auto restart) were not yet available. Most systems ran on bare-metal boxes with exceptionally long reboot cycles. The standby node was therefore a good option. At the time.

Today, the go-to deployment for high availability is HANA System Replication. Our hyperscalers and SAP RISE, for example, use it as the only solution.

If you deploy a system on premise, I recommend reconsidering the need for a standby node. It protects against the physical failure of a single node, for example; it does not protect you if two nodes fail at the same time, nor against the failure of the entire data center.

Below, please find a simple decision tree to decide whether a standby node is reasonable (also sketched as code after the list):

  1. If you deploy “small” nodes, e.g., with up to 2 TB, and you are not using HANA System Replication (HSR), a standby node is reasonable. If you do use HSR, a standby-node failover is definitely slower than an HSR takeover, but it might still make sense for covering hardware failures locally.
  2. If you deploy large HANA nodes, like 4+ TB, and you have relaxed SLAs, e.g., below 99.7%, a standby node could make sense, but I would recommend evaluating HSR nonetheless.
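
For illustration only, the decision logic from the two points above can be written down as a small function (a sketch; the 2 TB, 4 TB, and 99.7% thresholds come from the list, everything else is simplification):

```python
def standby_node_reasonable(node_size_tb: float,
                            uses_hsr: bool,
                            sla_percent: float) -> str:
    """Encode the simple decision tree above (thresholds from the text)."""
    if node_size_tb <= 2 and not uses_hsr:
        return "a standby node is reasonable"
    if uses_hsr:
        return ("a standby failover is slower than an HSR takeover, "
                "but may still cover local hardware failures")
    if node_size_tb >= 4 and sla_percent < 99.7:
        return "a standby node could make sense, but evaluate HSR nonetheless"
    return "prefer HSR"

print(standby_node_reasonable(node_size_tb=2.0, uses_hsr=False, sla_percent=99.5))
```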

For S/4HANA scale-out deployments, we generally recommend HSR, because a takeover is much faster and the protection is broader.

Please share the experience you have had with standby nodes: how often did you need them, what were the typical root causes, and did you run them in addition to HSR, or instead of it?

      1 Comment

      Eric Bickers

      We used Host Auto-Failover initially but a couple of years ago switched to HSR within the same datacenter. The standby nodes saved us a couple of times from physical host failures in our VMware deployment. Our databases are small enough that the downtime minimization you get from HSR isn't really realized. With standby nodes we can still do rolling updates at the OS, ESXi, and physical host levels, so we feel it's still a very viable HA solution, especially if you have multiple datacenters.

      For us, HSR is also getting a bit too redundant in the sense of: how many copies of your data do you need out there? If we still replicate in the same DC plus to the secondary DC, then our 1 TB turns into 3 TB. Our infrastructure guys also snapshot all of our storage volumes, even for the replicas. With backups, backups that are replicated to cloud storage, HSR, and storage snapshots, 1 TB multiplies very, very quickly.

      Now that we're looking at bringing up a secondary datacenter, I'm very much leaning toward going back to Host Auto-Failover within the same DC. In that DC we have only 1 NetApp appliance, so HSR is just different volumes from the same storage. We'll then set up asynchronous HSR to the secondary datacenter. Our DBs are relatively small, with our largest at 300 GB. We have a fair number of SAP applications deployed, totaling maybe 1 TB. Most of that data (180 GB) is of the various LOB data types, so preload and startup times are not really a factor since BLOBs/CLOBs are only loaded into memory on access. Even as we look at bringing attachments back to HANA rather than offloading to SharePoint or Content Server, I don't think our startup times will be affected much, as previously stated.