SAP HANA High Availability Explained
It's been a while since my last blog, but I decided to write a follow-up to my High Availability Explained blog to cover the details of the high availability offering for SAP HANA 2.0.
From the point of view of S/4 (or any other product running on HANA), the overall architecture remains untouched: you still have an ASCS, an AS and a database instance. The main difference is that HANA runs on dedicated hardware, meaning that systems are always distributed (CI and DB on different servers).
So, how do we make HANA highly available?
At the most basic level you need two HANA databases, ideally on separate appliances, one serving as primary and the other as secondary. So far so good. Here is where things get interesting: HANA offers a set of replication modes for the log shipping to the secondary database. A replication mode is nothing more than the way the logs get handed over, replicated and acknowledged by the secondary database.
What are the Replication Mode options?
Synchronous (SYNC): mainly used when the appliances are in the same data center or in very close proximity. In this mode the secondary database sends the primary an acknowledgment once the log has been received and written to disk, and only then does the primary database commit the data. There is an additional setting called Full Sync, an extension of the Synchronous mode, which only considers the log write successful when the log buffer has been written on both the primary and the secondary database.
Synchronous in-memory (SYNCMEM): same as above, but the acknowledgment is sent to the primary as soon as the data is received in memory. This has a significant performance advantage, as the primary doesn't have to wait for the data to be written to disk on the secondary. Like the Synchronous mode, it is designed for databases in close proximity.
Asynchronous (ASYNC): goes even further. The log is sent to the secondary database, and the primary does not wait for any acknowledgment before committing the data. Because of that, this is the preferred mode for replicating logs to a disaster recovery database, usually in a remote location.
You can read more details about this on SAP Help – Replication Modes for SAP HANA System Replication
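To make the difference between the three modes a bit more tangible, here is a tiny Python sketch of what the primary waits for before it commits. It is only a toy model based on the descriptions above: the function name `commit_wait_ms` and the latency numbers are made up for illustration, and real commit times depend on many more factors.

```python
from enum import Enum


class ReplicationMode(Enum):
    SYNC = "sync"
    SYNCMEM = "syncmem"
    ASYNC = "async"


def commit_wait_ms(mode, network_rtt_ms, secondary_disk_write_ms):
    """Rough extra time the primary waits on the secondary before committing.

    Hypothetical helper: a simplification, not how HANA measures anything.
    """
    if mode is ReplicationMode.SYNC:
        # The acknowledgment comes back only after the secondary wrote the log to disk.
        return network_rtt_ms + secondary_disk_write_ms
    if mode is ReplicationMode.SYNCMEM:
        # The acknowledgment comes back once the log is in the secondary's memory.
        return network_rtt_ms
    # ASYNC: the primary does not wait for an acknowledgment at all.
    return 0.0


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not measured).
    for mode in ReplicationMode:
        wait = commit_wait_ms(mode, network_rtt_ms=0.5, secondary_disk_write_ms=1.0)
        print(f"{mode.name}: primary waits about {wait} ms")
```

The point of the model is simply that the network round trip hurts SYNC and SYNCMEM but not ASYNC, which is why distance between the sites drives the choice.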
Is that it?… Nope, you also have to choose your operation mode
The operation mode is simply the way the replication operates, and it needs to be chosen when registering the secondary database. There are 3 operation modes to choose from:
delta_datashipping: exactly as it sounds, a delta data shipping takes place in addition to the continuous log shipping. At a defined interval (the default is 10 minutes) a collection of the changed data is sent to the secondary system and applied together with the logs. In practice this has more cons than pros: delta data shipping creates peaks in data transfer, and if an issue arises there will be additional recovery time, as the logs received since the last delta need to be replayed.
logreplay: in this mode redo logs are continuously replayed on the secondary system, meaning that if an issue arises the secondary can take over immediately. As a bonus, the stream of data transferred is steady, and because it only needs one initial full data shipping the amount of data transferred is heavily reduced.
logreplay_readaccess: exactly the same as logreplay, but read access to the secondary system becomes available. This is only used with an Active/Active (read enabled) secondary system, a feature that allows the secondary database to be queried, for reporting for example. (This feature requires additional licensing.)
You can read more details about this on SAP Help – Operation Modes for SAP HANA System Replication
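Since both the replication mode and the operation mode are picked when the secondary is registered, a small sketch of that step may help. The Python helper below only assembles an `hdbnsutil -sr_register` command line as a string. The flag names are the ones I know from the SAP documentation, so please check them against the help output of your own revision. The host name, instance number and site name are hypothetical, and `build_register_command` is just an illustrative helper, not an SAP tool.

```python
import shlex

REPLICATION_MODES = {"sync", "syncmem", "async"}
OPERATION_MODES = {"delta_datashipping", "logreplay", "logreplay_readaccess"}


def build_register_command(primary_host, instance_number, replication_mode,
                           operation_mode, site_name):
    """Return the registration command for a secondary as a single string."""
    if replication_mode not in REPLICATION_MODES:
        raise ValueError(f"unknown replication mode: {replication_mode}")
    if operation_mode not in OPERATION_MODES:
        raise ValueError(f"unknown operation mode: {operation_mode}")
    args = [
        "hdbnsutil", "-sr_register",
        f"--remoteHost={primary_host}",
        f"--remoteInstance={instance_number}",
        f"--replicationMode={replication_mode}",
        f"--operationMode={operation_mode}",
        f"--name={site_name}",
    ]
    # shlex.join quotes any argument that would need it in a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # Hypothetical values: adjust host, instance number and site name.
    print(build_register_command("hana-primary", "00", "syncmem", "logreplay", "SITEB"))
```

The validation is there mainly to catch typos such as `log_replay` before the command ever reaches the box.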
Ok, that must be it, right?… Nope, one more thing: you need to select whether you want pre-load ON or OFF.
The pre-load option gives you the chance to decide whether the replicated data will be loaded into memory or not.
With pre-load ON the replicated data is kept in memory, meaning that in case of an issue the failover will be fast. With pre-load OFF the replicated data is kept on disk, meaning the failover will be slower, as the data needs to be loaded into memory as part of the failover. The trade-off is that pre-load ON uses more memory on the secondary system, and this is one of the things you need to consider if you have a small environment. For example, with pre-load OFF you might want to use the spare memory on the secondary system to host some of the non-productive systems.
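As far as I know, pre-load is controlled by the `preload_column_tables` parameter in the `[system_replication]` section of `global.ini` on the secondary. Treat that location as an assumption and verify it for your revision. The Python sketch below only renders what such a fragment would look like. It does not touch any real file, and `preload_fragment` is a hypothetical helper name.

```python
import configparser
import io


def preload_fragment(preload):
    """Render an ini fragment with the pre-load setting (illustration only)."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep parameter names exactly as written
    config["system_replication"] = {
        "preload_column_tables": "true" if preload else "false",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Pre-load OFF, e.g. for a box that also hosts non-productive systems.
    print(preload_fragment(False))
```

In a real system you would change the parameter through the usual HANA configuration tooling rather than by generating text, but the fragment shows how small the switch actually is.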
I think this is all the basic information you need to make an informed decision when designing a highly available HANA database.
So, what does a simple High Availability + DR system running a HANA database look like?
In this example you have two nodes in an HA configuration using SYNCMEM replication and logreplay with pre-load ON on the secondary system. This optimizes the replication and gives you the best performance (unless you have a requirement for the SYNC replication mode). In addition to that, you have a remote DR system using ASYNC replication and logreplay with pre-load OFF. ASYNC replication is ideal for remote replication, and pre-load OFF lets you maximize the memory available on the DR box so it can be used for non-prod systems.
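The same landscape can be written down as plain data, which is handy when you want to sanity-check a design before building it. The Python sketch below is my own illustration: the `ReplicationLink` class, the `review` function and the site labels are hypothetical, and the rules it applies are just the rules of thumb from this blog, not an SAP validation.

```python
from dataclasses import dataclass


@dataclass
class ReplicationLink:
    target_site: str
    remote: bool            # True if the target sits in a distant data center
    replication_mode: str   # "sync", "syncmem" or "async"
    operation_mode: str
    preload: bool


def review(link):
    """Return rule-of-thumb notes for one replication target."""
    notes = []
    if link.remote and link.replication_mode != "async":
        notes.append("remote target: ASYNC is the usual choice")
    if not link.remote and link.replication_mode == "async":
        notes.append("nearby target: SYNC or SYNCMEM is the usual choice")
    if not link.preload:
        notes.append("pre-load OFF: expect a slower takeover")
    return notes


# The HA + DR example described above.
LANDSCAPE = [
    ReplicationLink("HA secondary", False, "syncmem", "logreplay", True),
    ReplicationLink("DR site", True, "async", "logreplay", False),
]

if __name__ == "__main__":
    for link in LANDSCAPE:
        print(link.target_site, review(link) or "no remarks")
```

For the example landscape the only remark is the slower takeover on the DR site, which is the price you knowingly pay for freeing up memory there.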
HA Cluster and automatic failover
SAP does not provide an automatic failover mechanism for HANA, which means you need to engage one of the standard third-party HA cluster solutions to provide automatic failover in case of losing one of the HANA boxes.
SAP does provide a tool called Host Auto-Failover, which can be used in addition to or as an alternative to replication. Having said that, Host Auto-Failover only replaces the affected box with a standby host and does not fail over between active instances. This tool is mainly used in scale-out implementations, where a system can span multiple HANA appliances and a standby host can really help avoid issues with system availability.
Hope this gives you a better understanding of the HANA High Availability options. For more details on how a cluster works, check my previous blog High Availability Explained.
Love to hear your comments,
PS: the information shared in this blog is valid up to HANA 2.0 SP5.
We've had Suite on HANA on HA for 4.5 years now and it has been a learning opportunity for us. It took us a while to smooth out the kinks, but after the first 6 months or so it has run very smoothly. Recently we enhanced our configuration to sync like this.
When replication is used in operation mode logreplay, even if preload_column_tables is disabled, the secondary indexserver can consume a lot of memory (hundreds of gigabytes and even more than a terabyte) in the Pool/ColumnStore allocator. That is by design (according to HANA Development Support), and it can lead to crashes of the indexserver when combining replicas and a non-productive instance on the same host. The only way out in this situation is to use the delta_datashipping operation mode, but it is not compatible with a multi-target design and leads to using only multi-site in case of multiple replicas.
That is a good point, in theory this is doable and even recommended, in practice this is not as easy as it sounds.
This is a good point.
Thank you for the article.
I set up SAP HANA 2.0.48 Active/Active (read enabled) for a HANA data mart, but later I found that the secondary read-only node cannot use SDA when talking to an S/4HANA (OLTP) database. In their last reply in Feb 2021, SAP said they are still working on it. Hopefully it will be available in the near future.
Regarding the license fee of the secondary read-only node, are you sure SAP will charge additionally?
That is what the documentation suggests.
For more info, the best option is to contact your SAP account manager.