SAP HANA HA and DR Series #2: Redundancy and Fault Recovery Support
This is the second article in the SAP HANA High Availability and Disaster Recovery series I started last week (the first one is here). In this one, I will cover the redundancy and fault recovery (FR) options available within the HANA platform.
One of the key elements of architecting an SAP HANA platform is designing reliable fault tolerance into your landscape, and the first step is to eliminate single points of failure (SPOF). Almost all HANA appliance vendors offer a variety of redundancy options to avoid unexpected downtime due to hardware, network, and data center failures.
Hardware redundancies include (but are not limited to) hot-swappable PSUs, network interface cards, and military-grade memory. For network redundancy, redundant network equipment (switches and routers), topologies, and network protocols are available. Data centers that host HANA servers can be equipped with uninterruptible power supply units, backup power generators, and redundant cooling systems to eliminate SPOF.
The SAP HANA platform also provides additional defence mechanisms on separate layers to avoid unexpected outages and to recover as quickly as possible in case of a failure.
The watchdog function is a fault recovery option on the software layer. It restarts a configured SAP HANA service (indexserver, nameserver, etc.) when that service fails or is disabled by manual intervention. The service auto-restart watchdog automatically detects the failure and attempts to restart the stopped process(es) immediately. Once the restart is successful, the service data is loaded into memory and the service continues functioning as normal. Data consistency is preserved, and recovery takes a little time because of the service restart and the memory load process.
You can configure all watchdog settings in nameserver.ini, although the default parameter settings should also be fine.
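To make the auto-restart idea a bit more concrete, here is a minimal sketch in Python. It is not how HANA implements its watchdog; the Service and Watchdog classes are hypothetical names I made up purely for illustration. It only shows the detect-and-restart loop described above.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for a HANA service such as indexserver."""
    name: str
    running: bool = True
    restarts: int = 0

    def start(self) -> None:
        # A real service would reload its data into memory at this point.
        self.running = True
        self.restarts += 1


@dataclass
class Watchdog:
    """Toy watchdog: restarts any monitored service that is found stopped."""
    services: dict[str, Service] = field(default_factory=dict)

    def register(self, service: Service) -> None:
        self.services[service.name] = service

    def check(self) -> list[str]:
        """Run one monitoring pass and return the names that were restarted."""
        restarted = []
        for service in self.services.values():
            if not service.running:
                service.start()
                restarted.append(service.name)
        return restarted


watchdog = Watchdog()
indexserver = Service("indexserver")
nameserver = Service("nameserver")
watchdog.register(indexserver)
watchdog.register(nameserver)

indexserver.running = False   # simulate a crash or a manual kill
restarted = watchdog.check()  # one watchdog pass
print(restarted)
```

In this toy model, a second call to check() restarts nothing, because every service is already running again.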
Transaction logging is managed on the persistence layer. Here, SAP HANA persists transaction (redo) logs, (storage) snapshots, and savepoints to support system restart and recovery from host failures.
Transaction logs record every change. To make a transaction durable, it is not necessary to persist the complete data when the transaction is committed; thanks to redo logging, it is sufficient to persist its redo log.
Storage snapshots offer an additional option to protect the SAP HANA data blocks and to recover the database. There are two main benefits of snapshots: they can be created with minimal impact on the database, and recovery from a snapshot is faster than recovery from a backup.
Each SAP HANA service has its own separate savepoints. HANA uses them to persist the in-memory data: during a savepoint operation, all changed data in memory is written to the data volumes. Savepoint intervals can be managed in global.ini.
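As a small illustration of the savepoint interval setting, the sketch below reads it from a global.ini-style text with Python's configparser. I am assuming here that the parameter is savepoint_interval_s in the [persistence] section and that the value is in seconds; please check the section and parameter names against the documentation for your HANA release. The 300 in the excerpt is just an example value.

```python
import configparser

# Illustrative excerpt of a global.ini file. The section and parameter names
# are assumptions (see the text above); 300 is only an example value.
GLOBAL_INI = """
[persistence]
savepoint_interval_s = 300
"""


def savepoint_interval_seconds(ini_text: str, default: int = 300) -> int:
    """Return the savepoint interval in seconds, or `default` if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("persistence", "savepoint_interval_s", fallback=default)


interval = savepoint_interval_seconds(GLOBAL_INI)
print(f"Savepoint every {interval} seconds")
```

The fallback argument means the function still returns a value when the section or the parameter is missing from the file.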
At the end of the day (not literally), changes to data in the database, the logs containing those changes, and certain transaction events are saved regularly to disk. So basically, the persistence layer ensures that the HANA database can be restored to the most recent committed state after a restart.
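The interplay between savepoints and redo logs can be sketched with a toy model. This is a heavy simplification and not HANA's actual persistence code: ToyDatabase and its methods are hypothetical, and I simply assume that the redo log can be emptied after each savepoint, which ignores log backups and log modes. It shows why the last savepoint plus the redo log is enough to get back to the most recent committed state.

```python
class ToyDatabase:
    """Toy model of savepoint + redo log recovery (hypothetical, simplified)."""

    def __init__(self):
        self.memory = {}       # in-memory data, lost on a restart
        self.data_volume = {}  # image written by the last savepoint (on disk)
        self.redo_log = []     # committed changes since the last savepoint (on disk)

    def commit(self, key, value):
        # Persist the redo record first, then apply the change in memory.
        self.redo_log.append((key, value))
        self.memory[key] = value

    def savepoint(self):
        # Write the current in-memory state to the data volume.
        self.data_volume = dict(self.memory)
        self.redo_log.clear()  # simplification: log entries are no longer needed

    def crash_and_restart(self):
        # Memory is lost; reload the last savepoint and replay the redo log.
        self.memory = dict(self.data_volume)
        for key, value in self.redo_log:
            self.memory[key] = value


db = ToyDatabase()
db.commit("a", 1)
db.savepoint()
db.commit("b", 2)      # committed after the savepoint, only in the redo log
db.commit("a", 3)
db.crash_and_restart()
print(db.memory)
```

After the restart, the in-memory state again contains both changes that were committed after the savepoint, even though they were never written to the data volume.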
The third category of SAP HANA fault recovery is standby and failover capabilities, where separate, dedicated standby hosts are used for failover to improve system availability and achieve HA.
Host Auto-Failover and System Replication are the two main options when it comes to high availability in SAP HANA, and each has its key benefits and some drawbacks compared to the other. I plan to deep dive into these two HA options in the upcoming article, especially the new features available in recent HANA releases.
Host Auto-Failover and System Replication are obviously much more exciting subjects so stay tuned 🙂
Do you have any questions about the redundancy and fault recovery support of SAP HANA? Leave a comment below. I would love to help you and learn from you as much as I can!