Disaster Recovery – SAP HANA
In any SAP landscape, the disaster recovery (DR) setup is one of the most important parts. In this blog, I discuss the different methods by which one can set up DR for SAP HANA.
There are 3 types of DR support with respect to HANA:
- Backups,
- Storage Replication, and
- System Replication
As we all know, SAP HANA is based on in-memory technology. To survive a restart or a failure, it persists two types of data to storage:
- Transaction redo logs, and
- Data Changes in the form of Savepoints
Transaction Redo Logs -:
The redo log records all data changes as they happen. When a database transaction is committed, the redo log buffer is saved to disk. If the redo log buffer fills up at any time, it is also written to disk, even if no commit has been sent. After an outage, the most recent consistent state of the database can be restored by replaying the changes recorded in the log, redoing completed transactions and rolling back incomplete ones.
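The flush and replay rules described above can be sketched as a toy model. This is only an illustration, not how SAP HANA implements its log: the `RedoLog` class, the `recover` function, and the buffer capacity of 4 entries are all hypothetical, and "rolling back" an incomplete transaction is simplified to skipping its entries during replay.

```python
from dataclasses import dataclass, field


@dataclass
class RedoLog:
    """Toy redo log: an in-memory buffer that is flushed to a 'disk' list."""
    buffer_capacity: int = 4          # hypothetical size, in entries
    buffer: list = field(default_factory=list)
    disk: list = field(default_factory=list)

    def _flush(self):
        # Move everything buffered so far to persistent storage.
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def write(self, txn, key, value):
        self.buffer.append(("write", txn, key, value))
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()             # buffer full: flushed without a commit

    def commit(self, txn):
        self.buffer.append(("commit", txn))
        self._flush()                 # a commit always forces the buffer to disk


def recover(disk_log):
    """Replay the persisted log: redo committed transactions, skip the rest."""
    committed = {entry[1] for entry in disk_log if entry[0] == "commit"}
    state = {}
    for entry in disk_log:
        if entry[0] == "write" and entry[1] in committed:
            _, _, key, value = entry
            state[key] = value
    return state


log = RedoLog()
log.write("T1", "a", 1)
log.commit("T1")          # T1 is durable from here on
log.write("T2", "b", 2)   # T2 never commits before the "crash"
log.write("T2", "c", 3)
log.write("T2", "d", 4)
log.write("T2", "e", 5)   # fourth buffered entry fills the buffer and flushes it

recovered = recover(log.disk)
print(recovered)          # {'a': 1}
```

Note that the uncommitted writes of T2 do reach the disk once the buffer fills, yet they do not appear in the recovered state, because no commit record exists for T2.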
Savepoints -:
During normal database operation, changed data is automatically saved from memory to disk at regular savepoints. By default, savepoints are set to occur every five minutes, including during a backup. Savepoints do not affect the processing of transactions. During a savepoint, transactions continue to run as normal, and new transactions can be started as normal. With a system running on properly configured hardware, the impact on performance of savepoints is negligible.
The main reason for performing savepoints is to make the restart quick: when the system starts up, the logs do not need to be processed from the beginning, but only from the last savepoint position. Savepoints are coordinated across all processes (called SAP HANA services) and instances of the database to ensure transaction consistency.
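To illustrate why a savepoint shortens the restart, here is a minimal sketch. The `restart` function and the dictionary layout of the savepoint are assumptions made for this example only; the log is reduced to a list of already committed key/value changes.

```python
def restart(savepoint, redo_log):
    """Rebuild state from a savepoint plus the log entries written after it."""
    state = dict(savepoint["data"])
    tail = redo_log[savepoint["log_position"]:]   # only entries after the savepoint
    for key, value in tail:
        state[key] = value
    return state, len(tail)


# Redo log of committed changes, oldest first (hypothetical content).
redo_log = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# Savepoint taken after the first three log entries had been applied.
savepoint = {"log_position": 3, "data": {"a": 3, "b": 2}}

state, replayed = restart(savepoint, redo_log)

# Without any savepoint, the whole log has to be replayed.
full_state, full_replayed = restart({"log_position": 0, "data": {}}, redo_log)

print(state, replayed)            # {'a': 3, 'b': 5, 'c': 4} 2
print(full_state, full_replayed)  # {'a': 3, 'b': 5, 'c': 4} 5
```

Both restarts reach the same state, but the one that starts from the savepoint replays two log entries instead of five.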
What is a SNAPSHOT?
Normally, each savepoint overwrites the older savepoint, but it is possible to freeze a savepoint for future use. Such a frozen savepoint is called a snapshot.
The advantage of snapshots is that they can be replicated in the form of full data backups, which can be used to restore a database to a specific point in time. This can be useful in the event of data corruption. Savepoints are saved to local storage, and backups can additionally be saved to backup storage. Local recovery from a crash uses the latest savepoint and then replays the most recent logs to recover the database without any data loss.
One drawback of backups is the potential loss of data between the time of the last backup and the time of the failure.
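As a rough illustration of this drawback, the sketch below computes the window of potentially lost data for a failure that happens between two backups. The function name `data_loss_window` and all timestamps are hypothetical, and the model assumes that only the backups themselves are available for recovery.

```python
from datetime import datetime


def data_loss_window(backup_times, failure_time):
    """Time between the last backup taken before the failure and the failure."""
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(usable)


# Hypothetical schedule: one backup every six hours.
backups = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 1, 12, 0),
]
failure = datetime(2024, 1, 1, 9, 30)

window = data_loss_window(backups, failure)
print(window)  # 3:30:00
```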
To overcome this limitation, a preferred solution is to provide continuous replication of all persisted data. Several SAP HANA hardware partners offer a storage-level replication solution, which delivers a copy of the volumes or file system to a remote, networked storage system.
Some of these vendor-specific solutions are certified by SAP. With them, an SAP HANA transaction completes only when the locally persisted transaction log has been replicated remotely. This is called synchronous storage replication.
Because it is continuous, storage replication can be a more attractive option than backups, as it reduces the amount of time between the last replicated state and a failure. Another advantage of storage replication is that it also enables a much shorter recovery time. This solution requires a reliable, high-bandwidth, low-latency connection between the primary site and the secondary site.
System replication is set up so that a secondary standby system is configured as an exact copy of the active primary system, with the same number of active hosts in each system.
An important point for this type of setup is that the number of standby hosts need not be identical. Each service instance of the primary SAP HANA system communicates with a counterpart in the secondary system.
The secondary system can be located near the primary system to serve as a rapid failover solution for planned downtime, or to handle storage corruption or other local faults. Alternatively, it can be installed at a remote site to be used in a disaster recovery scenario. If required, both approaches can also be chained together with multitier system replication.
The instances in the secondary system operate in recovery mode. In this mode, all secondary system services constantly communicate with their primary counterparts, replicate and persist data and logs, and load data to memory.
NOTE –: The main difference from the primary system is that the secondary system does not accept requests or queries.
When the secondary system is started in recovery mode, each service component establishes a connection with its counterpart and requests a snapshot of the data in the primary system (snapshots are explained earlier in this document). From then on, all logged changes in the primary system are replicated. Whenever logs are persisted in the primary system, they are also sent to the secondary system. In the synchronous modes described below, a transaction in the primary system is not committed until the logs are replicated.
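The sequence described above, an initial snapshot followed by continuous log shipping, can be sketched as follows. The `Primary` and `Secondary` classes are hypothetical and heavily simplified: a single service, an in-process "network", and a log entry reduced to one key/value change.

```python
class Secondary:
    """Toy secondary: holds a copy of the data and applies shipped log entries."""

    def __init__(self):
        self.data = {}
        self.received = 0

    def receive_log(self, entry):
        key, value = entry
        self.data[key] = value
        self.received += 1


class Primary:
    """Toy primary: ships every logged change to its registered secondaries."""

    def __init__(self):
        self.data = {}
        self.secondaries = []

    def snapshot(self):
        return dict(self.data)             # frozen copy of the current state

    def register(self, secondary):
        # Step 1: the secondary starts from a snapshot of the primary's data.
        secondary.data = self.snapshot()
        self.secondaries.append(secondary)

    def write(self, key, value):
        self.data[key] = value
        # Step 2: from then on, every logged change is sent to the secondary.
        for secondary in self.secondaries:
            secondary.receive_log((key, value))


primary = Primary()
primary.write("a", 1)        # changes made before the secondary exists
primary.write("b", 2)

secondary = Secondary()
primary.register(secondary)  # initial snapshot transfer

primary.write("c", 3)        # changes after registration arrive as log entries
primary.write("a", 9)

print(secondary.data == primary.data, secondary.received)  # True 2
```

The two changes made before registration reach the secondary through the snapshot, so only the two later changes are counted as shipped log entries.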
Different Log Replication Modes-:
- Synchronous in-memory -:
This is the default mode. The primary system commits the transaction after it receives a reply that the log was received by the secondary system, but before it has been persisted there. The transaction delay in the primary system is shorter, because it only includes the data transmission time.
- Synchronous with full sync -:
In this option, a log write is successful only when the log buffer has been written to the log file of both the primary and the secondary instance. In addition, when the secondary system is disconnected, the primary system suspends transaction processing until the connection to the secondary system is re-established. In this scenario, no data loss occurs.
- Synchronous -:
The primary system does not commit a transaction until it receives confirmation that the log has been persisted in the secondary system. This mode guarantees immediate consistency between both systems. However, the transaction is delayed by the time it takes to transmit the data to the secondary system and persist it there.
- Asynchronous -:
The primary system sends redo log buffers to the secondary system asynchronously. The primary system commits a transaction when it has been written to the log file of the primary system and sent to the secondary system through the network. It does not wait for confirmation from the secondary system. As a result, this mode is more vulnerable to data loss: data changes may be lost on takeover.
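The difference between the four modes comes down to what the primary system waits for before it commits. The sketch below encodes only that rule, as described above. The names `ReplicationMode`, `added_commit_delay_ms`, and `suspends_when_disconnected` are hypothetical, and the round-trip and persist times are made-up numbers, not measurements.

```python
from enum import Enum


class ReplicationMode(Enum):
    SYNCMEM = "synchronous in-memory"
    FULLSYNC = "synchronous with full sync"
    SYNC = "synchronous"
    ASYNC = "asynchronous"


def added_commit_delay_ms(mode, round_trip_ms, secondary_persist_ms):
    """Extra commit delay on the primary caused by waiting for the secondary."""
    if mode is ReplicationMode.ASYNC:
        return 0.0                      # does not wait for any confirmation
    if mode is ReplicationMode.SYNCMEM:
        return round_trip_ms            # waits until the log is received only
    # SYNC and FULLSYNC wait until the secondary has persisted the log.
    return round_trip_ms + secondary_persist_ms


def suspends_when_disconnected(mode):
    """Only full sync blocks transactions while the secondary is unreachable."""
    return mode is ReplicationMode.FULLSYNC


# Hypothetical timings: 2 ms network round trip, 1 ms disk write on the secondary.
delays = {
    mode: added_commit_delay_ms(mode, round_trip_ms=2.0, secondary_persist_ms=1.0)
    for mode in ReplicationMode
}
for mode, delay in delays.items():
    print(f"{mode.value}: +{delay} ms, suspends={suspends_when_disconnected(mode)}")
```

With these assumed timings, asynchronous mode adds no wait, synchronous in-memory adds 2 ms, and both synchronous variants add 3 ms. Only full sync trades availability of the primary for the guarantee of no data loss.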