How to Plan for Disaster Recovery with SAP HANA
As devices, systems, and networks become more complex, there are simply more things that can go wrong due to either man-made or natural disasters. See the options available with SAP HANA for disaster recovery that help organizations keep business running smoothly in such circumstances.
SAP HANA provides a single platform to extract and analyze massive amounts of structured and unstructured data in real time from multiple sources such as social media, blogs, online reviews, emails, and discussion forums. The analyzed information helps customers answer specific questions, increase revenue, and make accurate and timely decisions. The analyzed information and the data accumulated over the years are the backbone of an organization. If something happens to the data due to natural or man-made disasters, business can come to a halt. Making use of the SAP HANA disaster recovery features can save companies from such an outcome.
By reading the article you will be able to:
- Understand the concept of disaster recovery
- Understand the disaster recovery options with SAP HANA
Nowadays almost every company has a business continuity plan that helps during an unfortunate event such as a flood or an earthquake. A disaster recovery plan is the part of the business continuity plan that focuses mainly on the restoration of IT infrastructure and operations after a crisis.
There are two important terms when it comes to a disaster recovery plan: recovery time objective (RTO) and recovery point objective (RPO), as shown in Figure 1.
Figure 1 Disaster recovery concepts
RTO is the target time within which your application must be back online and running after a disaster has struck. The goal here is to determine how much downtime the business can tolerate. The cost of the disaster recovery option varies with this time: the shorter the time to recover, the higher the cost. For example, if your organization’s RTO is three weeks, then you may be able to invest in a less expensive recovery option, whereas if it is four hours, then you need a higher budget and a higher level of preparation.
RPO is the point in time in the past to which the system will be restored. The goal here is to determine the time between data backups and the amount of data that could be lost between backups during a disaster event. The backup interval depends on how much recent data your organization can afford to lose when a disaster happens. For example, if your organization can survive the loss of two days of data, then the RPO is two days, and backups must be taken at least every two days.
The cost analysis according to the RTO and RPO along with the SAP HANA disaster recovery features is shown in Figure 2. The lower the RTO and RPO, the higher the cost.
Figure 2 RPO and RTO of disaster recovery features
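The relationship between the backup schedule and the two objectives can be checked with a little arithmetic. The following is a minimal sketch, not an SAP tool: the function names and the sample numbers are assumptions made for illustration. It treats the worst-case data loss as one full backup interval (the disaster strikes just before the next backup would have run) and compares an estimated restore time against the RTO.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the disaster strikes just before the next backup runs."""
    return backup_interval


def meets_objectives(backup_interval: timedelta,
                     estimated_restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> dict:
    """Compare a backup schedule and a restore estimate with the RPO and RTO."""
    return {
        "rpo_met": worst_case_data_loss(backup_interval) <= rpo,
        "rto_met": estimated_restore_time <= rto,
    }


# Illustrative numbers only: daily backups, six-hour restore,
# an RPO of two days and an RTO of four hours.
result = meets_objectives(
    backup_interval=timedelta(days=1),
    estimated_restore_time=timedelta(hours=6),
    rpo=timedelta(days=2),
    rto=timedelta(hours=4),
)
print(result)  # {'rpo_met': True, 'rto_met': False}
```

In this example the daily backup satisfies the two-day RPO, but the six-hour restore misses the four-hour RTO, which is the situation in which a more expensive option becomes necessary.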
SAP HANA offers three features for disaster recovery:
- Backups and recovery
- Storage replication
- System replication
Backups and Recovery
In SAP HANA, a data backup is written from memory to disk. It can only be performed when the database is online. SAP HANA supports the following backup methods (Figure 3):
- Data backup (savepoint). The data backup process is asynchronous. The SQL data and the undo log are saved to storage to ensure a speedy restart. By default a savepoint is written every five minutes, and you can customize this interval.
- Log backup (redo log). It records changes to data and is performed synchronously. The data is saved to persistent storage when a database transaction is committed. The reason for saving the logs is that when either a power failure or any other disaster happens, the log can be executed again to bring the database to the most recent consistent state.
Figure 3 Data and log backup
With SAP HANA SPS 10 there are two new data backup options (Figure 4):
- Incremental backup. This is the smallest delta data backup, because only the data changed since the last full or delta backup is saved. Because these backups run at frequent intervals, each one is small and quick to create. They lead to a higher RTO because they need to be restored one after another in sequence.
- Differential backup. This is a delta data backup that stores the data changed since the last full data backup. It is larger than an incremental backup and takes more time to create. It has a lower RTO than an incremental backup because fewer backups need to be restored.
Figure 4 Incremental and differential backup
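The RTO difference between the two delta types comes down to how many backups have to be restored. The sketch below is a simplified model written for this article, not SAP code: the `Backup` class and `restore_chain` function are hypothetical names. It assumes that a differential backup covers everything since the last full backup, and that an incremental backup covers everything since the previous full or delta backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int
    kind: str  # "full", "incremental", or "differential"


def restore_chain(backups, target_day):
    """Return the backups to restore, in order, to reach target_day."""
    candidates = sorted((b for b in backups if b.day <= target_day),
                        key=lambda b: b.day)
    full_idx = max((i for i, b in enumerate(candidates) if b.kind == "full"),
                   default=None)
    if full_idx is None:
        raise ValueError("no full backup available before the target day")
    chain = [candidates[full_idx]]
    later = candidates[full_idx + 1:]
    # The latest differential replaces every delta taken before it.
    diff_idx = max((i for i, b in enumerate(later) if b.kind == "differential"),
                   default=None)
    if diff_idx is not None:
        chain.append(later[diff_idx])
        later = later[diff_idx + 1:]
    # Any incrementals after that point must all be applied in sequence.
    chain.extend(b for b in later if b.kind == "incremental")
    return chain


incremental_plan = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 5)]
differential_plan = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 5)]

print(len(restore_chain(incremental_plan, 4)))   # 5 backups to restore
print(len(restore_chain(differential_plan, 4)))  # 2 backups to restore
```

With four days of deltas, the incremental schedule needs five restores and the differential schedule only two, at the price of larger daily backups.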
The data backup can be performed from SAP HANA studio, from the database administration (DBA) Cockpit, or by executing SQL commands via HDBSQL. HDBSQL is a command line tool for executing commands on SAP HANA databases. The following authorizations are required to perform a data backup in SAP HANA:
- BACKUP ADMIN
- CATALOG READ
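For the SQL route, the statement itself is short. The helper below is a hedged sketch: `backup_statement` is a hypothetical function written for this article, and the file prefixes are made-up examples. The statement forms it produces (`BACKUP DATA USING FILE`, with the optional `INCREMENTAL` or `DIFFERENTIAL` keyword) should be checked against the SQL reference for your SAP HANA revision before use.

```python
def backup_statement(prefix: str, kind: str = "full") -> str:
    """Build a file-based BACKUP DATA statement for the given backup type."""
    keywords = {
        "full": "",
        "incremental": "INCREMENTAL ",
        "differential": "DIFFERENTIAL ",
    }
    if kind not in keywords:
        raise ValueError(f"unknown backup type: {kind}")
    # Double any single quote so the prefix stays a valid SQL string literal.
    safe_prefix = prefix.replace("'", "''")
    return f"BACKUP DATA {keywords[kind]}USING FILE ('{safe_prefix}')"


# Hypothetical prefixes; the resulting text could be run through HDBSQL
# by a user holding the authorizations listed above.
print(backup_statement("MONDAY_FULL"))
print(backup_statement("TUESDAY_INC", "incremental"))
```

Keeping the statement construction in one place makes it easier to schedule full and delta backups consistently from a script.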
The savepoint and the log play an important role in the recovery of the system after a disaster (Figure 5). The database is first restarted and then the last savepoint is reloaded. There can be a huge data gap between the last savepoint and the disaster point. This gap is closed by restoring the incremental or differential backups and then replaying the redo log, which contains the most recently executed transactions. Committed transactions present in the redo log are re-executed, whereas uncommitted transactions are rolled back using the undo log. Once this is done, the database is back in a consistent state.
Figure 5 Savepoint and redo log
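The replay logic can be illustrated with a toy model. This is a deliberately simplified sketch, not how SAP HANA implements recovery: the `recover` function and the log entry layout are assumptions for illustration. It also assumes the savepoint holds only committed data, so uncommitted writes can simply be skipped instead of being undone.

```python
def recover(savepoint: dict, redo_log: list) -> dict:
    """Rebuild the state from the last savepoint plus the redo log.

    Only writes that belong to committed transactions are re-applied.
    """
    committed = {entry["txn"] for entry in redo_log if entry["op"] == "commit"}
    state = dict(savepoint)  # start from the last savepoint
    for entry in redo_log:
        if entry["op"] == "write" and entry["txn"] in committed:
            state[entry["key"]] = entry["value"]
    return state


savepoint = {"balance": 100}
redo_log = [
    {"txn": "t1", "op": "write", "key": "balance", "value": 120},
    {"txn": "t1", "op": "commit"},
    # t2 never committed before the failure, so its write is discarded.
    {"txn": "t2", "op": "write", "key": "balance", "value": 50},
]

print(recover(savepoint, redo_log))  # {'balance': 120}
```

The committed change from `t1` survives the failure, while the in-flight change from `t2` does not, which is the consistent state described above.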
SAP has also provided an application programming interface (API) to support data backup and recovery using third-party tools. Some of the third-party tools are as follows:
- Symantec NetBackup
- IBM TSM for Enterprise
- Commvault Simpana
- HP Data Protector
Backup and recovery works best in the case of a power failure or disk failure. It does not help when the persistent storage itself is destroyed or some logical error has occurred. The cost of backup and recovery is low compared to the other options. Backup and recovery suits companies that do not need a short RTO or an RPO of zero.
Storage Replication
Storage replication is the process of mirroring disk content to a secondary data center with a standby SAP HANA system (Figure 6). The transfer can be either synchronous or asynchronous, depending on the distance between the primary and the standby SAP HANA system. As the distance between the primary and the secondary data center increases, the latency for writing the log also increases, which reduces performance. Synchronous transfer is therefore used for shorter distances, whereas the asynchronous method is used for longer distances. Synchronous data replication between the primary and the secondary site ensures zero data loss (RPO = 0). This protects a data center against events such as power outages, fires, floods, or hurricanes.
Asynchronous storage replication can also be used, but the most recent changes may be lost during a takeover. In some application scenarios this loss is acceptable; in others it is not. SAP suggests you use synchronous replication because it removes even the slightest chance of data loss. Due to continuous replication, storage replication offers a better RPO than backup, but it requires a high-bandwidth, low-latency connection between the primary and the secondary site. It is mainly used for recovery from local storage corruption or recovery after a disaster.
Figure 6 Storage replication
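The effect of distance on synchronous writes can be estimated roughly. The sketch below is a back-of-the-envelope calculation, not a sizing method: it assumes a signal speed of about 200 km per millisecond in optical fiber and ignores switching, storage, and protocol overhead, so real latencies will be higher. The function name is hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # assumed signal speed in optical fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the round trip a synchronous log write must wait for."""
    return 2 * distance_km / FIBER_KM_PER_MS


for distance_km in (50, 1000):
    print(distance_km, "km ->", min_round_trip_ms(distance_km), "ms")
```

Under these assumptions every synchronous commit waits at least 0.5 ms at 50 km and at least 10 ms at 1,000 km, which illustrates why the synchronous method is reserved for shorter distances.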
System Replication
SAP HANA system replication ships all data to a secondary system located at another site. Once SAP HANA system replication is enabled, each server process on the secondary system establishes a connection with its primary counterpart and requests a snapshot of the data (Figure 7). From then on, all logged changes in the primary system are replicated continuously to the secondary system: each redo log entry persisted in the primary system is sent to the secondary system. Depending on the replication mode, a transaction in the primary system is not committed before its logs are replicated. The following options are available:
- Synchronous. The primary system does not commit the transaction until the secondary system acknowledges that the data has been received and persisted.
- Synchronous in memory. The primary system does not commit the transaction until the secondary system acknowledges that the data has been received in memory; the secondary does not wait until the data is persisted.
- Asynchronous. The primary system sends the logs but does not wait for an acknowledgement from the secondary system before committing.
- Synchronous full sync. The synchronous option is executed with a full sync option. In full sync operation, transaction processing on the primary site is blocked when the secondary site is not connected and newly created log buffers cannot be shipped to it. This behavior ensures that no transaction can be committed locally without being shipped to the secondary site.
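The four options differ only in what the primary waits for before it commits. The following sketch models that decision as a small function. It is an illustration of the descriptions above, not SAP HANA code: the `Mode` names and `primary_may_commit` are hypothetical, and the timeout that the plain synchronous modes wait for before continuing without the secondary is ignored.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "synchronous"
    SYNC_IN_MEMORY = "synchronous in memory"
    ASYNC = "asynchronous"
    SYNC_FULL_SYNC = "synchronous full sync"


def primary_may_commit(mode: Mode, secondary_connected: bool,
                       received: bool, persisted: bool) -> bool:
    """Decide whether the primary may commit a transaction right now."""
    if mode is Mode.ASYNC:
        return True  # never waits for the secondary
    if not secondary_connected:
        # Only full sync blocks while the secondary is unreachable.
        return mode is not Mode.SYNC_FULL_SYNC
    if mode is Mode.SYNC_IN_MEMORY:
        return received  # acknowledged once received in memory
    return persisted  # SYNC and SYNC_FULL_SYNC wait for persistence


# Secondary has received the log buffer but not yet written it to disk.
print(primary_may_commit(Mode.SYNC_IN_MEMORY, True, True, False))    # True
print(primary_may_commit(Mode.SYNC, True, True, False))              # False
# Secondary is disconnected.
print(primary_may_commit(Mode.SYNC_FULL_SYNC, False, False, False))  # False
```

The last call shows the trade-off of full sync: it guarantees that nothing is committed without being shipped, at the cost of blocking the primary whenever the secondary is unavailable.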
SAP HANA system replication has a lower RTO and is faster than storage replication. SAP suggests you use synchronous replication because data can be lost during asynchronous replication.
Figure 7 System replication
The main benefits of system replication are as follows:
- The secondary system can be used during planned downtime of the primary system
- The secondary system can be used during a software fault in the primary system
- The secondary system can be used during a disaster
- The secondary system can be used during a crash of the primary system
The log entries in the secondary system are executed continuously, immediately after they have been received. This means that the secondary system can take over with virtually no delay if the primary system fails. This replication solution offers customers a low RPO and RTO.
OK, this could be considered a short synopsis of the standard documentation. But why doesn't it provide any additional information?
Why not include practical experiences or tips to use the described ideas about the whole backup & recovery topic in real life?
Also attributing the source of the diagrams would be nice and fair to the original authors.
This is not a particularly bad article, but with only a little effort you could greatly improve it. Why don't you?