Disaster Recovery for SAP Adaptive Server Enterprise
Disaster Recovery (DR) is the set of processes, policies, and procedures used to recover data after a natural or human-induced disaster. Disaster Recovery differs from High Availability (HA), which aims to keep all aspects of a business functioning during disruptive events. SAP replication achieves DR through asynchronous replication to a geographically separate site.
To support Disaster Recovery, SAP ASE 16.0 SP03 PL02 lets you set up, manage, and monitor an optional, additional DR site that is geographically separate from the 2-node HA system. This third site is dedicated to disaster recovery and doesn’t need synchronous replication.
HADR System with DR Node Topology
Here’s an example of an HADR with DR node system, where London1 is the primary node, London2 is the standby node, and Paris is the DR node:
The HADR with DR node system consists of three SAP ASE servers:
- One designated as the primary, on which all transaction processing takes place.
- A standby node, which acts as a warm standby for the primary node, and contains copies of designated databases from the primary node.
- The DR node, which backs up designated databases from the primary node on a geographically distant server.
The HADR with DR node system includes an embedded SAP Replication Server that synchronizes the databases among the three servers. The system uses Replication Management Agent (RMA) to perform the initial setup and HADR operations.
The HADR with DR node system supports asynchronous replication between the HADR cluster and the DR server for disaster recovery. The primary and DR servers (or the standby and DR servers) using asynchronous replication can be geographically distant, so the network link to the remote DR server may be slower than the links between other nodes. With asynchronous replication, the Replication Agent thread for SAP ASE captures the workloads of the primary or the standby server and delivers them asynchronously to SAP Replication Server. SAP Replication Server then applies these workload changes to the DR server.
Data Replication Flow
When a user application commits transactions on the primary server, the primary SAP ASE uses synchronous replication mode to send the transaction logs to the standby SAP Replication Server. The standby SAP Replication Server applies the transactions to the standby SAP ASE. At the same time, the standby SAP Replication Server transmits another copy of the transaction logs to SAP Replication Server on the DR node using asynchronous replication mode. Finally, SAP Replication Server on the DR node applies the transactions to SAP ASE on the DR node. Here’s a diagram showing transaction replication flow.
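The flow described above can also be mimicked with a small Python model. This is only an illustrative sketch, not SAP code: the `Node` and `ReplicationFlow` classes and the `drain_dr_queue` method are hypothetical names made up for this example, and the model collapses "received by the standby SAP Replication Server" and "applied to the standby SAP ASE" into a single step. It shows the one property that matters for DR planning: the standby is in step with the primary at commit time, while the DR node catches up later.

```python
from collections import deque


class Node:
    """A database server that simply records the transactions applied to it."""

    def __init__(self, name):
        self.name = name
        self.applied = []

    def apply(self, txn):
        self.applied.append(txn)


class ReplicationFlow:
    """Toy model: synchronous hop to the standby, asynchronous hop to DR."""

    def __init__(self, primary, standby, dr):
        self.primary = primary
        self.standby = standby
        self.dr = dr
        self.dr_queue = deque()  # transactions waiting to cross the slow link

    def commit(self, txn):
        # The primary commits locally.
        self.primary.apply(txn)
        # Synchronous hop (simplified): the standby has the transaction
        # before commit() returns.
        self.standby.apply(txn)
        # Asynchronous hop: the transaction is only queued for the DR node.
        self.dr_queue.append(txn)

    def drain_dr_queue(self, max_txns=None):
        """Deliver queued transactions to the DR node, oldest first."""
        delivered = 0
        while self.dr_queue and (max_txns is None or delivered < max_txns):
            self.dr.apply(self.dr_queue.popleft())
            delivered += 1
        return delivered


flow = ReplicationFlow(Node("London1"), Node("London2"), Node("Paris"))
for txn in ["t1", "t2", "t3"]:
    flow.commit(txn)

print("standby:", flow.standby.applied)  # already matches the primary
print("DR before drain:", flow.dr.applied)  # still empty
flow.drain_dr_queue()
print("DR after drain:", flow.dr.applied)
```

Whatever sits in `dr_queue` at the moment of a disaster is the data the DR site has not yet received, which is why the asynchronous link determines how much recent work could be lost.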
Here are some typical use cases for disaster recovery:
Primary Server Down
When your application can’t reach the primary SAP ASE server or the whole primary host, Fault Manager automatically triggers an unplanned failover operation in RMA. RMA promotes the standby host to be the new primary host. After a short break, the user application continues running on the new primary node. Replication to the DR host is not affected if the primary server or host is down. After the failover, data generated on the new primary ASE server is also replicated to the DR host. After the previous primary ASE server or host is recovered, you can trigger a planned failover to switch back to the previous primary host.
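The role changes in this scenario can be pictured with a small state model. This is a hedged sketch only: the `Cluster` class and its methods are hypothetical and do not correspond to any RMA or Fault Manager API. It assumes, as a simplification, that the DR node always receives data from whichever host currently holds the primary role.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    primary: str
    standby: str
    dr: str
    down: set = field(default_factory=set)  # hosts currently unreachable

    def mark_down(self, host):
        self.down.add(host)

    def mark_recovered(self, host):
        self.down.discard(host)

    def dr_source(self):
        """Host whose changes flow to the DR node (simplified assumption)."""
        return self.primary

    def _swap_roles(self):
        self.primary, self.standby = self.standby, self.primary

    def unplanned_failover(self):
        """Promote the standby once the primary is known to be down."""
        if self.primary not in self.down:
            raise RuntimeError("primary is reachable; use a planned failover")
        if self.standby in self.down:
            raise RuntimeError("no healthy standby to promote")
        self._swap_roles()

    def planned_failover(self):
        """Swap roles while both hosts are healthy, e.g. to switch back."""
        if self.primary in self.down or self.standby in self.down:
            raise RuntimeError("both hosts must be up for a planned failover")
        self._swap_roles()


cluster = Cluster(primary="London1", standby="London2", dr="Paris")

cluster.mark_down("London1")         # primary host is lost
cluster.unplanned_failover()         # London2 takes over
print(cluster.primary, "->", cluster.dr, "via", cluster.dr_source())

cluster.mark_recovered("London1")    # old primary is repaired
cluster.planned_failover()           # switch back to London1
print(cluster.primary, cluster.standby)
```

The two guard checks capture the distinction drawn above: an unplanned failover only makes sense when the primary is unreachable, while the switch back is a planned operation that needs both hosts to be healthy.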
HADR System Down
If the HADR cluster is out of sync due to some fatal errors or disasters, you can recover the HADR cluster with the databases on the DR node. To recover the HADR cluster, load each database from the DR node to the primary node, then rematerialize each database on the standby node and the DR node.
Topic reference: Materializing and Rematerializing Databases
Standby Server Down
If the standby ASE or RS server is down, recover the servers directly. In extreme circumstances, if you cannot recover the servers, you may have to discard them and reconfigure the environment. For assistance, see Tips and Tricks (below) or contact SAP support.
DR Server Down
If the DR node has been unavailable for a long time, rematerialize its databases to bring them back in sync rather than waiting for replication to work through the pending data.
Perform a planned failover for system maintenance or a rolling upgrade.
Tips and Tricks
- There are two ways to materialize databases: automatic and manual. Use the automatic method for small databases and the manual method for large databases.
Topic reference: Materializing and Rematerializing Databases
- The 3-node environment supports SSL (Secure Sockets Layer).
- User databases created by the create database command are not automatically added to the HADR participating database list. To add the databases:
- Create the new database on the primary and companion servers.
- Make sure the databases use appropriate sizing for the data and log devices. For example, if you create the pubs2 database on the primary server, create it on the companion server as well.
- Create DR_admin and DR_maint users on the databases.
- Grant necessary permissions to these users.
- Issue the sap_update_replication command.
- During an unplanned failover, if SAP Replication Server on the primary host is down, data replication from the standby host to the DR node cannot be restored until SAP Replication Server on the primary host is running again, which affects the availability of replication to the DR node. In this situation, the HADR pair topology switches from remote to local so that data no longer flows through SAP Replication Server on the primary host.
- If you need to discard the whole system, perform the teardown operation. If you want to discard only the DR node, use the sap_update_replication command.
Topic reference: Removing the DR Node from the HADR System
- With rolling upgrade, you can update applications in a DR node system with zero downtime.
Topic reference: Performing a Rolling Upgrade on a DR Node System
The HADR system offers several types of monitoring:
- Use sap_status path or sap_status active path to check if the system is working correctly.
- When issues are detected in the working system, use sap_status resource to check the resources that are being used by different models. If one model uses far more resources than the others, it might have a problem and should be investigated further.
This blog covers only the technologies used between the DR node and the HADR system. If you want to know more about the HADR system itself, see the HADR Users Guide on the Help Portal.
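The "far more than the others" check can be made concrete with a simple comparison against the median. The sketch below is an assumption-laden illustration: it does not parse real sap_status resource output, the `flag_heavy_consumers` function is hypothetical, the component names and numbers are invented sample data, and the factor of 3 is an arbitrary starting point rather than an SAP recommendation.

```python
from statistics import median


def flag_heavy_consumers(usage, factor=3.0):
    """Return the names whose usage exceeds `factor` times the median usage.

    `usage` maps a component name to a numeric resource figure that you
    have collected yourself (all values must share the same unit).
    """
    if not usage:
        return []
    baseline = median(usage.values())
    return sorted(name for name, value in usage.items()
                  if value > factor * baseline)


# Invented sample figures, for illustration only.
sample = {
    "component_a": 120,
    "component_b": 135,
    "component_c": 110,
    "component_d": 900,
}

print(flag_heavy_consumers(sample))
```

Using the median rather than the mean keeps a single runaway consumer from inflating the baseline it is compared against.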