Effective Disaster Recovery Strategies for SAP Sys...

Thaylise · ‎02-21-2024

A few years ago, I was sitting in my office sipping a coffee (Death Wish dark roast, of course) when I got a frantic call from our data center team. A massive power surge in a nearby transformer had left our SAP environment hanging by a thread.

The heart-pounding rush to initiate our disaster recovery plan was a wake-up call. Disaster comes when you least expect it.

Though we don’t like to think about it much, disaster can (and likely will) destroy your hard work, jeopardize key functions, and put your entire infrastructure at risk. Operate enterprise software long enough, and something will threaten its safety.

But even if Humpty Dumpty falls off the wall, we can still put him back together again.

Disaster recovery is about accepting a certain amount of data loss (your recovery point objective, or RPO) while reducing your recovery time objective (RTO) as much as possible (or at least to something you can live with).

Over the years, I’ve developed a few best practices that will help SAP applications get up and running after the proverbial earthquake.

Multi-Tier Backup and Restore

Alongside regular backups, I’ve maintained multi-tier backup solutions. For instance, there’s a daily incremental backup and weekly full backup strategy that maintains data both on-premises and in cloud storage for redundancy.

Automate the backup process with scripts that initiate a backup in SAP HANA using Backint and subsequently transfer the backup files to an offsite cloud location with versioning enabled.

A short scheduled script can trigger the backup, and the resulting files can then be transferred offsite using something like the AWS CLI or Azure Blob Storage.
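As a rough illustration of the daily-incremental and weekly-full schedule, here is a minimal Python sketch that builds the `hdbsql` and `aws s3 sync` command lines. It is not the script from my environment. The user store key `BACKUPKEY`, the `DR` prefix, the Sunday full backup, and the paths are all assumptions for illustration. It also assumes a file-based backup so that there are local files to copy. With Backint, the statement would use `USING BACKINT` instead, and the backup agent handles the transfer.

```python
import datetime
import subprocess

FULL_BACKUP_WEEKDAY = 6  # Sunday (date.weekday(): Monday is 0); an assumption


def backup_statement(day: datetime.date, prefix: str = "DR") -> str:
    """Return the HANA SQL backup statement for the given day."""
    label = f"{prefix}_{day:%Y%m%d}"
    if day.weekday() == FULL_BACKUP_WEEKDAY:
        return f"BACKUP DATA USING FILE ('{label}_FULL')"
    return f"BACKUP DATA INCREMENTAL USING FILE ('{label}_INCR')"


def backup_command(day: datetime.date, user_key: str = "BACKUPKEY") -> list[str]:
    """hdbsql call that authenticates with a (hypothetical) user store key."""
    return ["hdbsql", "-U", user_key, backup_statement(day)]


def upload_command(local_dir: str, bucket_uri: str) -> list[str]:
    """AWS CLI call that copies new backup files to the bucket."""
    return ["aws", "s3", "sync", local_dir, bucket_uri]


def run_backup(day, local_dir, bucket_uri, dry_run=True):
    """Run (or, in a dry run, only print) the backup and the upload."""
    commands = [backup_command(day), upload_command(local_dir, bucket_uri)]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # stop on the first failure
    return commands
```

Scheduled from cron or a similar scheduler, `run_backup(datetime.date.today(), "/hana/backup", "s3://example-bucket/hana", dry_run=False)` would run both steps. Versioning is a property of the bucket itself and has to be enabled there.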

High Availability and Disaster Recovery Setup

I set up an HA and DR combination within my SAP landscape. HA in the primary site ensures minimum downtime, while the secondary site is ready to take over if the primary site fails.

Synchronous or asynchronous replication to a failover site in a different location (in the same region) can be used, depending on the business requirements and tolerance for data loss.

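Replication has to be configured on both sides: it is enabled on the primary system, and the secondary system is then set up to follow the primary.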
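Assuming the database is SAP HANA with HANA system replication, the two sides could look roughly like the sketch below, which only builds the `hdbnsutil` command lines. The site names, host name, and instance number are placeholders, and the sync-versus-async rule is a deliberate simplification of the business decision described above.

```python
VALID_REPLICATION_MODES = {"sync", "syncmem", "async"}


def choose_replication_mode(zero_data_loss_required: bool) -> str:
    """Simplified rule: synchronous only when no data loss is tolerated."""
    return "sync" if zero_data_loss_required else "async"


def enable_primary_command(site_name: str) -> list[str]:
    """Command run as <sid>adm on the primary to enable replication."""
    return ["hdbnsutil", "-sr_enable", f"--name={site_name}"]


def register_secondary_command(
    primary_host: str,
    instance_number: str,
    site_name: str,
    replication_mode: str = "async",
    operation_mode: str = "logreplay",
) -> list[str]:
    """Command run on the (stopped) secondary to register it with the primary."""
    if replication_mode not in VALID_REPLICATION_MODES:
        raise ValueError(f"unknown replication mode: {replication_mode}")
    return [
        "hdbnsutil",
        "-sr_register",
        f"--remoteHost={primary_host}",
        f"--remoteInstance={instance_number}",
        f"--replicationMode={replication_mode}",
        f"--operationMode={operation_mode}",
        f"--name={site_name}",
    ]
```

For example, `register_secondary_command("hana-primary", "00", "SITEB", choose_replication_mode(True))` produces a registration with synchronous replication. The secondary instance needs to be stopped before it is registered and started again afterwards.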

If the business's RPO is relatively high, you might be able to get away without a tertiary failover server, but if it can't tolerate much data loss, it is better to have that dedicated DR host.

Database Log Shipping

Regularly shipping transaction logs to an offsite location helps recover systems to a point closer to the failure. This approach has helped me recover up to the last committed transaction that was shipped, which is essentially a complete copy of the pre-failure data.

Microsoft SQL Server, for example, can continuously ship logs to a standby server, which can then be activated in case of failure.

The shipped logs are restored on the standby server as they arrive.

Virtualization and Snapshots

I’ve used VM snapshots to create restore points that could be quickly spun up on new hardware or in a cloud environment after a physical server failure. VMware’s vSphere is great for this, since it can capture consistent server states automatically.

It’s as simple as taking a single snapshot of the VM.
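One way to script a snapshot is through pyVmomi, the vSphere Python SDK. The sketch below assumes `vm` is a `vim.VirtualMachine` object that you have already looked up through a connected session (the connection code is left out), and the naming scheme is just an illustration.

```python
import datetime


def snapshot_name(vm_name: str, when: datetime.datetime) -> str:
    """Build a sortable snapshot name such as 'sapprd01-dr-20240221T030000'."""
    return f"{vm_name}-dr-{when:%Y%m%dT%H%M%S}"


def take_snapshot(vm, when=None, quiesce=True):
    """Request a snapshot of the VM and return its name and the vSphere task.

    quiesce=True asks VMware Tools to flush the guest file system first,
    which is what makes the snapshot consistent; memory=False skips the
    RAM dump, so the restored VM boots instead of resuming.
    """
    when = when or datetime.datetime.now()
    name = snapshot_name(vm.name, when)
    task = vm.CreateSnapshot_Task(
        name=name,
        description="DR restore point",
        memory=False,
        quiesce=quiesce,
    )
    return name, task
```

The call returns a task rather than a finished snapshot, so a real script would wait for that task to complete and check its result before relying on the restore point.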

Warm Standby

Sometimes, you need to have a warm standby system regularly updated using system replications or backups. While it won’t handle production load, it’s ready to be activated and get your DR site up and running as quickly as possible.

Use SAP Landscape Management (LaMa) to automate system replication to a reduced-capacity system (compared to production) and take care of role-switching in case of disaster. These are the steps:

1. Install SAP LaMa on your management system.
2. Use LaMa to register all SAP systems in your landscape.
3. Set up system replication directly within LaMa’s UI, defining primary and secondary roles.
4. Regularly test failover to the warm standby to ensure your DR strategy is effective.

Testing can be initiated with predefined workflows, enabling you to simulate disaster scenarios and ensuring your warm standby system seamlessly takes over without significant disruptions.
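A failover test is only useful if you compare its outcome to your targets. The following Python sketch shows one way to record a drill and check it against RTO and RPO. The class, field names, and timestamps are hypothetical, and the numbers would come from your own drill logs rather than from LaMa itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    failure_at: datetime                # when the simulated failure started
    service_restored_at: datetime       # when the standby served users again
    last_recovered_commit_at: datetime  # newest transaction present afterwards

    @property
    def recovery_time(self) -> timedelta:
        """Measured downtime, to be compared with the RTO."""
        return self.service_restored_at - self.failure_at

    @property
    def data_loss_window(self) -> timedelta:
        """Span of lost work, to be compared with the RPO."""
        return self.failure_at - self.last_recovered_commit_at


def evaluate(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Report whether the drill met each objective."""
    return {
        "rto_met": result.recovery_time <= rto,
        "rpo_met": result.data_loss_window <= rpo,
    }
```

Keeping these results from drill to drill makes it easy to see whether changes to the landscape are slowly eroding your recovery time.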

Conclusion

Over the years, navigating through various IT crises, I've learned that expecting the unexpected is just part of the job.

But here's the thing: the real difference always comes down to the planning and the drills. Knowing the theory is one thing, but confidence in your DR plan really comes from having run through disaster scenarios, troubleshot on the fly, and seen your plans hold up under pressure.

Sharing this in a forum like ours, I hope to spark discussions on how we can all better prepare for those "Humpty Dumpty" moments. I've seen firsthand how the right approach can turn potential catastrophes into manageable incidents.

And while I hope you never have to face one, I strongly advocate for preparation. Test your setups, challenge your assumptions.

Staying prepared has allowed me and my teams to sleep a little better at night, knowing we've done our best. I'd love to hear about your experiences, your strategies, and what you've learned along the way.