Your SAP on Azure – Part 7 – Protect SAP landscape with Azure Site Recovery
How quickly could you rebuild your SAP landscape after a data center failure? If your disaster recovery plan is limited to daily backups, then just provisioning new hardware (or VMs) can already take a long time – and you still need to move the data, install the software and finally restore the backup. On the other hand, a cold standby site with ongoing database replication may be too expensive, especially for smaller organizations. But is that still true? Does a secondary site have to be expensive?
I help customers build their SAP on Azure landscapes, and very often we discuss the easiest way to get started. I always recommend improving the disaster recovery process by implementing Azure Site Recovery, Microsoft's cloud service that replicates on-premise servers and uses a recovery plan to provision resources in the cloud in case of an unexpected event. During normal operations you pay only for the storage – the target virtual machines are created during the failover process. A reliable and inexpensive disaster recovery solution? It's today's reality.
The disaster recovery process can be highly customized and adapted in various ways. The network infrastructure, the recovery point and time objectives, and even the physical location of your users are all important factors when designing the recovery process. There is no single solution that fits all companies, so when writing this post I focused on small and medium organizations with a limited budget.
Azure Site Recovery protects your workload by replicating its disks. The process is compatible with SAP NetWeaver products and is officially supported by Microsoft. Protecting a database, however, is slightly more difficult. Due to the way ASR performs the replication, it can't guarantee consistency between the data and log areas. The recommended solution is to set up system replication between the on-premise and cloud instances. It ensures the lowest RPO and RTO, but it requires a constantly running server in the cloud environment. In this post, when it comes to protecting the HANA instance, I present an alternative solution based on automatic shipping of database backups to Azure Blob storage. It requires some manual actions during failover, but the total cost of the solution is much lower – you pay only for the space you are using.
PREPARATION
When you decide to use Microsoft Azure as your disaster recovery site, you need to consider the connectivity options. Depending on the number of users and the required bandwidth, you can choose between a secured VPN and a dedicated private connection (often delivered over an MPLS network) called ExpressRoute. My recommendation is to always use ExpressRoute for production workloads.
My on-premise SAP HANA database and SAP NetWeaver system run on separate virtual servers in a Hyper-V environment, but VMware landscapes are also supported. During the failover, Azure Site Recovery provisions virtual machines in the selected subnet of the remote network. The newly created VMs preserve their hostnames, but the IP addresses will be different. There are various ways to redirect traffic to the secondary site – I use a simple method and change the DNS records to point users to the new servers.
I start the configuration of Azure Site Recovery by creating a few initial resources:
– Target resource group where VMs will be created during failover (SAPRecoveryVM)
– Storage account to store the VM replicas (bjvmasr)
– Storage account for VMs and database backups (bjvmasrstorage)
– Recovery Services Vault (SAPRecovery)
The first part of the configuration focuses mostly on enabling connectivity between the local data center and the Azure cloud. To enable the replication, you need to define a logical Hyper-V site and register the Hyper-V hosts that run the VMs.
Go to the previously created Recovery Services Vault and select Add Hyper-V Site under Site Recovery Infrastructure:
When the Hyper-V site is created, we need to register the Hyper-V hosts that run the virtual machines.
The host registration process involves downloading the Azure Site Recovery Provider and importing the automatically generated Recovery Vault configuration during the software installation.
A few moments later the information about registered servers appears in Azure:
Replication policy is a set of rules that determines how your servers will be replicated to Azure. I accept the default values.
There are two recovery services models available in Azure:
Azure Backup Vault – a backup solution similar to the traditional method of copying the configuration and data to external drives or tapes. It protects files, but it doesn't back up the entire server.
Site Recovery Services – a newer service offered by Microsoft to protect entire servers. It creates images of the disk drives and mirrors them to the Azure cloud platform. As the replicas contain the entire operating system and all application binaries, the recovery process is quicker – there is no need to reinstall the software.
Go to Site Recovery, select Prepare Infrastructure and follow the guided configuration.
Specify the storage account and virtual network that should be used for Azure Site Recovery.
In step 5 associate the previously created replication policy with Hyper-V site.
REPLICATION – SAP NETWEAVER
The initial configuration of Azure Site Recovery is complete and we can now set up the replication of the SAP NetWeaver server.
In the SAPRecovery vault, open Replicated Items and select +Replicate to add a new server. Choose the previously defined Hyper-V site and the target settings for the recovery.
In the third step we can choose the VMs we'd like to protect. On the next screen, choose the operating system and decide which disks should be replicated.
Verify and save your settings.
The VM is now visible under Replicated Items. Depending on the disk size and the replication policy, it may take a couple of hours until your VM becomes fully protected.
During the guided configuration we could select only basic settings for our virtual machines, but it is also possible to enter the advanced mode and customize the target VM size or the use of managed disks.
Click on the VM name and choose Compute and Network from the right-hand menu.
REPLICATION – SAP HANA
Before we configure the replication for the SAP HANA database, I need to explain the approach in more detail. Microsoft's recommended solution for protecting database workloads is to enable replication at the database level, as it ensures the lowest RPO and RTO. The approach presented in this blog is different – we are going to use Azure Site Recovery to protect the virtual machine, including the operating system and the SAP HANA binaries, but during the failover the database has to be restored manually.
Whenever you are migrating a Linux system, I recommend reviewing the fstab entries. It's quite common that the disk UUIDs change, and you may encounter problems with mounting partitions, especially during the testing phase. Adding the nofail option prevents the system from entering emergency mode due to a disk configuration issue:
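A minimal example of such an entry – the device ID and mount point below are placeholders, adjust them to your own layout:
# /etc/fstab – the nofail option lets the boot continue even if the disk is missing
/dev/disk/by-id/<disk-id> /hana/data xfs defaults,nofail 0 0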
During my tests, I found one more problem. When I started the replicated Linux VM on Azure for the first time, the system entered emergency mode and threw the error message:
blk_update_request I/O error, dev fd0, sector 0
I'm not sure whether this is a bug or some specific configuration of my operating system, but I found out that the kernel module responsible for the floppy drive was loaded, which caused the above error.
The solution is to prevent the floppy module from loading automatically:
# unload the module from the running kernel
sudo rmmod floppy
# blacklist it so it is not loaded on the next boot
echo "blacklist floppy" | sudo tee -a /etc/modprobe.d/50-blacklist.conf
# rebuild the initrd so the blacklist applies during early boot
sudo mkinitrd
sudo reboot
To ensure the lowest recovery point, an up-to-date SAP HANA database backup has to be available in the cloud storage. In the previously created storage account, I created a new container to which all database data and log backups will be uploaded.
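If you prefer the command line over the portal, the container can be created with the Azure CLI as well (the names match the resources listed earlier):
az storage container create --name hanabackup --account-name bjvmasrstorage --account-key <access_key>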
For the data upload, I recommend using the Microsoft AzCopy tool.
wget -O azcopy.tar.gz https://aka.ms/downloadazcopylinux64
tar -xf azcopy.tar.gz
sudo ./install.sh
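After the installation, you can verify that the tool is available:
azcopy --version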
I created two scripts – one to upload a daily full backup and one to upload log backups.
Full backups (FullBackupSend.sh):
#!/bin/bash
# upload new data backups; --exclude-older skips files that are not newer
# than the copy already present in the destination container
azcopy \
--source /backup/data \
--destination https://bjvmasrstorage.blob.core.windows.net/hanabackup/data/ \
--dest-key <access_key> \
--recursive \
--exclude-older
Log backups (LogBackupSend.sh):
#!/bin/bash
# upload new log backups; --exclude-older skips files that are not newer
# than the copy already present in the destination container
azcopy \
--source /backup/log \
--destination https://bjvmasrstorage.blob.core.windows.net/hanabackup/log/ \
--dest-key <access_key> \
--recursive \
--exclude-older
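Don't forget to make both scripts executable (they live under /hana/shared/scripts, the path referenced by the crontab entries below):
chmod +x /hana/shared/scripts/FullBackupSend.sh /hana/shared/scripts/LogBackupSend.sh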
I set up two cron jobs to upload the files to Azure Blob storage:
Log files: every 15 minutes
Full backups: once a day at 02:00
crontab -e
*/15 * * * * /hana/shared/scripts/LogBackupSend.sh > /hana/shared/scripts/LogBackupSend.log 2>&1
0 2 * * * /hana/shared/scripts/FullBackupSend.sh > /hana/shared/scripts/FullBackupSend.log 2>&1
You can verify the files are uploaded to cloud storage in the Azure Storage Explorer:
I like to be prepared, so I also wrote a script which I can use to download all files in case of failover.
Download files (DownloadAll.sh):
#!/bin/bash
# download the data backups to the local /backup directory; run a second
# copy with --source https://bjvmasrstorage.blob.core.windows.net/hanabackup/log/
# to fetch the log backups as well
azcopy \
--source https://bjvmasrstorage.blob.core.windows.net/hanabackup/data/ \
--destination /backup \
--source-key <access_key> \
--recursive
Now we can configure the virtual machine replication in Azure Site Recovery. Follow the same steps as for the NetWeaver application server – the only difference is that I don't replicate the data disks that hold the database persistence and the backup files.
After a few minutes you can see the protection and synchronization status:
CREATE RECOVERY PLAN
A great Azure Site Recovery feature is the ability to create restore procedures. Instead of performing a recovery for each VM individually, we can define a Recovery Plan that orchestrates the entire process. You can use Azure Automation, a powerful automation engine, to define additional activities to be performed during failover. I'm using it to create and attach two data disks to the HANA VM.
Create an Azure Automation account, select Runbooks and create a new runbook.
I chose PowerShell as the Runbook type and wrote a small procedure to define and attach the data disks.
# authentication to Azure (e.g. via the Automation Run As account) is assumed
# to have happened before this point
$VMName = "BJ-HANA"
$RGName = "SAPRecoveryVM"
$VM = Get-AzureRmVM -Name $VMName -ResourceGroupName $RGName
# pick the next two free LUNs
$Lun1 = $VM.DataDiskNames.Count + 1
$Lun2 = $VM.DataDiskNames.Count + 2
# define two empty data disks for the HANA data and log volumes
$VM = Add-AzureRmVMDataDisk -VM $VM -Name AD_DataDisk1 -CreateOption Empty -DiskSizeInGB 60 -Lun $Lun1
$VM = Add-AzureRmVMDataDisk -VM $VM -Name AD_DataDisk2 -CreateOption Empty -DiskSizeInGB 20 -Lun $Lun2
Update-AzureRmVM -VM $VM -ResourceGroupName $RGName
To create the recovery plan, go to Site Recovery – Manage Recovery Plans and select New Recovery Plan. Choose the VMs to be included and confirm the selection.
A simple Recovery Plan will just restore the virtual machines at the same time. To attach additional actions, click the Customize button.
Insert the SAPRecoveryAttachDataDisk runbook as a post-start action.
FAILOVER
The disaster recovery configuration is complete and both VMs are successfully replicating to the cloud platform. Let's verify our solution by executing a Test Failover.
A test failover is a Site Recovery feature that lets you verify your solution without any impact on the ongoing replication. It creates the VMs and follows the Recovery Plan, but it lets you use a separate network for the deployment. The VMs can be isolated and won't affect the production environment. Isn't that a great way to verify your disaster recovery strategy?
The failover job took around 10 minutes. Let’s have a closer look at each step. In SAPRecoveryVM resource group I can see both SAP NetWeaver and SAP HANA virtual machines.
The execution of the SAPRecoveryAddDataDisks runbook was successful:
And the two new disks are attached to the SAP HANA virtual machine:
So far everything is going according to plan! Let's log in to the virtual machines.
POSTPROCESSING – SAP HANA
SAP NetWeaver won't start without a database, so our initial actions focus on getting SAP HANA to work. Log in to the virtual machine and initialize the attached disks. The /hana/shared directory should already be mounted correctly. Recreate the missing data and log directories.
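A rough sketch of these steps – the device names are assumptions, so verify them with lsblk before formatting anything:
# create file systems on the two empty disks attached by the runbook
sudo mkfs.xfs /dev/sdc
sudo mkfs.xfs /dev/sdd
# mount them as the HANA data and log volumes
sudo mount /dev/sdc /hana/data
sudo mount /dev/sdd /hana/log
# recreate the directories expected by the instance and fix the ownership
sudo mkdir -p /hana/data/NWH /hana/log/NWH
sudo chown -R nwhadm:sapsys /hana/data /hana/log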
Download the backup files using the prepared script:
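Assuming the script sits alongside the upload scripts:
/hana/shared/scripts/DownloadAll.sh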
As the SYSTEMDB database is unavailable, I'm using the recoverSys.py script (executed as the nwhadm user) to restore it:
HDBSettings.sh recoverSys.py --command="RECOVER DATA USING FILE ('COMPLETE_DATA_BACKUP') CLEAR LOG" --wait
When the SYSTEMDB is available, I can start the system and perform the recovery of the NWH tenant database:
hdbsql -u SYSTEM -n bj-hana:30013 "RECOVER DATABASE FOR NWH UNTIL TIMESTAMP '2019-07-02 13:04:28' CLEAR LOG USING CATALOG PATH ('/usr/sap/NWH/HDB00/backup/log/') USING LOG PATH ('/backup/log/DB_NWH') USING DATA PATH ('/backup/data/DB_NWH/') CHECK ACCESS USING FILE"
And just to confirm that the recovery process went fine, we can log in to the NWH database and select some sample data.
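A quick smoke test with hdbsql could look like this (connecting through the SYSTEMDB port and letting the nameserver redirect the connection to the tenant):
hdbsql -n bj-hana:30013 -d NWH -u SYSTEM "SELECT * FROM DUMMY"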
POSTPROCESSING – SAP NETWEAVER
Replicating SAP NetWeaver is fully supported by Microsoft and doesn't require any additional steps. All disks are copies of the on-premise volumes, so we can simply start the instance:
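A sketch of the start sequence, assuming instance number 00 and <sid>adm as the administration user (both are assumptions – adjust to your installation):
su - <sid>adm
sapcontrol -nr 00 -function StartSystem ALL
# wait until all processes report GREEN
sapcontrol -nr 00 -function GetProcessList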
The system has started without any issues and I was able to log in.
I think Azure Site Recovery is a great tool to protect virtual machines. At a minimal cost, you can ensure your organization is prepared for unexpected events and able to recreate its systems in less than an hour.
Very interesting post! Congrats Bartosz!
Very useful article. It would be great if you could consolidate all SAP on Azure articles under a tag, which would make it easier for readers to track them.
Hi,
thanks for your comment.
There is a user tag "SAP on Azure" which includes all the latest blogs, but I probably need to spend some time and update the tags on the rest of the blogs 🙂 I will do it soon!
Hi Bartosz,
The HANA log backup file name cannot be user-defined (e.g. to have a timestamp), and after some time the container will accumulate a huge number of log backup files, which become too big to download for the DR. Is there a better way to control the backup/upload by date?
Thanks,
Jun
Hi Bartosz,
Thank you for a very valuable document.
I'd like to ask a question about the RPO.
Thanks,
Trung,
Hi,
by default SAP HANA does the log backup every 15 minutes. So the RPO is at least 15 minutes, but you should also consider the time it takes to upload the file to the cloud - which will of course vary depending on your internet connection. That's assuming you're uploading the files as soon as they are created. You could write a script that uploads the data only twice a day - in such a case the RPO will be closer to 12 hours.
So yes, the RPO depends on the log backup frequency, but also the time required to upload data to the cloud storage.
Hi Bartosz ,
Great blog series! I had a query on the HANA backup transfer to the cloud. Doesn't Azure Site Recovery back up all the drives, including the data and log directories? So why do we need to create /hana/data and /hana/log again?
By increasing the backup frequency of ASR we could keep the latest full backup replicated, and the log files could be replicated by real-time log shipping. Would that not provide a lower RTO? Of course HANA System Replication will be the fastest, but I'm just wondering about the above method.
Regards,
Shaswat
Hi!
Great questions!
The reason I decided not to replicate the HANA data and log volumes is the consistency of the database. As the operating system is Linux, we can't really use any VSS-like technology to ensure there are no changes written to the database while ASR takes a snapshot of the volume. In theory you could freeze the log writer and then the file system for a short period of time, but in my view it's a bit risky and I personally haven't tested such a solution.
In general, you should not use Azure Site Recovery to replicate database workloads. This blog provides a partial solution for cases when you don't need a low RPO/RTO.
Best regards
Bartosz
Hi Bartosz,
I have a couple of questions. Can you help?
1. If I am using SOFS in the primary region for /sapmnt and /usr/sap/trans, will I be able to replicate it to DR using ASR for the ASCS, ERS and AAS VMs from the primary region?
2. For Content Server/MaxDB, if I enable ASR for these VMs, will that be sufficient as a DR solution? Or should I replicate the data and log volumes to DR separately using SIOS DataKeeper?
3. If I am not using ASR and I replicate /sapmnt using AzCopy, will this be sufficient to provide DR for this storage? For HA I am using SOFS for the same.
4. Can we use Windows Snapshot to provide DR for the Content Server along with the data and log volumes?
Regards
Raji K