SAP cross-cluster move with minimized downtime
Many customers run SAP ABAP or Java application servers in a system architecture based on the Windows Server Failover Cluster framework. This high availability architecture is well known and has been proven to work for years. Unplanned hardware failures and maintenance activities can be covered with minimal impact on system availability.
But in some scenarios, it is necessary to transfer an SAP system from one Failover Cluster to another one. That could be the case in these situations:
- New active directory domain name or new DNS domain name, e.g. after the acquisition of a company
- A Windows OS upgrade, e.g. from Windows Server 2012 or lower to Windows Server 2016. (Starting from Windows Server 2012 R2, it is alternatively possible to upgrade the nodes in place with a rolling upgrade.)
- A (partly) destroyed cluster configuration
- Redesign of the datacenter architecture
- Move to an IaaS partner like Microsoft Azure or to the Google Cloud Platform
As far as I know, there are many guides and best practices (like the official SAP installation guide or this blog) on how to install SAP in a Windows Server Failover Cluster. But I am not aware of any “best practice documentation” on how to move an SAP system from one cluster environment to a new one. I will try to change this with this blog.
As a starting position, I will assume that we have an already clustered SAP (A)SCS instance up and running. Of course, this method would work in a similar way for every clustered SAP ABAP or Java system. In this example, I am using an SAP ABAP system called “RIX”. In the beginning, RIX is running on a Windows Server 2012 R2 cluster, consisting of the nodes wsiv8050-1 and wsiv8050-2. It’s a very typical setup, as described in the SAP ABAP Installation Guide (see the overview of SAP NetWeaver Installation Guides), and it looks like:
From an SAPMMC perspective, the system is structured as shown in the following screenshot:
While the existing RIX system is running without any interruption on the current environment, start to build the new landscape. In a first step, we must configure the new Windows Failover Cluster. That means:
- Deploy at least two new Windows operating systems (in this example, the hostnames are wsiv8051-1 and wsiv8051-2).
- Configure the network between the nodes, reserve IP addresses and so on.
- Provide a new shared disk that can serve as a clustered disk for the SAP ASCS (and maybe an additional disk for the clustered database instance).
- Create the WSFC following the Microsoft guidelines and the SAP HA installation guide.
- If necessary, install the database software on the new cluster. For this blog, I installed a clustered SQL Server 2016 instance on wsiv8051-1 and wsiv8051-2. Another possible option would be to use SQL Server AlwaysOn instead of the traditional SQL Server Clustering. Or – depending on the planned landscape – install the SAP database instance outside of the SAP ASCS cluster, e.g. on a dedicated SQL Server Database cluster.
Starting the SAP installation
After the new cluster has been prepared for the SAP installation, start SWPM on the first new cluster node to install a new “First Cluster Node”. Use the very same SID for the new system (RIX in this case) and the same instance numbers for ASCS and ERS as in the old clustered environment. The SAP network name, however, cannot be the same: it is still in use by the existing system, and reusing it would cause a naming collision. In my example, I will use “obelix” instead of “asterix” as a temporary hostname for the SAP virtual instance.
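The rules above (same SID, same instance numbers, different virtual hostname) are easy to get wrong when typing values into SWPM. The following is a minimal sketch of a planning check, not part of any SAP tool. The class `AscsInstall` and the function `check_parallel_install` are hypothetical names introduced here for illustration. The instance numbers 00 and 10 are taken from the profile and ERS paths used later in this blog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AscsInstall:
    """Planned installation values for one clustered (A)SCS setup (hypothetical helper)."""
    sid: str
    ascs_instance_no: str
    ers_instance_no: str
    virtual_host: str


def check_parallel_install(old: AscsInstall, new: AscsInstall) -> list:
    """Return a list of problems; an empty list means the plan follows the rules."""
    problems = []
    if old.sid.upper() != new.sid.upper():
        problems.append("SID must be identical")
    if old.ascs_instance_no != new.ascs_instance_no:
        problems.append("ASCS instance number must be identical")
    if old.ers_instance_no != new.ers_instance_no:
        problems.append("ERS instance number must be identical")
    # Hostnames are compared case-insensitively, as DNS and NetBIOS names are.
    if old.virtual_host.lower() == new.virtual_host.lower():
        problems.append("temporary virtual hostname must differ from the one in use")
    return problems


old_env = AscsInstall("RIX", "00", "10", "asterix")
new_env = AscsInstall("RIX", "00", "10", "obelix")
print(check_parallel_install(old_env, new_env))
```

An empty list means the planned values are consistent with the approach described here. It does not say anything about whether the installation itself will succeed.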
Once the clustered ASCS has been installed successfully on the first server, continue to install the Additional Cluster Node on the second cluster member, following the SAP installation guide using SWPM. The following screenshot shows the result in the new environment. SAP ASCS is installed with the (temporary) hostname “obelix”, the ERS instances are locally deployed on the new Windows Server nodes wsiv8051-1 and wsiv8051-2, and the SQL Server database software is already up and running. Now is the perfect point in time to test the failover capabilities. In addition, you can now install all the required Windows patches, upgrade the SAP kernel and migrate everything else from the former cluster nodes to the new ones, e.g. environment variables, additional 3rd party applications and so on:
It’s also recommended to now sync all filesystem content from the current \\asterix\sapmnt to the new \\obelix\sapmnt share. This could include, for example, certificates, SAP transport system related content, and logs. Additionally, transfer the content of the profile folder and merge it into \\obelix\sapmnt\RIX\SYS\profile.
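The sync of the share content can be scripted. The following is a minimal sketch, assuming both shares are reachable as ordinary directories from the machine running the script. The function name `sync_tree` and the `skip_prefixes` parameter are hypothetical. The sketch copies files that are missing or older on the target and never deletes anything. The profile folder is skipped on purpose, because profiles have to be merged by hand rather than overwritten.

```python
import shutil
from pathlib import Path


def sync_tree(source: Path, target: Path, skip_prefixes: tuple = ()) -> list:
    """Copy files that are missing or older in target; return the relative paths copied."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source).as_posix()
        # Leave out subtrees that must be merged manually (e.g. the profile folder).
        if any(rel.startswith(prefix) for prefix in skip_prefixes):
            continue
        dst = target / rel
        # Skip files whose copy on the target is already at least as new.
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 keeps the modification time
        copied.append(rel)
    return copied


# Example call with the share names used in this blog (not executed here):
# sync_tree(Path(r"\\asterix\sapmnt"), Path(r"\\obelix\sapmnt"),
#           skip_prefixes=("RIX/SYS/profile/",))
```

Because already-copied files are skipped, the function can be run repeatedly before the downtime. A tool like robocopy would do the same job. The point of the sketch is only to make the "copy, but do not overwrite the profiles" rule explicit.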
Prepare the database move
From an SAP cluster perspective, everything is now prepared to start the move. But in my environment, it is advisable to start transferring the database to the new SQL Server instance now, because I plan to move both components, SAP and database, from the existing environment to the new cluster in one short downtime. To keep the necessary downtime as short as possible, I will use the standard backup and restore capabilities offered by the SQL Server DBMS to build up a “hot stand-by database”. There are many conceivable approaches. Read the documentation for your DBMS to find the best way for your setup. For SQL Server, a manually initiated “transaction log shipping”, as described here, is sufficient to achieve the desired goal.
To do so, I combined these T-SQL commands and executed them as a query in SQLCMD mode in SQL Server Management Studio:
--execute the Full Database Backup on the existing instance
:connect WSIV8050-SQL\CLU
GO
BACKUP DATABASE [RIX] TO DISK = N'\\park\platz\Samuel_Backup\RIX\FullRIX.bak' WITH NOFORMAT, NOINIT, NAME = N'RIX-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 5
GO
--now we can connect to the target instance to restore the database fullbackup in the new Environment
:connect WSIV8051-SQL\C51
USE master
GO
RESTORE DATABASE RIX FROM DISK = N'\\park\platz\Samuel_Backup\RIX\FullRIX.bak' WITH FILE = 1, NOUNLOAD, REPLACE, STATS = 10, NORECOVERY
GO
--do the TLOG backup on the source database
:connect WSIV8050-SQL\CLU
GO
BACKUP LOG [RIX] TO DISK = N'\\park\platz\Samuel_Backup\RIX\01-RIX.trn' WITH NOFORMAT, NOINIT, NAME = N'RIX Database Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
--and restore the TLOG on the target instance
:connect WSIV8051-SQL\C51
GO
RESTORE LOG RIX FROM DISK = N'\\park\platz\Samuel_Backup\RIX\01-RIX.trn' WITH FILE = 1, NOUNLOAD, STATS = 10, NORECOVERY
GO
Pay attention to the “NORECOVERY” option I used twice. The result on the target instance is a database in ‘Restoring’ mode. That makes it possible to restore additional transaction log backups later and bring the source and the target database in sync during the downtime, before opening the RIX database on the target instance WSIV8051-SQL\C51.
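If you want to repeat the log backup and restore several times before the downtime, so that the last step only has to ship a small log, the SQLCMD text can be generated instead of edited by hand. The following is a minimal sketch, assuming one backup file per step, named like the 01-RIX.trn file above. The function name `log_shipping_step` and its parameters are hypothetical. The defaults are the instance names and the backup path used in this blog, and the T-SQL options are a subset of those in the script above.

```python
def log_shipping_step(seq: int,
                      database: str = "RIX",
                      source: str = r"WSIV8050-SQL\CLU",
                      target: str = r"WSIV8051-SQL\C51",
                      backup_dir: str = r"\\park\platz\Samuel_Backup\RIX") -> str:
    """Build the SQLCMD-mode text for one manual log shipping step."""
    # One file per step (01-RIX.trn, 02-RIX.trn, ...), so FILE = 1 is always the right backup set.
    trn_file = backup_dir + "\\" + f"{seq:02d}-{database}.trn"
    lines = [
        f"--step {seq}: back up the transaction log on the source instance",
        f":connect {source}",
        f"BACKUP LOG [{database}] TO DISK = N'{trn_file}' WITH NOFORMAT, NOINIT, SKIP, STATS = 10",
        "GO",
        f"--step {seq}: restore it on the target and keep the database in Restoring mode",
        f":connect {target}",
        f"RESTORE LOG [{database}] FROM DISK = N'{trn_file}' WITH FILE = 1, STATS = 10, NORECOVERY",
        "GO",
    ]
    return "\n".join(lines)


print(log_shipping_step(2))
```

The sketch only produces text. Paste the result into a query window in SQLCMD mode, or save it to a file and run it with sqlcmd. Every generated restore uses NORECOVERY, so the target database stays closed until you recover it explicitly during the downtime.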
Do the magic: Move the ASCS and DB instance in a short system downtime
As always, it is mandatory to have a full system backup before doing maintenance on a productive SAP system. If everything is well prepared, the risk of having to roll back is quite low, but rolling back and postponing the maintenance activity always remains an option.
At this point, you have a clustered ASCS instance, a clustered SQL Server database instance, and local ERS instances running on both cluster environments. Your ABAP application instances are still installed on the old/existing cluster.
When your downtime window begins, stop, as always, all interfaces and batch jobs, and log off all dialog users.
- Stop the SAP application servers.
- Take the ASCS & ERS instances offline. It is recommended to “disable” the Windows services to avoid an unintended restart in future.
- Using the Windows Server Failover Cluster Management, remove (!) the IP address and the network name “asterix” from your current cluster node wsiv8050-1.
- Remove the cluster object in Active Directory “asterix”
- Modify the IP address and network name from “obelix” to “asterix” in the SAP cluster role in the new environment wsiv8051-c.
- Before we can start the new ASCS, we need to reconfigure the profile files if not already prepared at this point in time. That means:
- Rename the ASCS profile, e.g.
rename D:\usr\sap\RIX\SYS\profile\RIX_ASCS00_obelix RIX_ASCS00_asterix
- Adjust all the parameters inside the DEFAULT.PFL and the instance profile RIX_ASCS00_asterix that are pointing to obelix, which no longer exists. To do so, you could simply copy the profiles from the former cluster. But check every single profile parameter and, most importantly, adjust the database related settings. For SQL Server, these are the parameters
SAPDBHOST = WSIV8051-SQL
dbms/type = mss
dbs/mss/server = WSIV8051-SQL\C51
dbs/mss/dbname = RIX
dbs/mss/schema = rix
- Modify the local profiles from the ERS instances. In my environment, they are local profile files in C:\usr\sap\RIX\ERS10\profile\ on each node. Correct all the “obelix” entries to “asterix” in these profile files on every cluster node. And remember to restart the ERS instances after the changes have been made!
- If there are <sid>adm specific user environment variables like SAPLOCALHOST=obelix or MSSQL_SERVER=wsiv8050-sql\CLU, you must change them to the new values, e.g. using regedit under HKEY_USERS\<SID of the user>\Environment.
- Modify the Windows service SAPRIX_00 on both (=all!) new cluster nodes so that asterix instead of obelix is used in the sapstartsrv.exe parameters. You can either use sc.exe (sc.exe config SAPRIX_00 binPath= ...) or regedit.exe to adjust the “ImagePath” value under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SAPRIX_00
- Adjust the name of the Windows Server Failover Cluster Role. With Windows Server 2016, after renaming the main access point from obelix to asterix, you must rename the cluster role name as shown in this screenshot:
That’s necessary because the Windows Server Failover Cluster framework assumes that the main access point name (recently changed to “asterix”) should be reflected in the role name. For an SAP (A)SCS cluster, the role name is set to “SAP <SID>” during installation. This value is string-compared by SAP software logistics tools like SWPM and SUM. So please change the role name from “asterix” back to “SAP RIX”.
- Finally, do the database move. That means transferring all the remaining database changes from the old SQL Server instance to the new one. To do so, take a last transaction log backup and then take the source database offline (just in case, since the SAP system is already stopped). To finish, restore the transaction log backup on the target instance and recover (=open) the target database. A fitting SQLCMD query would look like:
--do the last TLOG backup on the source database before switching to offline
:connect WSIV8050-SQL\CLU
GO
BACKUP LOG [RIX] TO DISK = N'\\park\platz\Samuel_Backup\RIX\02-RIX.trn' WITH NOFORMAT, NOINIT, NAME = N'RIX Database Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
--take the source database offline so that nothing can change it anymore
ALTER DATABASE [RIX] SET OFFLINE WITH ROLLBACK IMMEDIATE
GO
--and restore the TLOG on the target instance
:connect WSIV8051-SQL\C51
GO
RESTORE LOG RIX FROM DISK = N'\\park\platz\Samuel_Backup\RIX\02-RIX.trn' WITH FILE = 1, NOUNLOAD, STATS = 10, NORECOVERY
GO
:connect WSIV8051-SQL\C51
GO
--finally open the database on the target SQL Server instance
RESTORE DATABASE RIX WITH RECOVERY
GO
- Re-configure the SQL Server logins and security. To do so, there are many possible ways. One would be to follow SAP note 1294762 – SCHEMA4SAP.VBS and create a schema repair script. That would transfer all the logins. In addition to that, remember to transfer all other database related objects like SQL Server Agent jobs, additional logins or maintenance plans.
- Start the ASCS instance using the cluster role “SAP RIX” on the new cluster node. If everything is properly configured, the ASCS should start and run without errors.
- Start the old application server instances which you still have installed at least on two Windows hosts, wsiv8050-1 and wsiv8050-2. All former application instances should successfully load their profile from the “new” \\asterix\sapmnt\RIX\SYS\profile without any modification. Verify that the application server instances connect to the database moved to the new server.
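Several of the steps above come down to the same text change: every “obelix” in the ASCS and ERS profiles and in the service “ImagePath” has to become “asterix”. The following is a minimal sketch of such a replacement. The function names `swap_hostname` and `leftover_lines` are hypothetical, and the sample lines at the end are illustrative, not copied from a real profile. The sketch only matches the name as a whole hostname token, so a longer name that merely starts with “obelix” would be left alone.

```python
import re


def swap_hostname(text: str, old: str = "obelix", new: str = "asterix") -> str:
    """Replace `old` wherever it appears as a whole hostname token (case-insensitive)."""
    # Letters, digits and '-' are hostname characters, so they may not touch the match.
    # '_' , '\\' , '=' and '.' are allowed neighbours (e.g. RIX_ASCS00_obelix, \\obelix\sapmnt).
    pattern = re.compile(
        rf"(?<![A-Za-z0-9-]){re.escape(old)}(?![A-Za-z0-9-])", re.IGNORECASE
    )
    return pattern.sub(lambda match: new, text)


def leftover_lines(text: str, old: str = "obelix") -> list:
    """Return the lines that still mention `old` in any form, for a manual review."""
    return [line for line in text.splitlines() if old.lower() in line.lower()]


# Illustrative input only; real profiles contain many more parameters.
sample = "\n".join([
    r"_PF = \\obelix\sapmnt\RIX\SYS\profile\RIX_ASCS00_obelix",
    "SAPLOCALHOST = obelix",
])
print(swap_hostname(sample))
print(leftover_lines(swap_hostname(sample)))
```

Run `leftover_lines` on the result and look at anything it reports before starting the instance. The database related parameters are not touched by this replacement and still have to be set as described above.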
At this point you have successfully switched to the new ASCS instance and database on the new cluster!
Follow up activities
You can now:
- install additional application server instances on the new cluster nodes
- remove the old SAP instances on the old Windows hosts. Or simply delete the old environment.
After everything is finished, you can install two additional SAP application server instances on the two new Windows Server 2016 cluster nodes to the existing system RIX. Of course, these SAP instances are installed on local disks. Thus, your final system architecture could look like:
The difference to the starting architectural overview is marginal – only both node host names are exchanged (from wsiv8050-1/ wsiv8050-2 to wsiv8051-1/ wsiv8051-2). And of course, the new cluster nodes are completely independent from the former ones – they could be using a different Windows Server Version or changed server location (e.g. even operated in an IaaS environment).
Please consider this blog as a source of inspiration. Following guidance like this can be much more efficient than learning by doing. But I strongly recommend testing this procedure in your environment before each production downtime. After you have finished the “obelix” installation, you could, for example, rehearse the move by switching this environment to an unused dummy name such as “idefix”. That will give you the confidence you need to carry out all of the steps systematically during the (short?!) downtime of your productive system.