SAP HANA Hands on tests ( part 4.2 ) : HANA replication failback
Where am I at?
Previously I performed a takeover test, killing the primary system, which was hdbtest1.
The hdbtest2 node took over the HDB and I could restart my ECC application (it needed a restart because of my lab configuration; see my previous doc/blog: SAP HANA Hands on tests ( part 4.1 ) : HANA replication takeover).
Now, at first I only wanted to perform the failback, but in the end I’ll also perform a “Near Zero DownTime” update of the HANA platform and then the failback.
Start situation:
HDB is running on Node2 as the primary instance.
Node1 failed a few days ago and is therefore no longer in sync.
The HDB software is still installed on node1.
I’m currently running the following version of HDB: 1.00.093.00.1424770727 (fa/newdb100_rel)
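A quick way to double check the installed revision is the HDB script, or a query on M_DATABASE. This is only a sketch: it assumes you run it as the <sid>adm user, and the instance number and database user are placeholders.
# as the <sid>adm user on the HANA host
HDB version
# or straight from the database (hdbsql prompts for the password)
hdbsql -i <nr> -u SYSTEM "SELECT VERSION FROM SYS.M_DATABASE"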
Target:
- HDB at version 1.00 SPS 09, revision 97.
- Node1 back as primary.
- Node2 back as standby.
How:
Basically, it should take these main steps (sketched as commands right after the list):
- Get my HANA node1 back into the configuration as a standby node.
- Perform the software update on this standby HDB node (hdbtest1).
- Takeover -> Node1 is back as the primary node.
- Update HDB node2.
- Put HDB node2 back into the configuration as the standby host.
- Perform the required post-update steps.
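At the command level, I expect the sequence to look roughly like this. It is only a sketch: the instance number, the extracted media path and the site name of node2 are placeholders, and the update itself is detailed in part 4.3.
# on hdbtest1 (the crashed ex-primary): register it again as secondary, forcing a full replica
hdbnsutil -sr_register --remoteHost=hdbtest2 --remoteInstance=<nr> --mode=syncmem --name=HTLPRIM --force_full_replica
# on hdbtest1: update the standby to revision 97 with hdblcm from the extracted media
<extracted_media>/SAP_HANA_DATABASE/hdblcm --action=update
# on hdbtest1: take over, node1 becomes the primary again
hdbnsutil -sr_takeover
# on hdbtest2: update it, then register it back as the standby of hdbtest1
<extracted_media>/SAP_HANA_DATABASE/hdblcm --action=update
hdbnsutil -sr_register --remoteHost=hdbtest1 --remoteInstance=<nr> --mode=syncmem --name=<site_name_node2>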
Where to get some information:
First of all: RTNG! (Read The Notes and Guides).
There are lots of notes and guides around this topic.
The main ones I followed:
- 2115815 – FAQ: SAP HANA Database Patches and Upgrades
- 1999880 – FAQ: SAP HANA System Replication
- 2170126 – SAP HANA SPS 09 Database Revision 97
- The SAP HANA administration guide chapter about NZDT.
- And any subsequent notes or docs.
I also read this excellent blog about HANA SPS updates:
Let’s go!
Get HANA node1 back in the game!
Having a look at the HDB replication statuses, this is the situation after my previous simulated crash/takeover:
hdbtest2:
hdbtest1:
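The same information can also be pulled on the command line (a minimal check, assuming it is run as the <sid>adm user on each host):
# on each node, as the <sid>adm user
hdbnsutil -sr_state
# each node prints its own replication mode and site name; right now both still claim to be primary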
As you can see, the situation is not clear: two systems are claiming to be primary. So the first step is to clean this up.
At first I thought I would have to go through some cleanup, unregistering the system and then adding the host back in.
It turns out that you only need to register the system again in the configuration.
So all I had to do, on hdbtest1, was register the system again with the hdbnsutil tool, adding the option --force_full_replica (which forces a full data shipment rather than a delta, sensible here since the old primary has been out of sync for days):
hdbnsutil -sr_register --remoteHost=hdbtest2 --remoteInstance=HTL --mode=syncmem --name=HTLPRIM --force_full_replica
The hdbtest1 node is now in syncmem mode instead of primary.
hdbtest2 is aware of the change in the topology: hdbtest1 is now seen as the secondary_host:
Now we restart the hdbtest1 node.
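In my lab this is simply a start of the instance as the <sid>adm user (the instance number for sapcontrol is a placeholder):
# on hdbtest1, as the <sid>adm user
HDB start
# or, equivalently, through sapcontrol
sapcontrol -nr <nr> -function Start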
The system replication is triggered on startup:
hdbtest2 is now replicating to hdbtest1:
The replication is back online.
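To follow the sync without the screenshots, the python helper shipped with HANA, or a query on the M_SERVICE_REPLICATION view, shows the same picture (a sketch; the instance number and user are placeholders):
# on the current primary (hdbtest2), as the <sid>adm user; cdpy jumps to the python_support directory
cdpy
python systemReplicationStatus.py
# or via SQL
hdbsql -i <nr> -u SYSTEM "SELECT HOST, SECONDARY_HOST, REPLICATION_MODE, REPLICATION_STATUS FROM SYS.M_SERVICE_REPLICATION"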
That said, for the failback to be complete, I will bring the landscape back to the initial situation while updating HANA using the Near Zero DownTime update concept:
Node 1 as primary
Node 2 as standby
Next steps will be described here:
SAP HANA Hands on tests ( part 4.3 ) : Near Zero DownTime update using replication
Hello Steve,
Is there any way to stop replication to the DR side for a while, open the DR DB for a test case, and then resume replication afterwards? We are doing this with Oracle Data Guard but couldn't find a way in HANA.
The only way seems to be to cancel replication, do the test, and then set replication up again from scratch.
Regards
Tutku
Hi Tutku,
Did you find any additional information on your test case? We are in the same scenario: we have to test our DR systems periodically, so we DON’T want to fail back.
So far in my tests I have been doing that as well:
Let me know if you or anyone else comes up with a better way. All the tests assume a failback situation, and we definitely do NOT want that or any risk of data going from DR back to the Primary.
We also don't want to risk the Primary site becoming unresponsive. We had an issue where bandwidth became saturated, plus a global.ini mismatch, which caused production to slow to a crawl because it kept trying to send changes to the Secondary site.
Still debating whether to use the built-in replication for this task or stick with a homegrown rsync and log-shipping solution instead.