Big thanks to Fabian Herschel and Peter Schinagl from SUSE for proof-reading the blog.
First part of this blog is located here:
Be Prepared for Using Pacemaker Cluster for SAP HANA – Part 2: Failure of Both Nodes
What happens when both cluster nodes fail
Let’s start with the Pacemaker Cluster running in normal operation. The colors have the following meaning: green means the component is available, yellow means standby, and red means unavailable.
Figure 2 – Pacemaker Cluster during normal operation
SAP HANA on server hana43 is the primary. This is visible from the status PROMOTED, the roles string containing “P”, the sync state set to PRIM and the LPT timestamp being set.
SAP HANA on server hana44 is the secondary. This is visible from the status DEMOTED, the roles string containing “S”, the sync state set to SOK (meaning replication is healthy) and a low “static” LPT value.
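The attribute combinations described above can be read mechanically. The following is a minimal Python sketch and not part of the cluster software: the NodeAttributes class, its field names, the colon-separated form of the roles string and the sample values are assumptions made for illustration. Only the meaning of the attributes (PROMOTED/DEMOTED, “P”/“S” in the roles string, PRIM/SOK, LPT) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class NodeAttributes:
    """Hypothetical container for the cluster node attributes discussed above."""
    status: str      # "PROMOTED" or "DEMOTED"
    roles: str       # roles string; assumed here to be colon-separated
    sync_state: str  # "PRIM" on the primary, "SOK" on a healthy secondary
    lpt: int         # timestamp on the primary, low static value on the secondary


def describe(attrs: NodeAttributes) -> str:
    """Classify a node the way the text does: primary, healthy secondary, or unclear."""
    role_fields = attrs.roles.split(":")
    if attrs.status == "PROMOTED" and "P" in role_fields and attrs.sync_state == "PRIM":
        return "primary"
    if attrs.status == "DEMOTED" and "S" in role_fields and attrs.sync_state == "SOK":
        return "secondary (replication healthy)"
    return "unclear"


# Illustrative values only; they are not taken from a real cluster.
hana43 = NodeAttributes("PROMOTED", "4:P:master1", "PRIM", 1700000000)
hana44 = NodeAttributes("DEMOTED", "4:S:master1", "SOK", 30)
print(describe(hana43))
print(describe(hana44))
```

Any combination that does not match one of the two patterns is reported as unclear rather than guessed, which mirrors the cautious attitude this blog recommends.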
Now let’s assume that the primary server crashes and becomes unavailable.
Figure 3 – Pacemaker Cluster after primary server failed
The secondary server notices that the primary server is down, and because the replication status is SOK it is able to initiate a failover. A fencing operation is executed to ensure that server hana43 is offline, and then the failover is executed.
Figure 4 – Pacemaker Cluster after secondary server failover
When SAP HANA on server hana44 is promoted to the new primary, all of its cluster node attributes are updated: the status is set to PROMOTED, the roles string changes to contain “P” because the database failover was executed, the sync state is set to PRIM and the LPT timestamp is set.
Note that the cluster node attributes of the former primary server (hana43) remain unchanged because that server is unavailable. These attributes are stored locally on that server.
Now let’s assume that the former secondary server (hana44, now the primary) also crashes while hana43 is still offline.
Figure 5 – Pacemaker Cluster after secondary server failed
Now both servers are offline, but the SAP HANA databases on both servers are configured to run as primary. The cluster node attributes on both servers confirm this.
How to (not) destroy your company data
Now imagine the following scenario. You are the on-duty System Administrator and you get a call in the middle of the night: SAP HANA is down and you need to fix it. Still half-asleep, you log on to the servers while your boss explains over the phone how much money your company is losing every minute SAP HANA is down and how important it is to get SAP HANA up and running as fast as possible.
When you are finally there you see that both SAP HANA cluster servers are offline. To make it more obvious – this is what you see.
Figure 6 – Pacemaker Cluster state as System Administrator can see it
Since you are under pressure to fix SAP HANA ASAP, you decide to fix the primary first. After all, the secondary can be fixed later, once you are back in business.
DANGER!!! Now you should stop and get fully awake. How do we know which server is “last primary”? Unless you got that information from some external monitoring system you have no way of knowing!!!
Let’s look at what would happen if you started the wrong server. Let’s assume the System Administrator checks the documentation, sees there that hana43 is supposed to be the primary and decides to start it up first.
Figure 7 – Pacemaker Cluster after hana43 server was restarted
Once the operating system is rebooted, the System Administrator starts the Pacemaker Cluster. Since the other server (hana44) is still offline, the Pacemaker Cluster is unable to retrieve the LPT value from its cluster node attributes. Without LPT values from both nodes, the Pacemaker Cluster will not start the SAP HANA database, and the cluster node attribute status is set to WAITING.
This protection in the Pacemaker Cluster is called “restart inhibit”. It ensures that SAP HANA is started only when the Pacemaker Cluster can clearly determine which server is the “last primary”.
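The idea behind “restart inhibit” can be sketched in a few lines of Python. This is an illustration of the rule described above and not the actual resource agent code: the function name status_after_cluster_start, the use of None for an LPT value that cannot be read, and the return value "CAN_DECIDE" are all assumptions. Only the WAITING status comes from the description above.

```python
from typing import Optional


def status_after_cluster_start(local_lpt: Optional[int], peer_lpt: Optional[int]) -> str:
    """Refuse to start SAP HANA unless the LPT values of both nodes are known.

    None stands for an LPT value that cannot be read, for example because
    the other node is still offline.
    """
    if local_lpt is None or peer_lpt is None:
        # One node cannot be consulted, so "last primary" cannot be determined.
        return "WAITING"
    return "CAN_DECIDE"


# hana43 was restarted alone; the LPT of hana44 is not available.
print(status_after_cluster_start(1700000000, None))  # WAITING
```

The point of the sketch is that the safe default is to do nothing: a missing value leads to WAITING, never to a guess.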
DANGER!!! At this point System Administrator should stop and start thinking.
Let’s assume that our System Administrator is still half-asleep, is surprised that SAP HANA is still down and starts it manually.
Figure 8 – Pacemaker Cluster after SAP HANA on server hana43 was manually started
Once SAP HANA is started manually, the Pacemaker Cluster detects it and adjusts the cluster node attribute status to PROMOTED.
From the moment SAP HANA is started, all database updates are stored in the SAP HANA database running on server hana43. However, this database does not have the data that was persisted on server hana44 after the failover (see Figure 4).
By manually starting the SAP HANA database, our System Administrator has caused a logical inconsistency that will take an incredible effort to fix.
Let’s see what the correct approach is for dealing with situations when both servers are offline.
If you are unable to clearly determine which server was the “last primary”, then you need to start both cluster nodes. Without access to both servers, the Pacemaker Cluster is unable to correctly determine which server was running the “last primary” SAP HANA database.
Figure 9 – Pacemaker Cluster after both servers are restarted
Once the attributes from both cluster nodes are available, the Pacemaker Cluster checks which node has the higher LPT value to decide which database was the “last primary”. Unfortunately, this information is not written to the SBD device or anywhere else outside the local node, so both nodes must be available for the “last primary” SAP HANA database to be determined correctly.
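The comparison itself is simple once both values are on the table. The Python sketch below illustrates it for the scenario from this blog. The function name last_primary and the LPT numbers are made up for illustration, and the real cluster logic may consider more than what is shown here.

```python
from typing import Dict, Optional


def last_primary(lpt_by_node: Dict[str, Optional[int]]) -> str:
    """Return the node with the highest LPT value.

    Raises ValueError if the LPT of any node is unknown (None), because
    the comparison is only meaningful with values from all nodes.
    """
    missing = [node for node, lpt in lpt_by_node.items() if lpt is None]
    if missing:
        raise ValueError(f"LPT unknown for {missing}; start these nodes first")
    return max(lpt_by_node, key=lambda node: lpt_by_node[node])


# Illustrative values only: hana43 kept the timestamp from before its crash,
# hana44 recorded a later timestamp while it ran as primary after the failover.
lpts = {"hana43": 1700000000, "hana44": 1700000600}
print(last_primary(lpts))  # hana44
```

With the values above, hana44 is selected even though the documentation in the story named hana43 as the primary, which is exactly the trap described in the previous section.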
An alternative approach can be used only if you are 100% sure which SAP HANA was the “last primary”, for example because you got this information from an external monitoring system. In that case you can proceed exactly as described in the previous section, but you perform the steps on the correct server.
First you restart the “last primary” server (hana44), start Pacemaker and then manually start SAP HANA, which will become the primary database. Make sure to execute these steps on the correct server.
Later you can restart the other server (hana43), start Pacemaker, register the local SAP HANA as the new secondary and clean up the resource to start SAP HANA as the secondary database.
Please note that there is no protection that prevents you from manually starting the wrong SAP HANA database and causing data loss or logical inconsistency. It is the responsibility of the System Administrator to either start both nodes at the same time or correctly determine which database was the “last primary”.