[HANA System Replication] end-to-end Client Reconn...

nicholas_chang · ‎09-30-2015

I've seen many posts on how to set up HANA System Replication and perform a takeover; however, few if any of them cover client reconnect after sr_takeover.

To ensure that clients can seamlessly find the active HDB node (whether it is the primary or the secondary), we can use either IP redirection or DNS redirection. In this blog, I'll focus on simple IP redirection, as it is easier and faster to set up and has fewer dependencies than DNS redirection.

For detailed information on IP and DNS redirection, please refer to these guides:

http://scn.sap.com/docs/DOC-63221

Introduction to High Availability for SAP HANA

How to Perform System Replication for SAP HANA

First of all, we need to choose a virtual hostname and IP address and create them in DNS. Below are the sample virtual and physical hostnames/IPs used in this blog:

Virtual IP/Hostname: [10.X.X.50 / hanatest]

Primary Physical IP/Hostname: 10.X.X.20 / primary1

Secondary Physical IP/Hostname: 10.X.X.21 / secondary2

In normal operation, [10.X.X.50 / hanatest] is bound to the primary physical host, primary1.

SAP instances, HTTP, BO, SAP DS, etc. connect to HDB via [10.X.X.50 / hanatest].

During an unplanned outage or disaster, [10.X.X.50 / hanatest] is unbound from the primary host and bound to the secondary physical host.
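To check which host currently holds the virtual IP, one option is to inspect the interface list on each node. The sketch below assumes the iproute2 `ip` tool is installed; the function name `holds_vip` is hypothetical, and 10.0.0.50 is only a placeholder for the masked address 10.X.X.50.

```shell
# Sketch: report whether this host currently holds the virtual IP.
# holds_vip and the sample address are illustrative, not part of the original setup.

# Reads `ip -o -4 addr show` style output on stdin and succeeds (exit 0)
# if the given IPv4 address is assigned to any interface.
holds_vip() {
    vip="$1"
    # In one-line mode the 4th field is "address/prefix"; compare the address part only.
    awk -v vip="$vip" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }'
}

# Usage on a live host (10.0.0.50 stands in for the masked virtual IP):
#   ip -o -4 addr show | holds_vip 10.0.0.50 && echo "virtual IP is bound here"
```

Running this on both nodes before and after a takeover gives a quick confirmation that exactly one of them holds the address.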

Below are the steps to bind the virtual IP [10.X.X.50] to the network interface (and hence to its MAC address) on Linux:

1) Bind the virtual IP (10.X.X.50) to the primary physical host:

primary1:/etc/init.d # ifconfig eth0:0 10.X.X.50 netmask 255.255.255.0 broadcast 10.X.X.255 up

2) Check eth0 entry:

primary1:~ # ifconfig

eth0 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:C2

inet addr:10.XX.XX.20 Bcast:10.XX.XX.255 Mask:255.255.255.0

eth0:0 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:C2

inet addr:10.XX.XX.50 Bcast:10.XX.XX.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

3) Ping hanatest to confirm that it resolves and responds:

PING hanatest (10.XX.XX.50) 56(84) bytes of data.

64 bytes from hanatest (10.XX.XX.50): icmp_seq=1 ttl=64 time=0.028 ms

64 bytes from hanatest (10.XX.XX.50): icmp_seq=2 ttl=64 time=0.038 ms

64 bytes from hanatest (10.XX.XX.50): icmp_seq=3 ttl=64 time=0.024 ms

4) Configure all HDB clients to connect using the virtual hostname [hanatest]:

a) SAP - hdbuserstore:

sidadm 52> hdbuserstore list

DATA FILE : /home/sidadm/.hdb/XX/SSFS_HDB.DAT

KEY DEFAULT

ENV : hanatest:30515

USER: SAPSID

Log in to SAP and you'll see that DBHOST points to primary1.

In DBACOCKPIT -> DB CONNECTION, ensure that the virtual host is used:

b) HANA Studio:

Services are running on the physical host primary1.

c) ODBC - connect using the virtual host

d) HTTP - xsengine

http://hanatest:8005/

http://hanatest:8005/sap/hana/xs/admin
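The client endpoints above all combine the virtual hostname with ports derived from the instance number: 30515 and 8005 correspond to instance 05 under the usual 3<nn>15 (SQL) and 80<nn> (XS HTTP) pattern. As a minimal sketch, the helpers below build those endpoints and print, without executing, an `hdbuserstore SET` command for the key. The function names are hypothetical, and the password is deliberately left as a placeholder.

```shell
# Sketch: derive client endpoints from the virtual hostname and instance number.
# Function names are illustrative; the 3<nn>15 / 80<nn> patterns match the ports in this post.

sql_endpoint() {   # $1 = virtual hostname, $2 = two-digit instance number
    printf '%s:3%s15\n' "$1" "$2"
}

xs_url() {         # $1 = virtual hostname, $2 = two-digit instance number
    printf 'http://%s:80%s/\n' "$1" "$2"
}

# Print (do not run) the hdbuserstore command that points a key at the virtual host.
userstore_cmd() {  # $1 = key, $2 = virtual hostname, $3 = instance number, $4 = DB user
    printf 'hdbuserstore SET %s %s %s <password>\n' "$1" "$(sql_endpoint "$2" "$3")" "$4"
}

sql_endpoint hanatest 05                   # hanatest:30515
xs_url hanatest 05                         # http://hanatest:8005/
userstore_cmd DEFAULT hanatest 05 SAPSID   # hdbuserstore SET DEFAULT hanatest:30515 SAPSID <password>
```

Because every client references only the virtual hostname, none of these settings need to change after a takeover.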

------------------Unplanned outage *DISASTER*:--------------------------------------------

During a disaster, we will:

i) Ensure the primary HDB is down and not accessible, to avoid any split-brain situation.

ii) Unbind the virtual IP [10.X.X.50] that is currently bound to the primary physical host. After you execute the command below, eth0:0 should no longer be visible in the ifconfig output.

primary1:~ # ifconfig eth0:0 10.XX.XX.50 down

iii) Clear the ARP cache on the clients [optional].

iv) Initiate the takeover on the secondary (hdbnsutil -sr_takeover) and wait for HDB on the secondary to be up and ready.

v) Once HDB on the secondary host is up and running, bind the virtual IP [10.X.X.50] to the secondary physical host:

secondary2:/etc/init.d # ifconfig eth0:0 10.XX.XX.50 netmask 255.255.255.0 broadcast 10.XX.XX.255 up

secondary2:~ # ifconfig

eth0 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:C8

inet addr:10.XX.XX.21 Bcast:10.XX.XX.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth0:0 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:C8

inet addr:10.XX.XX.50 Bcast:10.XX.XX.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

vi) Ping hanatest to confirm that it resolves and responds. The virtual host [hanatest] is now bound to the secondary physical host, secondary2.

PING hanatest (10.XX.XX.50) 56(84) bytes of data.

64 bytes from hanatest (10.XX.XX.50): icmp_seq=1 ttl=64 time=0.028 ms

64 bytes from hanatest (10.XX.XX.50): icmp_seq=2 ttl=64 time=0.038 ms

64 bytes from hanatest (10.XX.XX.50): icmp_seq=3 ttl=64 time=0.024 ms
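The sequence in steps ii) to v) can also be sketched with the iproute2 `ip` command, which many newer distributions ship in place of ifconfig. The following is a dry-run sketch under stated assumptions, not a tested failover script: the /24 prefix corresponds to netmask 255.255.255.0, eth0 is taken from the output above, 10.0.0.50 is a placeholder for the masked virtual IP, and the helper names are hypothetical. The gratuitous ARP via `arping -U` (iputils) is an optional addition that prompts neighbours to refresh their ARP entries.

```shell
# Sketch of the takeover sequence using iproute2 instead of ifconfig.
# Assumptions (not from the original post): /24 prefix, interface eth0,
# and 10.0.0.50 as a stand-in for the masked virtual IP.
# DRY_RUN defaults to 1, so commands are only printed, never executed.

VIP="${VIP:-10.0.0.50}"
PREFIX="${PREFIX:-24}"
DEV="${DEV:-eth0}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step ii: run on the old primary (as root) once HDB there is confirmed down.
release_vip() {
    run ip addr del "$VIP/$PREFIX" dev "$DEV"
}

# Step iv: run on the secondary as <sid>adm.
takeover() {
    run hdbnsutil -sr_takeover
}

# Step v: run on the secondary (as root) after HDB is up.
acquire_vip() {
    run ip addr add "$VIP/$PREFIX" brd + dev "$DEV" label "$DEV:0"
    # Gratuitous ARP so neighbours update their caches (relates to optional step iii).
    run arping -U -c 3 -I "$DEV" "$VIP"
}
```

Keeping the release on the old primary strictly before the bind on the secondary avoids having the same address active on two hosts at once.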

----------------------- End-to-End Client Reconnect Verification -----------------------------

Once this is done, you can verify end-to-end client reconnect without making any changes on the client side.

a) SAP Instances after sr_takeover and running on secondary host:

a.i) Developer trace: SAP reconnected successfully to the secondary host, secondary2

B Connection 1 opened (DBSL handle 1)

B successfully reconnected to connection 1

B ***LOG BYY=> work process left reconnect status [dblink 2158]

M ThHdlReconnect: reconnect o.k.

M

M Tue Sep 29 13:50:13 2015

M ThSick: rdisp/system_needs_spool = false

C FDA DB protocol version from connection 0 = 1

B Tue Sep 29 13:55:16 2015

B Connect to XXX as system with hanatest:30515

C Try to connect as system/<pwd>@hanatest:30515 on connection 1 ...

C

C Tue Sep 29 13:55:17 2015

C Attach to HDB : 1.00.095.00.1429086950 (fa/newdb100_rel)

C fa/newdb100_rel : build_weekstone=0000.00.0

C fa/newdb100_rel : build_time=2015-04-15 10:44:35

C Database release is HDB 1.00.095.00.1429086950

C INFO : Database 'TST/05' instance is running on 'secondary2'

C INFO : Connect to DB as 'SYSTEM', connection_id=300064

a.ii) SAP status (HDB switched from primary1 -> secondary2)

b) HANA Studio

c) ODBC

d) HTTP

xsengine: http://hanatest:8005/

http://hanatest:8005/sap/hana/xs/admin
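The developer trace check in a.i) can be scripted. As a rough sketch, the function below extracts the host name from the most recent "instance is running on" line in a work process trace. The pattern comes from the trace excerpt above; the function name `active_db_host` is hypothetical, and dev_w0 is only an example trace file name.

```shell
# Sketch: find the host a work process last connected to, from a developer trace.
# active_db_host is an illustrative name; the pattern is taken from the excerpt above.

# Reads trace text on stdin and prints the host from the last
# "instance is running on '<host>'" line; exits non-zero if none is found.
active_db_host() {
    sed -n "s/.*instance is running on '\([^']*\)'.*/\1/p" | tail -n 1 | grep .
}

# Usage (trace location depends on the installation; dev_w0 is an example):
#   active_db_host < dev_w0
```

After a successful takeover, the printed host should be secondary2 even though the connect string still references hanatest.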

Hopefully this blog will serve as a reference for a client reconnect strategy when setting up HANA System Replication. I also hope more consultants become aware of the three excellent guides above, which provide detailed information on the client reconnect mechanism and HANA System Replication.

Cheers,

Nicholas Chang