Setting up SUSE High Availability Cluster Quick Start demo
SUSE kindly provide advice on how to install an HA (High Availability) Cluster; the online documentation is here:
I thought “this will be easy”… it wasn’t that easy… this blog explains how to get it working. The host laptop was a MacBook and the Hypervisor was VMware Fusion – although many people will have other combinations of host and hypervisor, hopefully some of the advice below is still somewhat useful.
Some of the steps are based on the demo setup lecture (week 1, unit 4) by Richard Mayne in the openSAP course about SUSE SLE15 and High Availability:
It’s a good course, recommend looking at it.
I don’t operate any Helpdesk, so if you are stuck, try googling the answer and use your debugging superpowers etc. On the other hand, if there are mistakes or things that could be done better, feel free to point those out.
Install the VMs.
Let’s get a basic SLES 15 SP2 installation DVD:
…and (if you like) a new evaluation code from the SUSE Customer Centre:
Save the code somewhere, you can use it later to register your new SUSE VMs. (You can also do the demo without registering any of your VMs).
Downloaded the “Full DVD” without the source code, so that was: SLE-15-SP2-Full-x86_64-GM-Media1.iso
Then check the uhh, SHA-256 checksum (on macOS which I’m using as the host, the command is ‘shasum -a 256 <path-to-iso-file>’; on linux you could use ‘sha256sum <path-to-iso-file>’; on Windows, maybe ‘certutil’ is your friend, but I have not tested that)… so yes, we check that no bad actors have tampered with the file on its way to your computer. Ok that value matches the website’s value, check passed.
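The checksum comparison can also be scripted, so that a mismatch is hard to overlook. This is just a minimal sketch, assuming a POSIX shell with either sha256sum or shasum available; `verify_sha256` is a made-up helper name (not a SUSE tool), and the expected value is whatever the download page shows:

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX   (hypothetical helper, not a SUSE tool)
# Prints "checksum OK" and returns 0 when FILE's SHA-256 equals EXPECTED_HEX.
verify_sha256() {
    file=$1
    # Accept upper-case hex from the website as well
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')      # typical on Linux
    else
        actual=$(shasum -a 256 "$file" | awk '{print $1}')  # typical on macOS
    fi
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH: got $actual" >&2
        return 1
    fi
}
```

Usage would be along the lines of `verify_sha256 SLE-15-SP2-Full-x86_64-GM-Media1.iso <value-from-website>`.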
We create 4 (say) new networks on VMware Fusion, then record their subnets onto a spreadsheet. [You can set up the demo using just the one default virtual network if you like – in that case skim past various parts of this blog; the SUSE documentation recommends at least two networks though, and in most Hypervisors networks are easy to create]. Go to menu VMware Fusion → Preferences → Network, and create a few more “custom” networks. You can either specify a subnet IP yourself, if you know what you are doing (basically check that you avoid clashes with any existing network IP ranges), or you can let the hypervisor auto-generate the subnet IP.
How many VMs do we need? Well it looks like 3: two cluster node VMs, and one SBD device VM. Each OS can probably get by on a 12GB disk.
The first node will be “helsinki-01”. In Fusion create a new VM, add our SLES .iso file as the virtual installation DVD. I chose UEFI boot instead of BIOS, it being the year 2021. Choose “Customise Settings” to Save and then the Settings GUI opens.
From there we assign Network adapters for all 6 networks (preconfigured and custom) – so use “Add Device” and add the 5 additional Networks (in addition to preassigned default NAT network, you add “Private to my Mac”, “vmnet3”, “vmnet5”, etc). Set disk size to 12GB, should have Bus Type “SCSI” as default, if not change it to “SCSI”.
Since my host has 8 CPU processor cores available, I am going to assign 2 cores each to the VMs; and since the host has 16GB of RAM, I will give the VMs 3072MB RAM each. That way there are enough processors and memory left over for the host to operate. Then start the VM…
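The sizing sums can be double-checked quickly in the shell. A small sketch using the figures above, and assuming all three VMs (including the storage VM) get the same 2 cores and 3072MB; `leftover` is a made-up helper name:

```shell
#!/bin/sh
# leftover TOTAL PER_VM VM_COUNT -> what remains for the host
leftover() {
    echo $(( $1 - $2 * $3 ))
}

echo "cores left for host:    $(leftover 8 2 3)"
echo "RAM (MB) left for host: $(leftover 16384 3072 3)"
```

With these numbers the host keeps 2 cores and 7168MB for itself.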
Choose “Installation”. Since a network connection should be detected (at least through default NAT), the installer will update itself and restart itself. Choose a language and a keyboard layout. Choose “SUSE Linux Enterprise Server 15 SP2” (or whatever major and SP version you are installing). Agree to the License Terms.
If you obtained a free 60 day evaluation (registration) code for the basic SLES 15 OS (not the HA extension), you can use that now to register your new SLES VM. You need the email address of your SUSE Customer Centre account, and the registration code (evaluation code) that you got from said Customer Centre. If you registered now, you will have the option to get the update repos added to your new machine, say Yes to that.
Next you choose which Extensions and Modules you want to include. So in the demo set up, the Software Requirements are defined as:
“All nodes that will be part of the cluster need at least the following modules and extensions:
Base System Module 15 SP2
Server Applications Module 15 SP2
SUSE Linux Enterprise High Availability Extension 15 SP2”
So, we take those 3 (the first 2 should be already ticked for you, then you add the HA Extension by ticking that checkbox).
If you have registered your new VM, the installer contacts SUSE and tries to register those modules and extensions. Probably it will ask for a separate registration code for the HA Extension. You can get that by using the link behind “login to SUSE Customer Centre” from this page:
Once you have that code, again at this stage you can type it in manually, yet accurately (just think, back in the days this kind of “manual input” process was many people’s actual job that they had to do).
For the System Role, choose the (probably suggested) “HA node”, as that is what this VM will become.
I am accepting the suggested partitioning, the main partition on /dev/sda2 is BTRFS:
Hardware clock should be set to UTC. Set your location (time-zone), and I accepted for now the default pool servers of suse.pool.ntp.org. (We change some of this NTP stuff later).
For now skip local user creation; define a password for the root user when prompted.
Review all your choices from the Review screen (make sure SSH server enabled and SSH port open), then proceed to the actual Install phase.
[Before the actual Install part, we could have gone into the Network Configuration and e.g. defined bonded interfaces; also here, you can set the hostname to e.g. “helsinki-01” or whatever your VM is to be known as. (Later we show how to set the hostname after installation using hostnamectl; in addition, you can both set hostname and define bonded interfaces, using the SUSE Yast tool – leaving that as an exercise for the reader)].
Once installation is all done, you reboot then login as root. It will be more practical to use ssh than the virtual console, anyway here is the output of
# ip a
The main disadvantage of the virtual console is there is no copy-pasting available. So assuming you have kept sshd enabled and allowed ssh login, then you can ssh from your host computer. If those colours don’t blind you, you can see that our new VM’s eth0 (vmnet8) address is 192.168.108.3.
So I login from MacBook via Terminal as follows (“%” is the zsh prompt):
% ssh root@192.168.108.3
(Ignore any warning about being unable to set the locale, this is a macOS Terminal minor bug/feature where it tries to inform the guest it is logging into about its preferred locale settings. See settings in ~/.bash_profile for example). Next thing (if not done already in the installation workflow) is to change the hostname from the default “localhost” to something more pet-like, less cattle-like:
# hostnamectl set-hostname helsinki-01
To see the name has really changed:
If you want the command prompt to pick up the change, then logout of your session:
The dark-red on blue text maybe isn’t the best for readability, but anyway the hostname has changed to helsinki-01. Next thing is to run from Terminal login over ssh:
# ip a
Now this is where ssh over Terminal comes in handy – you just copy the 6 IPv4 addresses for their respective interfaces and paste them to your spreadsheet for the helsinki-01 item… so you would get something like this:
So helsinki-01 has eth0 node on “.3”, and the rest are on “.2”.
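If copy-pasting six addresses by hand feels error-prone, the one-line-per-address output of `ip -o -4 addr show` can be boiled down to interface/address pairs ready for the spreadsheet. A minimal sketch; `list_ipv4` is a made-up helper name:

```shell
#!/bin/sh
# list_ipv4: read `ip -o -4 addr show` output on stdin and print
# "interface address" pairs, one per line, with the /prefix length stripped.
list_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print $2, a[1] }'
}
```

On the VM this would be run as `ip -o -4 addr show | list_ipv4`.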
[For this demo I have just been letting DHCP assign all the IPv4 addresses, whereas in the openSAP course static IPs were defined. In practice, once assigned by Fusion, the vNICs all get to keep those addresses also after reboots, at least that is my experience. But static IPs would remove the possibility of DHCP reassigning new IPv4 addresses to your VMs and their NICs; overall, static IP assignment is “neater” and more realistic for servers, though more time-consuming].
If you want to make bonded interfaces (which are a high-availability feature: 2 ordinary interfaces bonded together, so that if one fails (highly unlikely in this demo) the other one keeps working and the connectivity is saved), then use Yast:
System → Network Settings → Overview → Add… choose the Bonding type of adapter, and choose which existing adapters are to be bonded together. Example results:
We can see that eth1 and eth2 are no longer assigned IP addresses of their own, as they are now members of bond0, which carries the IP address instead:
Anyway, back in our demo setup, we install helsinki-02. Same steps as for helsinki-01, but different hostname. And then once again we record the IPv4 addresses that it received:
So helsinki-02 has eth0 node on “.4” and the rest on “.3”.
The third and last VM of this Quick Start demo is the SBD (“STONITH Block Device”) VM. We give it two disks: one for the OS as usual, 12GB, then via “Add Device..” we add a second disk for SBD purposes. The SBD disk can be small; 1GB seems to be the smallest disk size available in Fusion, so we go with that. Then we can choose for example vmnet1 and vmnet3 as the Network Adapters, so later vmnet3 can be the “san-subnet”. (Make sure that at least one Network Adapter has external NAT enabled, to download things and to be able to use an Internet NTP server).
Now this VM is not itself going to be a cluster node, but for System Role we can choose HA Node anyway, I suppose that just ensures that ssh config will be conveniently done for us and the HA scripts or other relevant utilities might be provided in case we need those.
System Partitioning… initial proposal is only to partition the main disk (sda, 12GB), and not do anything with the SBD disk (sdb, 1GB). We go with that suggestion.
Once installed, again check what IPv4 addresses it has, and add those into the spreadsheet. For reference, a snapshot of the spreadsheet state from later on i.e. once this demo was all set up:
Next task, if we have all three VMs running, is to test connectivity between them, using ping.
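The ping round can be done in one loop rather than address by address. A minimal sketch, assuming a POSIX shell; `check_reachable` is a made-up helper name, and `PING_CMD` is a hypothetical override I added so the probe command can be swapped out:

```shell
#!/bin/sh
# check_reachable ADDRESS...   (hypothetical helper)
# Sends one ping to each address and reports the result.
# Returns non-zero if any address did not answer.
check_reachable() {
    failed=0
    for target in "$@"; do
        # PING_CMD is an optional override; the default is a single ping
        if ${PING_CMD:-ping -c 1} "$target" >/dev/null 2>&1; then
            echo "$target reachable"
        else
            echo "$target UNREACHABLE"
            failed=1
        fi
    done
    return "$failed"
}
```

For example `check_reachable 172.16.48.2 172.16.48.3 172.16.48.4`, substituting the addresses recorded in your own spreadsheet.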
Synchronise the time, roll-out hosts file, iSCSI, softdog.
Note that in SLE15, we now use the chrony daemon:
“Since SUSE Linux Enterprise Server 15, chrony is the default implementation of NTP. chrony includes two parts; chronyd is a daemon that can be started at boot time and chronyc is a command line interface program to monitor the performance of chronyd, and to change various operating parameters at runtime.”
One good thing here is that unlike in the old days, there is no need to add “tinker panic 0” into ntp.conf anymore.
Let’s now adjust NTP config so that the helsinki VMs get their time from the storage VM.
# systemctl status chronyd.service
Should be active (running) and enabled (re-activates after reboot). If not, the relevant commands would be:
# systemctl enable chronyd.service
# systemctl start chronyd.service
# systemctl status chronyd
(the “.service” is optional though useful syntax). Use the ‘q’ character to quit from viewing status.
Check the current setup of /etc/chrony.conf – should list the suse.pool.ntp.org servers
# cat /etc/chrony.conf
pool 0.suse.pool.ntp.org iburst
pool 1.suse.pool.ntp.org iburst
pool 2.suse.pool.ntp.org iburst
pool 3.suse.pool.ntp.org iburst
Check the sources and the current status:
# chronyc sources
Probably the results show a load of different time servers that have joined the suse pools.
# chronyc tracking
So Leap status is “normal” and that is what we like to see.
Ok, let’s try changing chrony.conf so that the 2 helsinki VMs get their time from the storage VM.
For some reason SLE15 only has that arcane text editor from the Precambrian called vi… but we of course prefer to use nano because, well, it’s more intuitive… hmm, how to install nano… since I have registered these VMs, can use this command to get the Package Hub repo(s):
# SUSEConnect --product PackageHub/15.2/x86_64
Or, if you are not using a registered system, you can add this repo instead:
# zypper addrepo https://download.opensuse.org/repositories/editors/SLE_15/editors.repo
Either way, then refresh and install:
# zypper refresh
# zypper install nano
# nano /etc/chrony.conf
Comment out the lines mentioning Internet ntp servers (the ‘#’ character at start of lines in the file means they are commented-out):
#pool 0.suse.pool.ntp.org iburst
#pool 1.suse.pool.ntp.org iburst
#pool 2.suse.pool.ntp.org iburst
#pool 3.suse.pool.ntp.org iburst
#! pool pool.ntp.org iburst
At the end of the file add these 3 lines, 172.16.48.4 being the IP of the storage VM:
local stratum 10
In nano, ctrl+O, <Enter> to Save. ctrl+X to exit nano.
Then there is a file /etc/chrony.d/pool.conf auto-generated probably during install, which has a line referring to a suse pool server, so comment out that line too using nano (or vi, if you like).
Then restart the chronyd.service and see if it looks ok:
# systemctl restart chronyd
# systemctl status chronyd
# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
^? 172.16.48.4                   0   6     0     -     +0ns[   +0ns] +/-    0ns
Also check that tracking is ok:
# chronyc tracking
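A note on reading the sources output: straight after a restart, a “^?” marker with Reach 0 is expected, since chronyd has not yet collected enough samples from the server. Once synchronised, the line should start with “^*”, which chronyc uses for the currently selected source. A minimal sketch to check for that; `has_selected_source` is a made-up helper name:

```shell
#!/bin/sh
# has_selected_source: read `chronyc sources` output on stdin.
# Prints the address of the source marked "^*" (the one chronyd is
# currently synchronised to) and returns non-zero if there is none.
has_selected_source() {
    awk '$1 == "^*" { print $2; found = 1 } END { exit !found }'
}
```

On a helsinki node, `chronyc sources | has_selected_source` should eventually print 172.16.48.4; if it prints nothing, give it a minute or two and try again.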
Once that is done for both helsinki nodes, we should configure and roll-out an /etc/hosts file.
We add these lines to the storage VM’s /etc/hosts. As you see, I use the 172.16.48.0/24 network for the main inter-VM communication, and then specify 172.16.232.4 as the IP address to reach iSCSI targets:
# nano /etc/hosts
# our cluster host info:
172.16.48.2 helsinki-01.example.com helsinki-01
172.16.48.3 helsinki-02.example.com helsinki-02
172.16.48.4 storage.example.com storage
172.16.232.4 storage-san.example.com storage-san
# cd /etc
# scp hosts helsinki-01:/etc/
# scp hosts helsinki-02:/etc/
The storage VM, we want to set it up as an iSCSI server. Here’s the documentation:
So we need the server packages:
# zypper in yast2-iscsi-lio-server
Then follow the online instructions for using Yast. So I set up one target device…
Network Services → iSCSI LIO Target. After writing configuration: restart. After reboot: start on boot; Open Port in Firewall: [x] (i.e. yes);
Targets → Add → Target (defaulted value):iqn.2021-03.com.example; Identifier(defaulted value): f3b066528a528a059781; IP: 172.16.232.4 (i.e. the IPv4 address of this storage host’s vmnet3|eth2 adapter); port(defaulted value): 3260; Bind all IP addresses (defaulted value): [x]; Add (LUN) → LUN: -1 (auto-generate); LUN Path → Browse → /dev/sdb → OK → OK → Finish.
Ok, now this is still a “raw disk”, no filesystem or partitions, and we keep it that way.
So then we want to connect to it e.g. from helsinki-01 using iSCSI Initiator.
Network Services → iSCSI Initiator. After writing configuration: restart. After reboot: start on boot; Connected Targets → Add. IP address: 172.16.232.4 (should match the target, obviously); port: 3260 (again should match); No Discovery Authentication: [x]; Startup: automatic. [For some reason “Startup: automatic” is the option that works after reboots].
If you get the disk connected via Yast, then back in command line it should be visible from:
# iscsiadm -m node
And now if we run:
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   12G  0 disk
|-sda1   8:1    0  500M  0 part /boot/efi
`-sda2   8:2    0 11.5G  0 part /
sdb      8:16   0    1G  0 disk
sr0     11:0    1  9.9G  0 rom
…we see here that our 1G shared disk is really shared.
So the same iSCSI Initiator stuff for helsinki-02 as well, so that both those helsinki nodes are sharing the disk on the storage VM.
Test that this config survives rebooting: shutdown the helsinki nodes. Shutdown the storage node (only to be shutdown after the helsinki nodes are shutdown). Boot up storage node (always boot up the storage node before the helsinki nodes, as it has the time server and the iSCSI target). Boot up helsinki nodes. Run ‘lsblk’ on both helsinki nodes. The shared disk should be visible to both nodes.
For whatever reason (perhaps initially wrong values when configuring the iSCSI Initiator in Yast), the first time I rebooted helsinki-01 the iSCSI target was not listed in lsblk, i.e. the connection somehow failed. A hackaround fix for this issue was switching “node.startup” from “manual” to “automatic” in /etc/iscsi/iscsid.conf:
# To request that the iscsi initd scripts startup a session set to “automatic”.
### node.startup = automatic
node.startup = automatic
# To manually startup the session set to “manual”. The default is manual.
###node.startup = manual
Then after reboot, the discovery worked fine and disk was visible via ‘lsblk’.
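If you would rather not edit the file by hand on each node, the same change can be scripted. A minimal sketch, assuming the file contains an uncommented “node.startup = …” line; `set_node_startup` is a made-up helper name:

```shell
#!/bin/sh
# set_node_startup CONF MODE   (hypothetical helper)
# Rewrites the uncommented node.startup line in CONF to MODE
# ("automatic" or "manual"); commented-out lines are left untouched.
set_node_startup() {
    conf=$1
    mode=$2
    tmp=$(mktemp) || return 1
    # Write via a temp file, then copy the content back so that the
    # original file keeps its owner and permissions.
    sed "s/^node\.startup[[:space:]]*=.*/node.startup = $mode/" "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage: `set_node_startup /etc/iscsi/iscsid.conf automatic`. As far as I understand it, iscsid.conf only supplies defaults for targets discovered afterwards; a node record that already exists keeps its own setting, which can be changed with `iscsiadm -m node -o update -n node.startup -v automatic`.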
Note that the quick demo advice recommends referring to disks “by ID”; however, running the command
# ls /dev/disk/by-id/
… gives 4 different results for the one shared disk. Whereas if we run the command:
# hwinfo --scsi --short
… we see that the shared disk is assigned to /dev/sdb…
So, initially I just decided to use /dev/sdb as device name and assume that this is always going to work out nicely. (If you use a “by id” value instead of the “by name” one, then for example “scsi-360014055c008e13975049ec9b3289d27”-id worked fine for me later on when reconfiguring).
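The by-id entries are just symlinks, so one way to see which of them belong to the shared disk is to resolve each link and compare it with the device. A minimal sketch, assuming `readlink -f` is available (it is on SLES); `links_to_device` is a made-up helper name:

```shell
#!/bin/sh
# links_to_device DIR DEVICE   (hypothetical helper)
# Prints every symlink in DIR that resolves to DEVICE.
links_to_device() {
    dir=$1
    device=$(readlink -f "$2")
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$device" ]; then
            echo "$link"
        fi
    done
}
```

Running `links_to_device /dev/disk/by-id /dev/sdb` on a helsinki node should list the several names that all point at the one shared disk; any of them that starts with “scsi-” is a candidate for the by-id path.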
Next task is to enable softdog. Now one question that comes to mind, is which VM or VMs should softdog be running on? My guess is that it should be running just on the 2 helsinki VMs, so let’s go with that plan and see how well or badly it turns out.
From the SUSE documentation:
“Enable the softdog watchdog:”
# echo softdog > /etc/modules-load.d/watchdog.conf
# systemctl restart systemd-modules-load
“Test if the softdog module is loaded correctly:”
# lsmod | grep dog
softdog 16384 0
The important thing is to get a result-line starting with “softdog”. The first number is the size of the module in bytes, and the second number is the “Used by” count, i.e. how many users currently hold a reference to the module… just after being enabled, that number might be zero, later it may be a natural number such as 1 or 2.
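To make the same check scriptable (handy when repeating it on both nodes), the lsmod output can be filtered for an exact module name. A minimal sketch; `module_use_count` is a made-up helper name:

```shell
#!/bin/sh
# module_use_count NAME   (hypothetical helper)
# Reads `lsmod` output on stdin and prints the "Used by" count (third
# column) of the module called NAME; returns non-zero if it is not loaded.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3; found = 1 } END { exit !found }'
}
```

Usage: `lsmod | module_use_count softdog`. Unlike `grep dog`, this matches the module name exactly.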
So do that for both helsinki VMs.
Ok so now we are about ready to run the ha-scripts. Maybe just before that, make sure the helsinki VMs are patched to more or less the same level:
# zypper lu
# zypper --non-interactive up
Run the HA scripts and do smoketesting.
[Note: if you are using a public cloud provider such as Azure, Google Cloud, or AWS, you will likely need to use Unicast for corosync instead of Multicast – see the relevant openSAP course, week 2 unit 1 lecture for details how to use Unicast. In our blog though, we are following the SUSE documentation i.e. we are using Multicast].
On helsinki-01, we create the new cluster, let’s call it “helsinkicluster”:
# ha-cluster-init --name helsinkicluster
The script suggests IP address 192.168.84.2 to bind to (on vmnet5|eth3)… that will be for the corosync communication… sounds ok to me.
The suggested multicast address falls in the 239.0.0.0/8 range, which sounds ok:
“The 239.0.0.0/8 range is assigned by RFC 2365 for private use within an organization. Per the RFC, packets destined to administratively scoped IPv4 multicast addresses do not cross administratively defined organizational boundaries, and administratively scoped IPv4 multicast addresses are locally assigned and do not have to be globally unique.”
Multicast port is suggested to be 5405, sounds fine.
Next you are given the chance to set up SBD (STONITH Block Device):
We say “y” for Yes. Then we enter the /dev/sdb device which thanks to the magic of iSCSI is our shared storage disk. (Later on, I decided to reconfigure this config-point to use disk by-id instead, as shown in Appendix below).
Then we accept that all data on /dev/sdb will be destroyed, so “y”. Then SBD gets set up:
We want to configure a virtual IP address: since the idea is that individual nodes can get fenced, it is better not to have to “try out” the node-specific URLs (such as https://192.168.84.2:7630/ a.k.a. “on helsinki-01”) when we want to view the Hawk dashboard.
Not quite sure what the best practices for choosing a virtual IP are, so we will pick an unused IP from vmnet8: 192.168.108.100 it is then. Ok, end of script.
Now we want to go and view our Hawk dashboard, which should be at:
In March 2021, on MacBook, Chrome and Opera browsers will refuse to let you take any risk by opening this website, as the SSL certificate offered is not trusted. Firefox and Safari will warn you that they don’t trust this website, but will let you choose to go to the website if you want:
So we indeed want to visit this website (with private IPs there isn’t much risk, unless there are suspicious people who can access your Fusion software’s 192.168.108.0/24 subnet). Click past warnings (Safari – add exception using your admin password), to get to the login page for Hawk. Login as user “hacluster”, default password was “linux”.
Looks fine. Next we want to join the helsinki-02 VM to the helsinkicluster, using the relevant script:
We are asked for the IP address or hostname of an already existing cluster node. Since above we used 192.168.84.2 from vmnet5|eth3, so we specify that IP here as it seems most consistent way to do this. Then we follow the instructions to set up passwordless ssh between the 2 nodes, and the script does its job. Refresh the Hawk webpage and in the Nodes tab we can see both nodes in helsinkicluster are running:
Which was nice!
Note that we can check from any of the 3 VMs (incl storage VM), the list of nodes handled by SBD, as follows:
# sbd -d /dev/sdb list
Clear means up and running, according to the man pages for SBD:
# man sbd
“Nodes that are currently running should have a clear state; nodes that have been fenced, but not yet restarted, will show the appropriate fencing message.”
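When watching a fencing test, it can be handy to print only the slots that are not in the clear state. A minimal sketch, assuming the list output has the slot number, node name and state as its first three columns (as in the man page example); `sbd_not_clear` is a made-up helper name:

```shell
#!/bin/sh
# sbd_not_clear: read `sbd -d <device> list` output on stdin and print
# "node state" for every slot whose state is something other than "clear".
sbd_not_clear() {
    awk 'NF >= 3 && $3 != "clear" { print $2, $3 }'
}
```

Usage: `sbd -d /dev/sdb list | sbd_not_clear`. No output means every node is clear.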
… we see also from the man pages, that from a cluster node we can send a test message to another node via SBD, so let’s try that just to see that it works:
# sbd -d /dev/sdb message helsinki-02 test
Then in helsinki-02 we can check the system log for the message:
# grep "command test" /var/log/messages
Next we do the test suggested in the SUSE doc, so start pinging from e.g. MacBook terminal the virtual IP address:
% ping 192.168.108.100
From Hawk we check in Resources tab that the virtual IP (“admin-ip”) is currently hosted on helsinki-01. Then in the Nodes tab we put helsinki-01 into Standby mode.
Some error notifications will show on the Hawk dashboard about being unable to connect to the server. Refresh the webpage. Because the virtual IP is now redirected to helsinki-02, you may well get another SSL warning; if so, click past that, and then you can see that the virtual IP is now “hosted” by helsinki-02 and that helsinki-01 is on Standby.
Meanwhile the ping of the virtual IP was uninterrupted. So this test went well. Bring helsinki-01 online again by switching Standby off. Note that now there is no need for the cluster to move the virtual IP back, so it stays directed to helsinki-02.
From the Operations column there is an option (down-arrow → Fence) to manually fence either node, we can try that now for helsinki-02.
So we get kicked out of our SSH session as helsinki-02 gets rebooted basically. After it comes back up, login again and check that the system has only been up for a short while, in this example one minute:
And note that control of the virtual IP was transferred back to helsinki-01, since helsinki-02 was fenced.
Now let’s fence helsinki-01, and use sbd command to see the status changing as helsinki-01 goes offline, then reboots and rejoins the cluster:
Next we can run the preflight tests suggested in the SUSE doc.
So from a helsinki node, get a one-off snapshot of the cluster status (the option at the end is a hyphen followed by the digit one):
# crm_mon -1
We need to install the preflight-check script, and first we need to find it:
# cnf ha-cluster-preflight-check
It is in the package python3-cluster-preflight-check.
# zypper in python3-cluster-preflight-check
# ha-cluster-preflight-check -e
Same test ok also from helsinki-02. Next try the cluster state check:
# ha-cluster-preflight-check -c
Same result from helsinki-02. There’s one warning-level message; however in Hawk our tests with fencing or manual shutdown seemed to work ok, so we won’t worry too much about this.
# ha-cluster-preflight-check --split-brain-iptables
(In this smoketest run, helsinki-01 got fenced).
I’m skipping the test that kills off various daemons, but let’s run the last test, manually fencing from the command line:
# ha-cluster-preflight-check --fence-node helsinki-02
One last test: shutdown all 3 VMs (storage VM last one down), then bring them back online (wait for storage VM to boot up first), then see if we can access the Hawk webpage and everything there looks fine – yep, it works.
That completes our smoketesting.
[If you want to change the behaviour of fenced nodes, so that instead of rejoining the cluster automatically when fenced then rebooted, a node waits patiently for an administrator to decide when it is healthy enough to be allowed to rejoin, then see week 2 unit 3 of the related openSAP course].
Appendix – reconfigure HA cluster to refer to SBD disk by-id
[This is one way to do it, probably there are more elegant solutions, anyway this method worked].
Before we can reconfigure, we need to stop a couple of processes. In helsinki-01, open Yast. High Availability -> Cluster. (If you are prompted to install needed package “corosync-qdevice” then you should press “Install”). Highlight the Services item and hit <Enter> to go to that config page. In the “Pacemaker and corosync start/stop”-box choose “Stop Now”; then Finish. Then run the ha-cluster-init script again:
# ha-cluster-init --name helsinkicluster
Answer “n” for No to the next 4 questions, like so:
csync2 is already configured – overwrite (y/n)? n
/etc/corosync/authkey already exists – overwrite (y/n)? n
/etc/corosync/corosync.conf already exists – overwrite (y/n)? n
/etc/pacemaker/authkey already exists – overwrite (y/n)? n
Then we answer “y” for Yes to the following 2 questions:
Do you wish to use SBD (y/n)? y
SBD is already configured to use /dev/sdb – overwrite (y/n)? y
Then we can provide the path to the shared disk using its disk by-id filename, for example:
Path to storage device (e.g. /dev/disk/by-id/…), or “none” for diskless sbd, use “;” as separator for multi path /dev/disk/by-id/scsi-360014055c008e13975049ec9b3289d27
And “y” to get past the warning:
WARNING: All data on /dev/disk/by-id/scsi-360014055c008e13975049ec9b3289d27 will be destroyed!
Are you sure you wish to use this device (y/n)? y
Hawk cluster interface is now running. To see cluster status, open:
Log in with username ‘hacluster’
Waiting for cluster…..done
ERROR: 1: cluster.init: Joined existing cluster – will not reconfigure.
I guess the last two messages mean that now helsinki-01 just rejoined the already existing cluster, so there is no need to reconfigure some kind of init script.
Then in helsinki-02, the same Yast steps to stop Pacemaker and corosync. Then we run:
…give the IP address of the “Corosync node of helsinki-01”, which in our case is still 192.168.84.2, then the script reads the SBD configuration (thus hopefully picking up the disk by-id reference).
Now that the reconfiguration is done, it’s worthwhile running the smoketests again – good luck now!