HA Solution Overview
While an xCAT management node, xcatmn1, is running as the primary management node, another node, xcatmn2, can be configured to take over as the primary management node if xcatmn1 becomes unavailable. The process is manual and requires disabling the primary, xcatmn1, and activating the backup, xcatmn2. Both nodes require access to the shared storage described below. Use of a Virtual IP is also required.
An interactive sample script, xcatha.py, is available to guide you through disabling and activating xCAT management nodes. The Dryrun option in that script shows the actions without executing them.
Configure and Activate Primary xCAT Management Node
Configure Virtual IP
The existing xCAT management node IP address should be configured as a Virtual IP address. The Virtual IP address should be non-persistent, so it must be re-configured each time the management node is rebooted. Keeping it non-persistent avoids an IP address conflict when the original primary management node is recovered, because that node does not come back up with the Virtual IP address configured. Since the Virtual IP is non-persistent, the network interface also needs a persistent IP address.
Configure another IP address on the primary management node's network interface as a static IP, for example 10.5.106.70.
Configure 10.5.106.70 as a static IP:
    ip addr add 10.5.106.70/8 dev eth0
Edit the ifcfg-eth0 file as:
    DEVICE="eth0"
    BOOTPROTO="static"
    NETMASK="255.0.0.0"
    IPADDR="10.5.106.70"
    ONBOOT="yes"
To make the new static IP take effect immediately, log in to xcatmn1 using 10.5.106.70 and restart the network service, then add the original static IP of the primary management node, 10.5.106.7, as the Virtual IP:
    ssh 10.5.106.70 -l root
    service network restart
    ip addr add 10.5.106.7/8 brd + dev eth0 label eth0:0
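Because the Virtual IP is non-persistent, it has to be added again after each reboot of the active node. The following is a minimal sketch, not part of xCAT, of a check that could guard that step. The helper name has_addr is hypothetical; it only reads text in the format printed by `ip -o -4 addr show` and changes nothing by itself. It compares the whole address, so 10.5.106.7 is not mistaken for 10.5.106.70.

```shell
#!/bin/sh
# has_addr ADDRESS: read `ip -o -4 addr show` output on stdin and succeed
# if ADDRESS is assigned (hypothetical helper, not part of xCAT).
has_addr() {
    # Field 4 of each line is "address/prefix"; compare the address part exactly.
    awk -v want="$1" '{ split($4, a, "/"); if (a[1] == want) found = 1 }
                      END { exit(found ? 0 : 1) }'
}

# Possible use on the active node, with the values from the text above:
#   ip -o -4 addr show dev eth0 | has_addr 10.5.106.7 ||
#       ip addr add 10.5.106.7/8 brd + dev eth0 label eth0:0
```

Running the guarded form from a boot-time script is one option, but whether the Virtual IP should come up automatically is a site decision, since the recovered original primary must not claim it.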
Add 10.5.106.70 to the PostgreSQL configuration files on the primary management node.
Add 10.5.106.70 to /var/lib/pgsql/data/pg_hba.conf:
    host all all 10.5.106.70/32 md5
Add 10.5.106.70 to the listen_addresses variable in /var/lib/pgsql/data/postgresql.conf:
    listen_addresses = 'localhost,10.5.106.7,10.5.106.70'
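The two PostgreSQL edits above are easy to apply twice by accident. As a hedged sketch, the helpers below make both edits idempotent. The function names add_hba_host and add_listen_address are hypothetical and not part of xCAT or PostgreSQL. The sketch assumes that postgresql.conf already contains one uncommented, non-empty listen_addresses line in single quotes.

```shell
#!/bin/sh
# add_hba_host FILE ADDRESS: append "host all all ADDRESS/32 md5" to FILE
# unless an equivalent entry already exists (hypothetical helper).
add_hba_host() {
    hba=$1 addr=$2
    awk -v a="$addr/32" '$1 == "host" && $2 == "all" && $3 == "all" &&
                         $4 == a && $5 == "md5" { f = 1 }
                         END { exit(f ? 0 : 1) }' "$hba" ||
        printf 'host all all %s/32 md5\n' "$addr" >> "$hba"
}

# add_listen_address FILE ADDRESS: append ADDRESS to the quoted
# listen_addresses list in FILE unless it is already listed.
# Assumes one uncommented, non-empty listen_addresses = '...' line.
add_listen_address() {
    conf=$1 addr=$2
    # Extract the current comma-separated list, ignoring any spaces in it.
    current=$(sed -n "s/^listen_addresses *= *'\([^']*\)'.*/\1/p" "$conf" | tr -d ' ')
    case ",$current," in
        *",$addr,"*) return 0 ;;   # already present, nothing to do
    esac
    sed "s/^\(listen_addresses *= *'[^']*\)'/\1,$addr'/" "$conf" > "$conf.tmp" &&
        mv "$conf.tmp" "$conf"
}

# Possible use with the paths and address from the text above:
#   add_hba_host /var/lib/pgsql/data/pg_hba.conf 10.5.106.70
#   add_listen_address /var/lib/pgsql/data/postgresql.conf 10.5.106.70
```

PostgreSQL reads listen_addresses only at server start, so a changed value takes effect the next time the postgresql service is started.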
Modify the mgtifname of the provision network entry to eth0:0 on the primary management node:
    tabedit networks
    "10_0_0_0-255_0_0_0","10.0.0.0","255.0.0.0","eth0:0","10.0.0.103",,"<xcatmaster>",,,,,,,,,,,"1500",,
Synchronize /etc/hosts
Since /etc/hosts is used by xCAT commands, it should be kept synchronized between the primary management node and the backup management node.
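One way to confirm that the two /etc/hosts files agree is to compare them after normalizing comments, blank lines, spacing, and entry order. The following is a minimal sketch under that assumption. The function names normalize_hosts and hosts_in_sync are hypothetical and not part of xCAT.

```shell
#!/bin/sh
# normalize_hosts FILE: print the entries of FILE without comments or blank
# lines, with whitespace collapsed to single spaces and lines sorted.
normalize_hosts() {
    sed -e 's/#.*//' -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' -e 's/ $//' "$1" |
        grep -v '^$' | sort -u
}

# hosts_in_sync FILE_A FILE_B: succeed if both files define the same entries.
hosts_in_sync() {
    a=$(normalize_hosts "$1") b=$(normalize_hosts "$2")
    [ "$a" = "$b" ]
}

# Possible use on xcatmn1, after copying the backup node's file locally:
#   scp xcatmn2:/etc/hosts /tmp/hosts.xcatmn2
#   hosts_in_sync /etc/hosts /tmp/hosts.xcatmn2 || echo "/etc/hosts differs"
```

The comparison treats entry order as irrelevant. If lookup order matters at a site, for example when the same name appears on several lines, a plain diff of the two files is the stricter check.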
Synchronize Clock
It is recommended that the clocks are synchronized between the primary management node and backup management node.
Activate Primary xCAT Management Node
Use xcatha.py to interactively activate xcatmn1:
./xcatha.py -a
[Admin] Verify VIP 10.5.106.7 is configured on this node
Continue? [[Y]es/[N]o]:
Y
[Admin] Verify that the following is configured to be saved in shared storage and accessible from this node:
... /install
... /etc/xcat
... /root/.xcat
... /var/lib/pgsql
... /tftpboot
Continue? [[Y]es/[N]o]:
Y
[xCAT] Starting up services:
... postgresql
... xcatd
... named
... dhcpd
... ntpd
... conserver
... goconserver
Continue? [[Y]es/[N]o/[D]ryrun]:
Y
2018-06-24 22:13:09,428 - INFO - ===> Start all services stage <===
2018-06-24 22:13:10,559 - DEBUG - systemctl start postgresql [Passed]
2018-06-24 22:13:13,298 - DEBUG - systemctl start xcatd [Passed]
domain=cluster.com
2018-06-24 22:13:13,715 - DEBUG - lsdef -t site -i domain|grep domain [Passed]
Handling bybc0607 in /etc/hosts.
Handling localhost in /etc/hosts.
Handling bybc0609 in /etc/hosts.
Handling localhost in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Restarting named
Restarting named complete
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.
DNS setup is completed
2018-06-24 22:13:17,320 - DEBUG - makedns -n [Passed]
Renamed existing dhcp configuration file to /etc/dhcp/dhcpd.conf.xcatbak
Warning: No dynamic range specified for 10.0.0.0. If hardware discovery is being used, a dynamic range is required.
2018-06-24 22:13:17,811 - DEBUG - makedhcp -n [Passed]
2018-06-24 22:13:18,746 - DEBUG - makedhcp -a [Passed]
2018-06-24 22:13:18,800 - DEBUG - systemctl start ntpd [Passed]
2018-06-24 22:13:19,353 - DEBUG - makeconservercf [Passed]
2018-06-24 22:13:19,449 - DEBUG - systemctl start conserver [Passed]
Activate Backup xCAT Management Node to be Primary Management Node
Install xCAT on the backup xCAT management node xcatmn2 with local disk.
Switch to the PostgreSQL database.
Disable and deactivate services using xcatha.py -d on both xcatmn2 and xcatmn1.
Remove the Virtual IP from the primary xCAT management node xcatmn1:
    ip addr del 10.5.106.7/8 dev eth0:0
Configure the Virtual IP on xcatmn2.
Add the Virtual IP to the /etc/hosts file:
    10.5.106.7 xcatmn1 xcatmn1.cluster.com
Connect the following xCAT directories to shared data on xcatmn2:
    /etc/xcat
    /install
    ~/.xcat
    /var/lib/pgsql
    /tftpboot
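Before activating xcatmn2, it can help to confirm that every directory in this list is actually present on the node. The sketch below is a hypothetical helper, not part of xCAT. It takes ~/.xcat to be /root/.xcat, as in the xcatha.py prompt shown earlier. It only checks that each path resolves to a directory, which does not prove that the directory lives on the shared storage.

```shell
#!/bin/sh
# Directories the text lists as living on shared storage.
XCAT_SHARED_DIRS="/etc/xcat /install /root/.xcat /var/lib/pgsql /tftpboot"

# missing_shared_dirs [PREFIX]: print each listed directory that does not
# exist (symlinks are followed) under PREFIX; PREFIX is empty on a real node.
missing_shared_dirs() {
    for d in $XCAT_SHARED_DIRS; do
        [ -d "${1:-}$d" ] || printf '%s\n' "$d"
    done
}

# Possible use on xcatmn2 before running xcatha.py -a:
#   missing=$(missing_shared_dirs)
#   [ -z "$missing" ] || echo "not connected yet: $missing"
```

The optional prefix argument exists only so the check can be tried against a scratch directory tree instead of the live filesystem.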
Add the static management node network interface IP 10.5.106.5 to the PostgreSQL configuration files.
Add 10.5.106.5 to /var/lib/pgsql/data/pg_hba.conf:
    host all all 10.5.106.5/32 md5
Add 10.5.106.5 to the listen_addresses variable in /var/lib/pgsql/data/postgresql.conf:
    listen_addresses = 'localhost,10.5.106.7,10.5.106.70,10.5.106.5'
Use xcatha.py -a to start all related services on xcatmn2.
Modify the mgtifname of the provision network entry to eth0:0:
    tabedit networks
    "10_0_0_0-255_0_0_0","10.0.0.0","255.0.0.0","eth0:0","10.0.0.103",,"<xcatmaster>",,,,,,,,,,,"1500",,
Unplanned failover: primary xCAT management node is not accessible
If the primary xCAT management node becomes inaccessible before being deactivated, and the backup xCAT management node is activated, it is recommended that the primary node be disconnected from the network before it is rebooted. This ensures that the services started on reboot do not interfere with the same services running on the backup xCAT management node.