### Prerequisites
All nodes:
- OS: `CentOS 7.6`
- Database: `MariaDB 5.5.60`
- VIP: `172.16.0.200`
node1:
- IP/hostname: `172.16.0.100`
node2:
- IP/hostname: `172.16.0.101`
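The later steps assume the hostnames `node1` and `node2` resolve to the addresses above. If DNS does not already handle this, a minimal sketch (an assumption about your environment) is to add static entries on both nodes:
```bash
# Assumed /etc/hosts entries; skip if DNS already resolves the node names
cat >> /etc/hosts <<'EOF'
172.16.0.100 node1
172.16.0.101 node2
EOF
```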
### Installing Pacemaker and Corosync
#### Install Pacemaker, Corosync, and pcs
`Run on node1 and node2`
```bash
yum -y install corosync pacemaker pcs
```
#### Set the cluster user password
`Run on node1 and node2`
```bash
echo "passwd" | passwd hacluster --stdin
```
#### Start and enable pcsd
`Run on node1 and node2`
```bash
systemctl start pcsd
systemctl enable pcsd
```
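If firewalld is active on the nodes (the original steps do not mention it), the cluster ports need to be opened. Firewalld on CentOS 7 ships a predefined `high-availability` service covering pcsd and Corosync; the DRBD port 7789 used later must be opened separately. A sketch, assuming firewalld is in use:
```bash
firewall-cmd --permanent --add-service=high-availability # pcsd (2224/tcp), corosync, etc.
firewall-cmd --permanent --add-port=7789/tcp # DRBD replication port configured below
firewall-cmd --reload
```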
#### Configure Corosync
`Run on node1`
Authenticate as the hacluster user; the authorization tokens are stored in /var/lib/pcsd/tokens.
```bash
pcs cluster auth node1 node2 -u hacluster -p passwd
```
```
node1: Authorized
node2: Authorized
```
#### Generate and synchronize the Corosync configuration
`Run on node1`
```bash
pcs cluster setup --name mysql_cluster node1 node2
```
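As an optional sanity check (not part of the original steps), the generated configuration can be inspected; it should list both nodes in its nodelist section:
```bash
# pcs cluster setup writes and distributes this file to all nodes
cat /etc/corosync/corosync.conf
```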
#### Start the cluster on all nodes
`Run on node1`
```bash
pcs cluster start --all
pcs cluster enable --all # have the cluster services start on boot (run after the cluster exists)
```
### Installing DRBD
`Run on node1 and node2`
```bash
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install -y kmod-drbd84 drbd84-utils
```
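Before configuring a resource, it can be worth confirming that the kernel module from `kmod-drbd84` loads cleanly (a verification step, not in the original):
```bash
modprobe drbd # load the DRBD kernel module
lsmod | grep drbd # confirm the module is present
```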
#### Configure DRBD
`Run on node1 and node2`
```bash
vim /etc/drbd.d/mysql01.res
```
```ini
resource mysql01 {
  protocol C;
  meta-disk internal;
  device /dev/drbd0;
  disk /dev/vdb; # /dev/vdb is a spare block device; an LVM logical volume also works
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
  net {
    allow-two-primaries no;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  disk {
    on-io-error detach;
  }
  syncer {
    verify-alg sha1;
  }
  on node1 {
    address 172.16.0.100:7789;
  }
  on node2 {
    address 172.16.0.101:7789;
  }
}
```
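To confirm the resource file parses before going further (optional), `drbdadm dump` prints the parsed configuration and fails loudly on syntax errors:
```bash
drbdadm dump mysql01
```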
#### Initialize DRBD (create the DRBD metadata)
`Run on node1 and node2`
```bash
drbdadm create-md mysql01
```
#### Bring up mysql01
`Run on node1 and node2`
```bash
drbdadm up mysql01
```
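At this point the connection and disk states can be checked on either node (optional; right after `up`, the disks typically still show as Inconsistent):
```bash
drbdadm cstate mysql01 # connection state, expect Connected
drbdadm dstate mysql01 # local/peer disk state
```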
#### Promote the primary node
`Run on node1`
```bash
drbdadm primary --force mysql01
```
#### Check the DRBD status
`Run on node1`
```bash
cat /proc/drbd
```
```
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-11-03 01:26:55
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:136 nr:288 dw:428 dr:13125 al:5 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
```
#### Skip the initial full sync
Alternatively, since both backing devices start out empty, the initial synchronization can be skipped.
`Run on node1 (the current primary)`
```bash
drbdadm new-current-uuid --clear-bitmap mysql01/0
```
#### Format the DRBD device
Wait until both sides report UpToDate/UpToDate before running this.
Format the DRBD device as ext4.
`Run on node1`
```bash
mkfs.ext4 -m 0 -L drbd /dev/drbd0 # -m 0: no reserved blocks; -L: set the filesystem label
tune2fs -c 30 -i 180d /dev/drbd0 # fsck every 30 mounts or 180 days, whichever comes first
```
#### Mount the DRBD device
`Run on node1`
```bash
mount /dev/drbd0 /mnt
```
 
### Installing MariaDB
`Run on node1 and node2`
```bash
yum install -y mariadb-server mariadb
systemctl disable mariadb.service # do not start at boot; Pacemaker will manage the service
```
`Run on node1`
```bash
systemctl start mariadb
```
#### Initialize the database
`Run on node1`
```bash
mysql_install_db --datadir=/mnt --user=mysql
```
#### Run the secure installation
`Run on node1`
```bash
mysql_secure_installation
```
#### Unmount DRBD and stop the database
`Run on node1`
```bash
umount /mnt # unmount the directory
systemctl stop mariadb # stop the database
```
#### Configure MySQL
`Run on node1 and node2`
```bash
vim /etc/my.cnf
```
```ini
[mysqld]
symbolic-links=0
bind_address = 0.0.0.0
datadir = /var/lib/mysql
pid_file = /var/run/mariadb/mysqld.pid
socket = /var/run/mariadb/mysqld.sock

[mysqld_safe]
bind_address = 0.0.0.0
datadir = /var/lib/mysql
pid_file = /var/run/mariadb/mysqld.pid
socket = /var/run/mariadb/mysqld.sock

!includedir /etc/my.cnf.d
```
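To double-check which options the server will actually read from this file, `my_print_defaults` (shipped with MariaDB) can be used:
```bash
# Prints the effective [mysqld] and [mysqld_safe] options
my_print_defaults mysqld mysqld_safe
```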
 
### Configuring the Pacemaker Cluster
#### Resource logic and ordering
```
Start: mysql_fs01 -> mysql_service01 -> mysql_VIP01,
Stop:  mysql_VIP01 -> mysql_service01 -> mysql_fs01.
```
mysql_fs01 is the filesystem resource, mysql_service01 is the service resource, and mysql_VIP01 is the floating virtual IP `172.16.0.200`.
A convenient feature of pcs is the ability to queue up multiple changes in a file and commit them atomically. To do that, we first populate the file with the current raw XML configuration from the CIB.
`Run on node1`
```bash
pcs cluster cib clust_cfg
```
Disable STONITH (`note: whether to disable it depends on your environment`)
`Run on node1`
```bash
pcs -f clust_cfg property set stonith-enabled=false
```
Set the quorum policy to ignore; in a two-node cluster, losing a single node always loses quorum, so quorum enforcement would block failover.
`Run on node1`
```bash
pcs -f clust_cfg property set no-quorum-policy=ignore
```
Prevent resources from moving back after recovery, since failback usually increases downtime.
`Run on node1`
```bash
pcs -f clust_cfg resource defaults resource-stickiness=200
```
To achieve this, Pacemaker has a concept called resource stickiness, which controls how strongly a service (resource) prefers to stay on the node where it is currently running.
To distribute resources optimally, Pacemaker defaults this value to 0. A different stickiness can be defined per resource, but changing the default is usually enough. Stickiness expresses whether a resource prefers to remain on its current node: a positive integer means it prefers to stay, a negative value means it prefers to leave, -INFINITY means negative infinity, and INFINITY means positive infinity.
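If a single resource ever needed its own stickiness on top of this default, it could be set per resource; a sketch with a hypothetical resource name `rsc01`:
```bash
# Hypothetical example: override the default stickiness for one resource
pcs -f clust_cfg resource meta rsc01 resource-stickiness=500
```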
 
Create a cluster resource named mysql_data01 for the DRBD device, plus an additional master/slave clone resource MySQLClone01 that allows the resource to run on both cluster nodes at the same time.
`Run on node1`
```bash
pcs -f clust_cfg resource create mysql_data01 ocf:linbit:drbd \
    drbd_resource=mysql01 \
    op monitor interval=30s
```
```bash
pcs -f clust_cfg resource master MySQLClone01 mysql_data01 \
    master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 \
    notify=true
```
- `master-max`: how many resource copies can be promoted to the master state
- `master-node-max`: how many resource copies can be promoted to master on a single node
- `clone-max`: how many resource copies to start; defaults to the number of nodes in the cluster
- `clone-node-max`: how many resource copies can be started on a single node
- `notify`: when stopping or starting a clone copy, tell all other copies beforehand and report when the action succeeds
 
Create a cluster resource named mysql_fs01 for the filesystem. Tell the cluster that the MySQLClone01 clone resource must run on the same node as the filesystem resource, and that the clone must be promoted before the filesystem resource starts.
`Run on node1`
```bash
pcs -f clust_cfg resource create mysql_fs01 Filesystem \
device="/dev/drbd0" \
directory="/var/lib/mysql" \
fstype="ext4"
```
```bash
pcs -f clust_cfg constraint colocation add mysql_fs01 with MySQLClone01 \
    INFINITY with-rsc-role=Master
```
```bash
pcs -f clust_cfg constraint order promote MySQLClone01 then start mysql_fs01
```
Create a cluster resource named mysql_service01 for the MariaDB service. Tell the cluster that the MariaDB service must run on the same node as the mysql_fs01 filesystem resource, and that the filesystem resource must start first.
`Run on node1`
```bash
pcs -f clust_cfg resource create mysql_service01 ocf:heartbeat:mysql \
    binary="/usr/bin/mysqld_safe" \
    config="/etc/my.cnf" \
    datadir="/var/lib/mysql" \
    pid="/var/lib/mysql/mysql.pid" \
    socket="/var/lib/mysql/mysql.sock" \
    additional_parameters="--bind-address=0.0.0.0" \
    op start timeout=60s \
    op stop timeout=60s \
    op monitor interval=20s timeout=30s
```
```bash
pcs -f clust_cfg constraint colocation add mysql_service01 with mysql_fs01 INFINITY
```
```bash
pcs -f clust_cfg constraint order mysql_fs01 then mysql_service01
```
Create a cluster resource named mysql_VIP01 for the virtual IP 172.16.0.200.
`Run on node1`
```bash
pcs -f clust_cfg resource create mysql_VIP01 ocf:heartbeat:IPaddr2 \
    ip=172.16.0.200 cidr_netmask=32 \
    op monitor interval=30s
```
Naturally, the mysql_VIP01 virtual IP resource must run on the same node as the MariaDB resource and must be the last one to start. This ensures every other resource is already up before clients connect to the virtual IP.
`Run on node1`
```bash
pcs -f clust_cfg constraint colocation add mysql_VIP01 with mysql_service01 INFINITY
```
```bash
pcs -f clust_cfg constraint order mysql_service01 then mysql_VIP01
```
Check the configuration
`Run on node1`
```bash
pcs -f clust_cfg constraint
```
```
Location Constraints:
Ordering Constraints:
  promote MySQLClone01 then start mysql_fs01 (kind:Mandatory)
  start mysql_fs01 then start mysql_service01 (kind:Mandatory)
  start mysql_service01 then start mysql_VIP01 (kind:Mandatory)
Colocation Constraints:
  mysql_fs01 with MySQLClone01 (score:INFINITY) (with-rsc-role:Master)
  mysql_service01 with mysql_fs01 (score:INFINITY)
  mysql_VIP01 with mysql_service01 (score:INFINITY)
```
```bash
pcs -f clust_cfg resource show
```
```
 Master/Slave Set: MySQLClone01 [mysql_data01]
     Stopped: [ node1 node2 ]
 mysql_fs01 (ocf::heartbeat:Filesystem): Stopped
 mysql_service01 (ocf::heartbeat:mysql): Stopped
 mysql_VIP01 (ocf::heartbeat:IPaddr2): Stopped
```
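Before pushing, the queued CIB file can also be validated offline (an optional check; `clust_cfg` is the file created earlier):
```bash
# Validates the staged configuration without touching the live cluster
crm_verify -V --xml-file clust_cfg
```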
Commit the changes and check the cluster status
`Run on node1`
```bash
pcs cluster cib-push clust_cfg
```
```bash
pcs status
```
```
Cluster name: mysql_cluster
Stack: corosync
Current DC: node1 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Mon Jul 29 06:51:22 2019
Last change: Mon Jul 29 02:49:38 2019 by root via cibadmin on node1

2 nodes configured
5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Master/Slave Set: MySQLClone01 [mysql_data01]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 mysql_fs01 (ocf::heartbeat:Filesystem): Started node1
 mysql_service01 (ocf::heartbeat:mysql): Started node1
 mysql_VIP01 (ocf::heartbeat:IPaddr2): Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```
Once the configuration is committed, Pacemaker will:
- Start DRBD on the cluster nodes
- Promote one node to DRBD primary
- Mount the filesystem, configure the cluster IP address, and start MariaDB on that same node
- Begin monitoring the resources
Test the MariaDB service by connecting to the virtual IP on port 3306 with telnet
`Run on a client`
```bash
telnet 172.16.0.200 3306
```
```
Trying 172.16.0.200...
Connected to 172.16.0.200.
Escape character is '^]'.
Host '172.16.0.200' is not allowed to connect to this MariaDB server
Connection closed by foreign host.
```
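The "not allowed to connect" reply actually confirms that MariaDB answered through the VIP; only an access grant is missing. A hypothetical grant (adjust the user, password, and source network to your needs), followed by a simple failover smoke test, neither of which appears in the original steps:
```bash
# Hypothetical: allow a remote user in 172.16.0.x to connect through the VIP
# (run on the node currently hosting MariaDB)
mysql -u root -p -e "GRANT ALL ON *.* TO 'dbuser'@'172.16.0.%' IDENTIFIED BY 'StrongPassword'; FLUSH PRIVILEGES;"

# Failover smoke test: push resources off the active node, watch them move, then restore
pcs cluster standby node1
pcs status # the DRBD master, filesystem, service, and VIP should move to node2
pcs cluster unstandby node1
```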
### Common Command Reference
Check the cluster status:
```bash
pcs status
```
View the current cluster configuration:
```bash
pcs config
```
Enable cluster autostart at boot:
```bash
pcs cluster enable --all
```
Start the cluster:
```bash
pcs cluster start --all
```
Check cluster resource status:
```bash
pcs resource show
```
Validate the cluster configuration:
```bash
crm_verify -L -V
```
Debug-start a resource:
```bash
pcs resource debug-start resource
```
List cluster properties:
```bash
pcs property list
```
Check corosync membership:
```bash
corosync-cmapctl | grep members
```
View corosync status:
```bash
pcs status corosync
```
Save the cluster CIB to a file:
```bash
pcs cluster cib filename
```
Create a resource in a file without applying it to the cluster:
```bash
pcs -f testfile1 resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.120 cidr_netmask=24 op monitor interval=30s
```
Push the file-based configuration to the cluster:
```bash
pcs cluster cib-push filename
```
Back up the cluster configuration:
```bash
pcs config backup filename
```
Restore the cluster configuration:
```bash
pcs config restore [--local] [filename] # --local restores only the current node; if filename is omitted, the config is read from standard input
```
Add a cluster node:
```bash
pcs cluster node add node
```
Remove a cluster node:
```bash
pcs cluster node remove node
```
Show a resource agent's parameters:
```bash
pcs resource describe standard:provider:type|type # e.g.: pcs resource describe ocf:heartbeat:IPaddr2
```
Put a node into standby:
```bash
pcs cluster standby node | --all
```
Take a node out of standby:
```bash
pcs cluster unstandby node | --all
```
Destroy the cluster configuration (`warning: this permanently removes the created cluster configuration`)
```bash
pcs cluster stop
pcs cluster destroy
```
References:
- https://www.lisenet.com/2016/activepassive-mysql-high-availability-pacemaker-cluster-with-drbd-on-centos-7/
- https://linux.cn/article-3963-1.html
- https://www.howtoforge.com/tutorial/how-to-set-up-nginx-high-availability-with-pacemaker-corosync-on-centos-7/
- http://www.alexlinux.com/pacemaker-corosync-nginx-cluster/
- https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/ch-clusteradmin-haar