### 环境准备
系统: `CentOS7`
软件: `MySQL5.7` `MHA 0.56`
VIP: `172.16.0.222`
服务器:
`node1(Master节点)`: `172.16.0.100`
`node2(SLave节点/Master备)`: `172.16.0.101`
`node3(SLave节点/MHA manager)`: `172.16.0.102`
 
### 初始化工作
**`a.三台机器SSH免密登陆`**
**`b.三台机器的主机名和设置hosts文件解析`**
**`c.两台SLAVE节点配置成MASTER节点的MySQL主从`**
**`d.MySQL的MHA配置要求`**
```bash
vim /etc/my.cnf
```
```ini
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
bind-address = 0.0.0.0
#配置
server-id = 1 #节点的值唯一
log_bin = mysql-bin #MySQL Master或者备用Master
binlog_ignore_db = mysql
binlog_ignore_db = infomation_schema
binlog_ignore_db = performance_schema
#relay_log_purge = 0 #从库需要配置,执行完relay不删除
```
`提示:MySQL Master或者备用Master需要打开binlog,MHA要求各个数据库节点的复制过滤规则(binlog-do-db, replicate-ignore-db)都一样`
**`e. 开启半同步复制`**
`所有节点都执行下面命令`
```sql
mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so'; #安装Master半同步复制插件
mysql> install plugin rpl_semi_sync_slave soname 'semisync_slave.so'; #安装Slave半同步复制插件
mysql> set global rpl_semi_sync_slave_enabled=on; #开启Slave半同步复制
mysql> set global rpl_semi_sync_master_enabled=on; #开启Master半同步复制
mysql> stop slave io_thread;
mysql> start slave io_thread;
mysql> show variables like '%semi%'; #查看半同步复制参数
mysql> show plugins; #查看加载的插件
```
**`f. MySQL配置GTID复制(可选)`**
参考:https://ynotes.cn/blog/article_detail/203 传统复制和GTID复制转换
 
### 安装MHA
下载MHA Manager & MHA Node
```bash
curl http://www.mysql.gr.jp/frame/modules/bwiki/index.php\?plugin\=attach\&pcmd\=open\&file\=mha4mysql-manager-0.56-0.el6.noarch.rpm\&refer\=matsunobu -o mha4mysql-manager-0.56-0.el6.noarch.rpm
curl http://www.mysql.gr.jp/frame/modules/bwiki/index.php\?plugin\=attach\&pcmd\=open\&file\=mha4mysql-node-0.56-0.el6.noarch.rpm\&refer\=matsunobu -o mha4mysql-node-0.56-0.el6.noarch.rpm
```
安装MHA Node
`所有节点`
```bash
yum localinstall -y mha4mysql-node-0.56-0.el6.noarch.rpm
```
安装MHA Manager
`node3(MHA manager节点)`
```bash
yum localinstall -y mha4mysql-manager-0.56-0.el6.noarch.rpm
```
 
### 配置MHA管理节点
`node3(MHA manager节点)`
创建mha相关目录
```bash
mkdir -p /usr/local/mha
mkdir -p /etc/mha
```
创建mha.conf
```bash
vim /etc/mha/mha.conf
```
```ini
[server default]
manager_log=/var/log/mha/manager.log
master_ip_failover_script=/etc/mha/scripts/master_ip_failover
master_ip_online_change_script=/etc/mha/scripts/master_ip_failover
# mysql user and password
user=mha_admin
password=123456
#ssh user
ssh_user=root
# working directory on the manager
manager_workdir=/usr/local/mha
# working directory on MySQL servers
remote_workdir=/usr/local/mha
repl_user=replica
repl_password=123456
[server1]
hostname=172.16.0.100
[server2]
hostname=172.16.0.101
[server3]
hostname=172.16.0.102
```
创建VIP切换脚本
```bash
vim /etc/mha/scripts/master_ip_failover
```
```perl
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '172.16.0.222/24'; # Virtual IP #设置VIP地址
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip"; #针对网卡配置,如果使用的是eth0,则eth0:$key
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down"; #针对网卡配置,如果使用的是eth0,则eth0:$key
$ssh_user = "root";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
```
创建复制用户(replica)
`node1-2节点(Master节点/Master备)`
```sql
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replica'@'172.16.0.%' IDENTIFIED BY '123456';
mysql> FLUSH PRIVILEGES;
```
创建管理用户(mha)
`所有节点`
```sql
mysql> GRANT ALL ON *.* TO 'mha'@'172.16.0.%' IDENTIFIED BY '123456';
mysql> FLUSH PRIVILEGES;
```
 
### Master配置VIP(第一次手动添加)
`node1节点`
```bash
/sbin/ifconfig eth1:1 172.16.0.222
```
 
### 测试节点间的SSH登录
`node3(MHA manager节点)`
```bash
masterha_check_ssh --conf=/etc/mha/mha.conf
```
```
Mon Aug 5 03:17:00 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug 5 03:17:00 2019 - [info] Reading application default configuration from /etc/mha/mha.conf..
Mon Aug 5 03:17:00 2019 - [info] Reading server configuration from /etc/mha/mha.conf..
Mon Aug 5 03:17:00 2019 - [info] Starting SSH connection tests..
Mon Aug 5 03:17:01 2019 - [debug]
Mon Aug 5 03:17:00 2019 - [debug] Connecting via SSH from root@172.16.0.100(172.16.0.100:22) to root@172.16.0.101(172.16.0.101:22)..
Mon Aug 5 03:17:00 2019 - [debug] ok.
Mon Aug 5 03:17:00 2019 - [debug] Connecting via SSH from root@172.16.0.100(172.16.0.100:22) to root@172.16.0.102(172.16.0.102:22)..
Mon Aug 5 03:17:01 2019 - [debug] ok.
Mon Aug 5 03:17:01 2019 - [debug]
Mon Aug 5 03:17:00 2019 - [debug] Connecting via SSH from root@172.16.0.101(172.16.0.101:22) to root@172.16.0.100(172.16.0.100:22)..
Mon Aug 5 03:17:01 2019 - [debug] ok.
Mon Aug 5 03:17:01 2019 - [debug] Connecting via SSH from root@172.16.0.101(172.16.0.101:22) to root@172.16.0.102(172.16.0.102:22)..
Mon Aug 5 03:17:01 2019 - [debug] ok.
Mon Aug 5 03:17:02 2019 - [debug]
Mon Aug 5 03:17:01 2019 - [debug] Connecting via SSH from root@172.16.0.102(172.16.0.102:22) to root@172.16.0.100(172.16.0.100:22)..
Mon Aug 5 03:17:01 2019 - [debug] ok.
Mon Aug 5 03:17:01 2019 - [debug] Connecting via SSH from root@172.16.0.102(172.16.0.102:22) to root@172.16.0.101(172.16.0.101:22)..
Mon Aug 5 03:17:02 2019 - [debug] ok.
Mon Aug 5 03:17:02 2019 - [info] All SSH connection tests passed successfully.
```
`上面显示ssh免密登陆配置成功`
### 检查mha配置
`node3(MHA manager节点)`
```bash
masterha_check_repl --conf=/etc/mha/mha.conf
```
```
Mon Aug 5 03:39:17 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug 5 03:39:17 2019 - [info] Reading application default configuration from /etc/mha/mha.conf..
Mon Aug 5 03:39:17 2019 - [info] Reading server configuration from /etc/mha/mha.conf..
Mon Aug 5 03:39:17 2019 - [info] MHA::MasterMonitor version 0.56.
Mon Aug 5 03:39:18 2019 - [info] GTID failover mode = 0
Mon Aug 5 03:39:18 2019 - [info] Dead Servers:
Mon Aug 5 03:39:18 2019 - [info] Alive Servers:
Mon Aug 5 03:39:18 2019 - [info] 172.16.0.100(172.16.0.100:3306)
Mon Aug 5 03:39:18 2019 - [info] 172.16.0.101(172.16.0.101:3306)
Mon Aug 5 03:39:18 2019 - [info] 172.16.0.102(172.16.0.102:3306)
Mon Aug 5 03:39:18 2019 - [info] Alive Slaves:
Mon Aug 5 03:39:18 2019 - [info] 172.16.0.101(172.16.0.101:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Mon Aug 5 03:39:18 2019 - [info] Replicating from 172.16.0.100(172.16.0.100:3306)
Mon Aug 5 03:39:18 2019 - [info] 172.16.0.102(172.16.0.102:3306) Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Mon Aug 5 03:39:18 2019 - [info] Replicating from 172.16.0.100(172.16.0.100:3306)
Mon Aug 5 03:39:18 2019 - [info] Current Alive Master: 172.16.0.100(172.16.0.100:3306)
Mon Aug 5 03:39:18 2019 - [info] Checking slave configurations..
Mon Aug 5 03:39:18 2019 - [info] read_only=1 is not set on slave 172.16.0.101(172.16.0.101:3306).
Mon Aug 5 03:39:18 2019 - [warning] relay_log_purge=0 is not set on slave 172.16.0.101(172.16.0.101:3306).
Mon Aug 5 03:39:18 2019 - [info] read_only=1 is not set on slave 172.16.0.102(172.16.0.102:3306).
Mon Aug 5 03:39:18 2019 - [warning] relay_log_purge=0 is not set on slave 172.16.0.102(172.16.0.102:3306).
Mon Aug 5 03:39:18 2019 - [info] Checking replication filtering settings..
Mon Aug 5 03:39:18 2019 - [info] binlog_do_db= , binlog_ignore_db= infomation_schema,mysql,performance_schema
Mon Aug 5 03:39:18 2019 - [info] Replication filtering check ok.
Mon Aug 5 03:39:18 2019 - [info] GTID (with auto-pos) is not supported
Mon Aug 5 03:39:18 2019 - [info] Starting SSH connection tests..
Mon Aug 5 03:39:20 2019 - [info] All SSH connection tests passed successfully.
Mon Aug 5 03:39:20 2019 - [info] Checking MHA Node version..
Mon Aug 5 03:39:21 2019 - [info] Version check ok.
Mon Aug 5 03:39:21 2019 - [info] Checking SSH publickey authentication settings on the current master..
Mon Aug 5 03:39:21 2019 - [info] HealthCheck: SSH to 172.16.0.100 is reachable.
Mon Aug 5 03:39:21 2019 - [info] Master MHA Node version is 0.56.
Mon Aug 5 03:39:21 2019 - [info] Checking recovery script configurations on 172.16.0.100(172.16.0.100:3306)..
Mon Aug 5 03:39:21 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/usr/local/mha/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000008
Mon Aug 5 03:39:21 2019 - [info] Connecting to root@172.16.0.100(172.16.0.100:22)..
Creating /usr/local/mha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000008
Mon Aug 5 03:39:22 2019 - [info] Binlog setting check done.
Mon Aug 5 03:39:22 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Aug 5 03:39:22 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_admin' --slave_host=172.16.0.101 --slave_ip=172.16.0.101 --slave_port=3306 --workdir=/usr/local/mha --target_version=5.7.27-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Mon Aug 5 03:39:22 2019 - [info] Connecting to root@172.16.0.101(172.16.0.101:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to db2-relay-bin.000002
Temporary relay log file is /var/lib/mysql/db2-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Aug 5 03:39:22 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_admin' --slave_host=172.16.0.102 --slave_ip=172.16.0.102 --slave_port=3306 --workdir=/usr/local/mha --target_version=5.7.27-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Mon Aug 5 03:39:22 2019 - [info] Connecting to root@172.16.0.102(172.16.0.102:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to db3-relay-bin.000002
Temporary relay log file is /var/lib/mysql/db3-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Aug 5 03:39:22 2019 - [info] Slaves settings check done.
Mon Aug 5 03:39:22 2019 - [info]
172.16.0.100(172.16.0.100:3306) (current master)
+--172.16.0.101(172.16.0.101:3306)
+--172.16.0.102(172.16.0.102:3306)
Mon Aug 5 03:39:22 2019 - [info] Checking replication health on 172.16.0.101..
Mon Aug 5 03:39:22 2019 - [info] ok.
Mon Aug 5 03:39:22 2019 - [info] Checking replication health on 172.16.0.102..
Mon Aug 5 03:39:22 2019 - [info] ok.
Mon Aug 5 03:39:22 2019 - [info] Checking master_ip_failover_script status:
Mon Aug 5 03:39:22 2019 - [info] /etc/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.0.100 --orig_master_ip=172.16.0.100 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 172.16.0.222/24===
Checking the Status of the script.. OK
ssh: Could not resolve hostname cluster1: Name or service not known
Mon Aug 5 03:39:22 2019 - [info] OK.
Mon Aug 5 03:39:22 2019 - [warning] shutdown_script is not defined.
Mon Aug 5 03:39:22 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
```
`MySQL Replication Health is OK.显示配置正确!`
### 启动MHA Manager
```bash
nohup masterha_manager --conf=/etc/mha/mha.conf &
```
### 验证是否启动成功
```bash
masterha_check_status --conf=/etc/mha/mha.conf
```
```
mha (pid:27057) is running(0:PING_OK), master:172.16.0.100
```
### 停止MHA Manager(`不执行`)
```bash
masterha_stop --conf=/etc/mha/mha.conf
```
 
### 测试
#### 测试故障转移
**停止node1节点的MySQL**
```bash
systemctl stop mysqld
```
**node2查看VIP**
```bash
ip a
```
```
...
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
link/ether 5a:00:02:32:0f:46 brd ff:ff:ff:ff:ff:ff
inet 172.16.0.101/16 brd 172.16.255.255 scope global eth1
valid_lft forever preferred_lft forever
inet 172.16.0.222/24 brd 172.16.0.255 scope global eth1:1
valid_lft forever preferred_lft forever
inet6 fe80::5800:2ff:fe32:f46/64 scope link
valid_lft forever preferred_lft forever
```
`通过上面的信息可以看到VIP已经转移到node2`
**查看是否为slave**
```sql
mysql> show slave status\G #没有输出,说明node2已经为非slave节点
Empty set (0.00 sec)
```
**查看node3的slave信息**
```sql
mysql> show slave status\G
```
```
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.0.101
Master_User: replica
Master_Port: 3306
Connect_Retry: 60
...
```
`通过上面的信息可以确定故障转移成功,master已经变成node2节点,node3节点已经同步到新的slave`
 
#### 旧master恢复的操作
`node1数据库恢复之后,首先设置为node2的slave,之后启动MHA Manager`
启动MySQL
`node1`
```bash
systemctl start mysqld
```
配置node1为node2的slave
_MHA节点查看同步信息_
`node3`
```bash
grep "CHANGE MASTER TO" /var/log/mha/manager.log |tail -1
```
```bash
Mon Aug 5 03:32:28 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.0.101', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=154, MASTER_USER='replica', MASTER_PASSWORD='xxx';
```
_node1配置同步node2_
`node1`
```sql
mysql> CHANGE MASTER TO MASTER_HOST='172.16.0.101', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=154, MASTER_USER='replica', MASTER_PASSWORD='123456';
mysql> start slave; #启动slave
mysql> show slave status\G
```
```
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.0.101
Master_User: replica
Master_Port: 3306
Connect_Retry: 60
...
```
把 Master切回到原理的node1
`node3`
```bash
masterha_master_switch --conf=/etc/mha/mha.conf --master_state=alive --new_master_host=172.16.0.100 --orig_master_is_new_slave
```
检查mha配置
`node3`
```bash
masterha_check_repl --conf=/etc/mha/mha.conf
```
```
...
MySQL Replication Health is OK.
```
启动MHA Manager
`node3`
```bash
nohup masterha_manager --conf=/etc/mha/mha.conf &
```
参考:
http://www.fblinux.com/?p=1018
https://joelhy.github.io/2015/02/06/mysql-mha/