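I. Introduction to MHA
MHA (Master High Availability) is a relatively mature high-availability solution for MySQL. It automates failover and slave-to-master promotion in a replicated MySQL environment. When the master fails, MHA can complete the failover automatically within 0 to 30 seconds, and it preserves data consistency as far as possible while doing so.
MHA has two components: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server and on the Manager server. MHA Manager probes the master of each cluster at regular intervals. When the master fails, it automatically promotes the slave with the most recent data to be the new master and repoints all other slaves to it. The whole failover is transparent to the application layer.
During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this does not always succeed. MHA can be combined with semi-synchronous replication: if even one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep every node consistent.
MHA mainly supports a one-master, multiple-slave topology. Building an MHA cluster requires at least three database servers in the replication group: one master, one standby master, and one more slave. Because three servers is a real hardware cost, Taobao modified MHA on this basis, and Taobao's TMHA supports one master with one slave.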
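1. Workflow
- Try to save the binary log events (binlog events) from the crashed master;
- Identify the slave with the most recent updates;
- Apply the differential relay log events to the other slaves;
- Apply the binary log events saved from the master;
- Promote one slave to be the new master;
- Point the other slaves at the new master and resume replication.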
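The second step above, finding the slave with the most recent updates, can be pictured as comparing how far each slave has read into the master's binary log. The sketch below is an illustration only and not MHA's own implementation; the function name latest_slave and the sample coordinates are made up for the example. It relies on binlog file names being zero-padded, so that a plain text sort orders them correctly.

```shell
#!/usr/bin/env bash
# Pick the slave that has read furthest into the master's binary log.
# Input (stdin): one line per slave, "<host> <Master_Log_File> <Read_Master_Log_Pos>".
latest_slave() {
  # Sort by binlog file name, then numerically by position; the last line wins.
  sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

# Hypothetical sample data, not taken from a real cluster.
printf '%s\n' \
  'server02 master-bin.000001 986' \
  'server03 master-bin.000001 474' \
  'server04 master-bin.000001 474' | latest_slave
```

With this sample input the function prints server02, the host with the highest read position.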
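2. MHA tools
MHA consists of two packages, the Manager package and the Node package, described below.
The Manager package mainly includes these tools:
- masterha_check_ssh checks the SSH configuration for MHA
- masterha_check_repl checks the MySQL replication status
- masterha_manager starts MHA
- masterha_check_status checks the current MHA running status
- masterha_master_monitor detects whether the master is down
- masterha_master_switch controls failover (automatic or manual)
- masterha_conf_host adds or removes server entries in the configuration
The Node package (these tools are normally triggered by MHA Manager scripts and need no manual operation) mainly includes these tools:
- save_binary_logs saves and copies the master's binary logs
- apply_diff_relay_logs identifies differential relay log events and applies them to the other slaves
- filter_mysqlbinlog removes unnecessary ROLLBACK events (MHA no longer uses this tool)
- purge_relay_logs purges relay logs (without blocking the SQL thread)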
II. Deploying MHA
1. Environment:
master1 host:
hostname server01
bash
master2 host:
hostname server02
bash
slave1 host:
hostname server03
bash
slave2 host:
hostname server04
bash
manager host:
hostname server05
bash
All hosts:
vim /etc/hosts
192.168.200.111 server01
192.168.200.112 server02
192.168.200.113 server03
192.168.200.114 server04
192.168.200.115 server05
systemctl stop iptables
systemctl stop firewalld
setenforce 0
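Editing /etc/hosts by hand on five machines is easy to get wrong, so the entries can also be generated. The following is a minimal sketch that assumes the addressing scheme used in this guide (192.168.200.111 to 115 mapped to server01 to server05); the function names are made up for the example, and the demo writes to a scratch file rather than /etc/hosts.

```shell
#!/usr/bin/env bash
# Print the five host entries used in this guide.
hosts_entries() {
  local i
  for i in 1 2 3 4 5; do
    printf '192.168.200.11%s server0%s\n' "$i" "$i"
  done
}

# Append only the entries that the target file does not already contain.
append_missing_hosts() {
  local file=$1 line
  hosts_entries | while IFS= read -r line; do
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}

# Demo against a scratch file; point it at /etc/hosts on a real machine.
demo=$(mktemp)
append_missing_hosts "$demo"
append_missing_hosts "$demo"   # the second run adds nothing
cat "$demo"
rm -f "$demo"
```

Because existing lines are skipped, the function can be rerun safely after adding a host.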
2. Install MHA Node
Upload to all hosts:
epel-release-latest-7.noarch.rpm
mha4mysql-node-0.56.tar.gz
perl-Config-Tiny-2.14-7.el7.noarch.rpm
Install the EPEL repository:
rpm -ivh epel-release-latest-7.noarch.rpm
yum install -y perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
Install MHA Node:
tar xf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56/
perl Makefile.PL
make && make install
3. Install MHA Manager
Upload to server05:
mha4mysql-manager-0.56.tar.gz
perl-Config-Tiny-2.14-7.el7.noarch.rpm
Install the dependencies:
yum install -y perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI perl-Time-HiRes
yum -y install perl-Config-Tiny-2.14-7.el7.noarch.rpm
Install MHA Manager:
tar xf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56/
perl Makefile.PL
make && make install
4. Configure SSH key-pair authentication
server 01:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.115
server 02:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.115
server 03:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.115
server 04:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.115
server 05:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
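Note: from server05, connect to every host once as a test. The first SSH connection asks you to type "yes" to accept the host key, and an unanswered prompt would interfere with MHA's SSH control of each host during a later failover.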
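The per-host command lists above all follow one pattern: copy the local public key to every host except the local one. As a rough sketch, the list for any host can be generated with a loop. The function below only prints the commands (a dry run) so they can be reviewed first; the names HOSTS and peer_copy_cmds are made up for this example.

```shell
#!/usr/bin/env bash
# All hosts in the cluster, as listed in /etc/hosts earlier in this guide.
HOSTS="192.168.200.111 192.168.200.112 192.168.200.113 192.168.200.114 192.168.200.115"

# Print the ssh-copy-id command for every host except the one given as $1.
peer_copy_cmds() {
  local self=$1 ip
  for ip in $HOSTS; do
    if [ "$ip" != "$self" ]; then
      printf 'ssh-copy-id -i /root/.ssh/id_rsa.pub root@%s\n' "$ip"
    fi
  done
}

# Dry run for server01: prints four commands, one per peer.
peer_copy_cmds 192.168.200.111
```

Piping the output to sh would execute the commands; each one still prompts for the remote root password.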
5. Install MariaDB
server01 to server04:
yum -y install mariadb mariadb-server mariadb-devel
systemctl start mariadb
netstat -lnpt | grep :3306
mysqladmin -u root password 123456 # this password is used in the steps that follow
6. Set up master-slave replication
server01:
vim /etc/my.cnf
[mysqld]
server-id = 1
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0
systemctl restart mariadb
mysql -uroot -p123456
grant all on *.* to 'repl'@'192.168.200.%' identified by '123456';
flush privileges;
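show master status; # note File and Position; the slaves use them in CHANGE MASTER TO
grant all on *.* to 'root'@'192.168.200.%' identified by '123456'; # create the monitoring user
grant all on *.* to 'root'@'server01' identified by '123456'; # grant access for this host's own hostname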
flush privileges;
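The File and Position values reported by show master status on server01 are what each slave passes as MASTER_LOG_FILE and MASTER_LOG_POS. To avoid copying them by hand, the statement can be assembled from the master's output. This is a sketch under assumptions: it expects one tab-separated line of the form File, Position, such as the mysql client prints in batch mode without column names (for example with the -N option), and the function name change_master_sql is made up for the example. The user and password are the ones used in this guide.

```shell
#!/usr/bin/env bash
# Build a CHANGE MASTER TO statement from one tab-separated line
# "<File><TAB><Position>..." read on stdin. $1 is the master's address.
change_master_sql() {
  local master_host=$1 file pos rest
  IFS="$(printf '\t')" read -r file pos rest
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='repl', MASTER_PASSWORD='123456', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
    "$master_host" "$file" "$pos"
}

# Sample input standing in for the coordinates reported by the master.
printf 'master-bin.000001\t474\n' | change_master_sql 192.168.200.111
```

The printed statement can then be run on each slave between stop slave and start slave.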
server02:
vim /etc/my.cnf
[mysqld]
server-id = 2
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0
systemctl restart mariadb
mysql -uroot -p123456
stop slave;
CHANGE MASTER TO
MASTER_HOST='192.168.200.111',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=986;
start slave;
show slave status\G
# Check that both the IO and SQL threads report Yes
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
grant all on *.* to 'root'@'192.168.200.%' identified by '123456';
grant all on *.* to 'root'@'server02' identified by '123456';
flush privileges;
mysql -uroot -p123456 -e 'set global read_only=1'
server03:
vim /etc/my.cnf
[mysqld]
server-id = 3
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0
systemctl restart mariadb
mysql -uroot -p123456
stop slave;
CHANGE MASTER TO
MASTER_HOST='192.168.200.111',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=474;
start slave;
show slave status\G
# Check that both the IO and SQL threads report Yes
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
grant all on *.* to 'root'@'192.168.200.%' identified by '123456';
grant all on *.* to 'root'@'server03' identified by '123456';
flush privileges;
mysql -uroot -p123456 -e 'set global read_only=1'
server04:
vim /etc/my.cnf
[mysqld]
server-id = 4
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0
systemctl restart mariadb
mysql -uroot -p123456
stop slave;
CHANGE MASTER TO
MASTER_HOST='192.168.200.111',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=474;
start slave;
show slave status\G
# Check that both the IO and SQL threads report Yes
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
grant all on *.* to 'root'@'192.168.200.%' identified by '123456';
grant all on *.* to 'root'@'server04' identified by '123456';
flush privileges;
mysql -uroot -p123456 -e 'set global read_only=1'
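Checking the two thread states by eye on three slaves gets tedious, so the check can be scripted. The sketch below reads the text of show slave status\G on standard input and succeeds only when both Slave_IO_Running and Slave_SQL_Running are Yes. The function name slave_threads_ok is made up for this example, and the sample input is trimmed to the two relevant lines.

```shell
#!/usr/bin/env bash
# Exit 0 only if both replication threads report "Yes".
# Input (stdin): the output of "show slave status\G".
slave_threads_ok() {
  awk -F': *' '
    $1 ~ /^ *Slave_IO_Running$/  { io  = $2 }
    $1 ~ /^ *Slave_SQL_Running$/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Demo with a trimmed sample of the status output.
if slave_threads_ok <<'EOF'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
EOF
then
  echo "replication threads OK"
else
  echo "replication threads NOT OK"
fi
```

On a slave, the real output could be fed in with something like mysql -uroot -p123456 -e 'show slave status\G' | slave_threads_ok.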
7. Configure the MHA environment
server 05:
mkdir /etc/masterha
cp /root/mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha
Edit app1.cnf:
vim /etc/masterha/app1.cnf
[server default]
# manager working directory
manager_workdir=/var/log/masterha/app1
# manager log file; these two entries exist by default
manager_log=/var/log/masterha/app1/manager.log
# directory where the master stores its binlogs, so that MHA can find them
master_binlog_dir=/var/lib/mysql
# switchover script run during automatic failover
master_ip_failover_script= /usr/local/bin/master_ip_failover
# password of the MySQL root user
password=123456
user=root
# interval between ping probes, in seconds
ping_interval=1
# directory on the remote MySQL hosts where binlogs are saved during a switchover
remote_workdir=/tmp
# password and user name of the replication user
repl_password=123456
repl_user=repl
[server1]
hostname=server01
port=3306
[server2]
hostname=server02
candidate_master=1
port=3306
check_repl_delay=0
[server3]
hostname=server03
port=3306
[server4]
hostname=server04
port=3306
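The hostname values in this file must resolve on the manager, which in this guide means they must appear in /etc/hosts. A quick pre-flight check can be sketched as below; it is an illustration, not part of MHA, and the function names cnf_hostnames and missing_from_hosts are made up for the example.

```shell
#!/usr/bin/env bash
# Print the value of every "hostname=" line in an MHA config file.
cnf_hostnames() {
  sed -n 's/^[[:space:]]*hostname[[:space:]]*=[[:space:]]*//p' "$1"
}

# Print each config hostname that has no entry in the given hosts file.
# $1 = MHA config file, $2 = hosts file.
missing_from_hosts() {
  local cnf=$1 hosts=$2 h
  cnf_hostnames "$cnf" | while IFS= read -r h; do
    # Look for the name as a whole word after the address, skipping comment lines.
    awk -v h="$h" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }
    ' "$hosts" || printf '%s\n' "$h"
  done
}

# Usage on the manager (prints nothing when every name is covered):
#   missing_from_hosts /etc/masterha/app1.cnf /etc/hosts
```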
Configure the failover script:
Note: adjust the IP address, network interface name, and so on in the script to match your environment.
vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port,
);
my $vip = '192.168.200.100'; # the VIP
my $key = "1"; # interface alias index, used by this non-keepalived switchover script
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down"; # commands that bring the VIP up and down
$ssh_user = "root";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
#eval {
# print "Disabling the VIP on old master: $orig_master_host \n";
# &stop_vip();
# $exit_code = 0;
#};
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
#my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`;
#if ( $ping le "90.0%"&& $ping gt "0.0%" ){
#$exit_code = 0;
#}
#else {
&stop_vip();
# updating global catalog, etc
$exit_code = 0;
#}
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
chmod +x /usr/local/bin/master_ip_failover
Check MHA SSH connectivity:
masterha_check_ssh --conf=/etc/masterha/app1.cnf
-----------------------------------output omitted----------------------------------
Sat Dec 29 16:04:02 2018 - [info] All SSH connection tests passed successfully.
Check the status of the whole cluster:
masterha_check_repl --conf=/etc/masterha/app1.cnf
-----------------------------------output omitted-----------------------------------
Thu Aug 31 22:20:30 2017 - [info] Alive Servers:
Thu Aug 31 22:20:30 2017 - [info] server01(192.168.200.111:3306)
Thu Aug 31 22:20:30 2017 - [info] server02(192.168.200.112:3306)
Thu Aug 31 22:20:30 2017 - [info] server03(192.168.200.113:3306)
Thu Aug 31 22:20:30 2017 - [info] server04(192.168.200.114:3306)
-----------------------------------output omitted-----------------------------------
server01 (current master)
+--server02
+--server03
+--server04
-----------------------------------output omitted-----------------------------------
MySQL Replication Health is OK.
8. VIP configuration and management
Start manager monitoring:
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover< /dev/null >/var/log/masterha/app1/manager.log 2>&1 &
Check the manager status:
masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:65837) is running(0:PING_OK), master:server01
When monitoring is healthy the status shows "PING_OK"; otherwise it shows "NOT_RUNNING", which means MHA monitoring is not running.
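For use in a monitoring script, the status line can be reduced to its state and current master. The sketch below parses a line in the format shown above. The function name mha_status is made up for this example, and the stopped-state line used in the comment is an assumption about the output format, not copied from a real run.

```shell
#!/usr/bin/env bash
# Read one masterha_check_status line on stdin.
# Prints "<state> <master>" and returns 0 when MHA is running;
# prints nothing and returns non-zero otherwise.
mha_status() {
  sed -n 's/^.* is running([0-9]*:\([A-Z_]*\)), master:\(.*\)$/\1 \2/p' | grep .
}

# Sample line taken from the output above.
echo 'app1 (pid:65837) is running(0:PING_OK), master:server01' | mha_status

# A line without "is running" (assumed format: "app1 is stopped(2:NOT_RUNNING).")
# yields no output and a non-zero exit status.
```

For the sample line the function prints "PING_OK server01".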
View the startup log:
cat /var/log/masterha/app1/manager.log
-----------------------------------output omitted----------------------------------
Sat Dec 29 16:09:51 2018 - [info] Alive Servers:
Sat Dec 29 16:09:51 2018 - [info] server01(192.168.200.111:3306)
Sat Dec 29 16:09:51 2018 - [info] server02(192.168.200.112:3306)
Sat Dec 29 16:09:51 2018 - [info] server03(192.168.200.113:3306)
Sat Dec 29 16:09:51 2018 - [info] server04(192.168.200.114:3306)
-----------------------------------output omitted----------------------------------
server01 (current master)
+--server02
+--server03
+--server04
-----------------------------------output omitted----------------------------------
Thu Aug 31 21:55:23 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Note: the line "Ping(SELECT) succeeded, waiting until MySQL doesn't respond.." means that monitoring of the whole system has started.
server01:
ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.200.111/24 brd 192.168.200.255 scope global ens32
inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary ens32:1
9. Test
Simulate a master failure (on server01):
systemctl stop mariadb
netstat -lnpt | grep :3306
ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.200.111/24 brd 192.168.200.255 scope global ens32
server02:
ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.200.112/24 brd 192.168.200.255 scope global ens32
inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary ens32:1
server03 status:
show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.200.112
Master_User: repl
server04 status:
show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.200.112
Master_User: repl
server05:
jobs
[1]+ Done nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
10. Repair the failed master and test switching the VIP back
server01:
systemctl start mariadb
mysql -u root -p123456
stop slave;
CHANGE MASTER TO
MASTER_HOST='192.168.200.112',
MASTER_USER='repl',
MASTER_PASSWORD='123456';
start slave;
show slave status\G
server05:
vim /etc/masterha/app1.cnf
[server1]
hostname=server01
port=3306
masterha_check_repl --conf=/etc/masterha/app1.cnf
-----------------------------------output omitted-----------------------------------
Thu Aug 31 22:20:30 2017 - [info] Alive Servers:
Thu Aug 31 22:20:30 2017 - [info] server01(192.168.200.111:3306)
Thu Aug 31 22:20:30 2017 - [info] server02(192.168.200.112:3306)
Thu Aug 31 22:20:30 2017 - [info] server03(192.168.200.113:3306)
Thu Aug 31 22:20:30 2017 - [info] server04(192.168.200.114:3306)
-----------------------------------output omitted-----------------------------------
server02 (current master)
+--server01
+--server03
+--server04
-----------------------------------output omitted-----------------------------------
MySQL Replication Health is OK.
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover< /dev/null >/var/log/masterha/app1/manager.log 2>&1 &
server02: stop MySQL on the current master
ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.200.112/24 brd 192.168.200.255 scope global ens32
inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary ens32:1
systemctl stop mariadb
netstat -lnpt | grep :3306
server01:
ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.200.111/24 brd 192.168.200.255 scope global ens32
inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary ens32:1
server03 status:
show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.200.111
Master_User: repl
server04 status:
show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.200.111
Master_User: repl
server02:
systemctl start mariadb
mysql -u root -p123456
stop slave;
CHANGE MASTER TO
MASTER_HOST='192.168.200.111',
MASTER_USER='repl',
MASTER_PASSWORD='123456';
start slave;
show slave status\G
server05:
vim /etc/masterha/app1.cnf
[server2]
hostname=server02
candidate_master=1
port=3306
masterha_check_repl --conf=/etc/masterha/app1.cnf
server01 (current master)
+--server02
+--server03
+--server04
-----------------------------------output omitted-----------------------------------
MySQL Replication Health is OK.
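Because the manager is started with --remove_dead_master_conf, the failed master's section disappears from app1.cnf after each failover and has to be put back before the repaired host can rejoin, as done by hand in step 10. A small helper can make that re-add repeatable. This is a sketch only; the function name ensure_server_block is made up for the example, and the demo works on a scratch file rather than the real configuration.

```shell
#!/usr/bin/env bash
# Append a [section] block to an MHA config file unless it already exists.
# $1 = config file, $2 = section name, $3 = hostname,
# remaining arguments = extra "key=value" lines for the block.
ensure_server_block() {
  local cnf=$1 section=$2 host=$3 extra
  shift 3
  grep -qxF -- "[$section]" "$cnf" && return 0
  {
    printf '[%s]\nhostname=%s\n' "$section" "$host"
    for extra in "$@"; do printf '%s\n' "$extra"; done
    printf 'port=3306\n'
  } >> "$cnf"
}

# Demo on a scratch file; use /etc/masterha/app1.cnf on the manager.
cnf=$(mktemp)
ensure_server_block "$cnf" server2 server02 candidate_master=1
ensure_server_block "$cnf" server2 server02 candidate_master=1   # already present, no change
cat "$cnf"
rm -f "$cnf"
```

The demo prints a single [server2] block with the same four lines that step 10 adds by hand.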