I. How MHA Failover Works
(1) Try to save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that has the most recent updates;
(3) Apply the differential relay log to the other slaves;
(4) Apply the binlog events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the remaining slaves at the new master and resume replication.
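To make the workflow concrete, the sketch below shows in rough shell form what MHA automates during a failover. It is a conceptual illustration only, not MHA's actual implementation; the old-master hostname, the binlog file name/position placeholders, and the replay-through-a-file approach are assumptions for the example.

# (1) salvage binlog events that never reached the slaves (if the old master is still reachable over SSH)
#     LAST_POS is a placeholder for the last position the most up-to-date slave had replicated
ssh root@old-master "mysqlbinlog --start-position=LAST_POS /var/lib/mysql/master-bin.000001" > /tmp/diff.binlog
# (2) on each slave, compare Master_Log_File / Read_Master_Log_Pos to find the most up-to-date one
mysql -uroot -p123456 -e 'show slave status\G' | egrep 'Master_Log_File|Read_Master_Log_Pos'
# (3)(4) replay the missing events on the candidate master
mysql -uroot -p123456 < /tmp/diff.binlog
# (5) promote the candidate: stop replication and allow writes
mysql -uroot -p123456 -e 'stop slave; set global read_only=0;'
# (6) repoint the remaining slaves at the new master
mysql -uroot -p123456 -e "CHANGE MASTER TO MASTER_HOST='192.168.200.112', MASTER_USER='repl', MASTER_PASSWORD='123456'; START SLAVE;"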
II. Lab Environment
1. Note on the environment: install the net-tools package on every host (it provides ifconfig, which the VIP failover script used later depends on).
2. Host planning:

Hostname | IP address      | Role              | server-id | Database role
server01 | 192.168.200.111 | primary master1   | 1         | write
server02 | 192.168.200.112 | secondary master2 | 2         | write
server03 | 192.168.200.113 | slave1            | 3         | read
server04 | 192.168.200.114 | slave2            | 4         | read
server05 | 192.168.200.115 | manager           | -         | monitors the replication group
The primary master serves write traffic. The secondary (candidate) master effectively acts as a slave and serves read traffic, as do slave1 and slave2. If the primary master goes down, the secondary master is promoted to become the new primary master, and slave1 and slave2 are repointed to it.
III. Preliminary Environment Setup
1. Set the hostname on every host
# on the master1 host
hostname server01
bash
# on the master2 host
hostname server02
bash
# on the slave1 host
hostname server03
bash
# on the slave2 host
hostname server04
bash
# on the manager host
hostname server05
bash
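Note that the hostname command above only lasts until the next reboot. A minimal sketch for making the name persistent on CentOS 7, assuming systemd's hostnamectl is available:

# on server01, for example; repeat with the matching name on each host
hostnamectl set-hostname server01
# confirm the change
hostnamectl status | grep "Static hostname"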
2. Map hostnames to IP addresses on every host
vim /etc/hosts
192.168.200.111 server01
192.168.200.112 server02
192.168.200.113 server03
192.168.200.114 server04
192.168.200.115 server05
3. Disable the firewall and SELinux on all hosts
systemctl stop firewalld
iptables -F
setenforce 0
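These commands only take effect until the next reboot. A small sketch for making the settings persistent, assuming the stock CentOS 7 firewalld service and /etc/selinux/config layout:

# keep firewalld from starting on boot
systemctl disable firewalld
# set SELinux to disabled permanently (takes full effect after a reboot)
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config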
4. Install an online yum repository and the EPEL repository (epel-release)
[root@server01 yum.repos.d]# ls
a  CentOS7-Base-163_(1).repo
[root@server01 yum.repos.d]# yum -y install epel-release
[root@server01 yum.repos.d]# ls
a  CentOS7-Base-163_(1).repo  epel.repo  epel-testing.repo
# edit the repo configuration file if necessary
[root@server01 yum.repos.d]# vim epel.repo
Or:
rpm -ivh epel-release-latest-7.noarch.rpm
IV. Install MHA Node on All Hosts
1. Install the required packages and Perl dependencies
yum install -y perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker

# verify the installation
rpm -q perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder
# expected output on success:
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-DBI-1.627-4.el7.x86_64
perl-CPAN-1.9800-294.el7_6.noarch
perl-ExtUtils-CBuilder-0.28.2.6-294.el7_6.noarch
2. Upload, extract, and install MHA Node
tar xf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56/
perl Makefile.PL
make && make install
3. After a successful installation, four scripts are generated under /usr/local/bin
ls -l /usr/local/bin/
total 40
-r-xr-xr-x. 1 root root 16346 Oct 22 14:42 apply_diff_relay_logs
-r-xr-xr-x. 1 root root  4807 Oct 22 14:42 filter_mysqlbinlog
-r-xr-xr-x. 1 root root  7401 Oct 22 14:42 purge_relay_logs
-r-xr-xr-x. 1 root root  7395 Oct 22 14:42 save_binary_logs
V. Install MHA Manager on the Manager Host
1. Install the dependencies
yum install -y perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI perl-Time-HiRes

yum -y install perl-Config-Tiny-2.14-7.el7.noarch.rpm

# verify the installation
rpm -q perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI
perl-5.16.3-294.el7_6.x86_64
perl-Log-Dispatch-2.41-1.el7.1.noarch
perl-Parallel-ForkManager-1.18-2.el7.noarch
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-DBI-1.627-4.el7.x86_64
2. Upload, extract, and install the MHA Manager package
tar xf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56/
perl Makefile.PL
make && make install
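After installation the manager host should also have the masterha_* tools under /usr/local/bin (alongside the node scripts). A quick hedged check, assuming the default install prefix; the exact set of scripts can vary slightly by version:

# tools used later in this article: masterha_check_ssh, masterha_check_repl,
# masterha_check_status, masterha_manager (masterha_stop and masterha_master_switch also ship with the package)
ls /usr/local/bin/masterha_*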
VI. Configure Bidirectional SSH Key Authentication
Each server first generates a key pair and then copies its public key to the other hosts.
1. server05 (192.168.200.115)
ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server02
ssh-copy-id server03
ssh-copy-id server04
# or
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
Note: server05 must SSH to every host once, because the first connection prompts you to type "yes"; otherwise this prompt will interfere with MHA's SSH control of each host during a later failover.
ssh server01
ssh server02
ssh server03
ssh server04
# or
ssh root@192.168.200.111
ssh root@192.168.200.112
ssh root@192.168.200.113
ssh root@192.168.200.114
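As shown above, every first connection has to be confirmed interactively. A quick sketch for verifying afterwards that all connections really are passwordless, using the hostnames from /etc/hosts:

# BatchMode makes ssh fail instead of prompting, so any remaining password or host-key prompt shows up immediately
for h in server01 server02 server03 server04; do
  ssh -o BatchMode=yes root@$h hostname
done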
2. server04 (192.168.200.114)
ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server02
ssh-copy-id server03
# or
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
3. server03 (192.168.200.113)
ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server02
ssh-copy-id server04
# or
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
4. server02 (192.168.200.112)
ssh-keygen -t rsa
ssh-copy-id server01
ssh-copy-id server03
ssh-copy-id server04
# or
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.111
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
5. server01 (192.168.200.111)
ssh-keygen -t rsa
ssh-copy-id server02
ssh-copy-id server03
ssh-copy-id server04
# or
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.112
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.113
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.200.114
VII. Install MySQL (MariaDB)
1. On server01 through server04
yum -y install mariadb mariadb-server mariadb-devel
# start the service
systemctl start mariadb
# check the port
netstat -lnpt | grep :3306
2. Set an initial root password for the database (it will be needed later)
mysqladmin -u root password 123456
VIII. Set Up Master-Slave Replication
1. Edit the MySQL configuration files
server01
vim /etc/my.cnf
[mysqld]
server-id=1
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb
server02
vim /etc/my.cnf
[mysqld]
server-id=2
log-bin=master-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb
server03
vim /etc/my.cnf
[mysqld]
server-id=3
log-bin=mysql-bin
relay-log=slave-relay-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb
server04
vim /etc/my.cnf
[mysqld]
server-id=4
log-bin=mysql-bin
relay-log=slave-relay-bin
log-slave-updates=true
relay_log_purge=0

systemctl restart mariadb
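Each database host must end up with a unique server-id. A minimal sketch for checking all four at once, run from server05 using the passwordless SSH configured earlier and the root password set above:

for h in server01 server02 server03 server04; do
  echo -n "$h: "
  # -s -N prints just the variable name and value, without table formatting
  ssh root@$h "mysql -uroot -p123456 -s -N -e \"show variables like 'server_id';\""
done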
2. Create the replication user on all MySQL servers
grant replication slave on *.* to 'repl'@'192.168.200.%' identified by '123456';
flush privileges;
3. Check the master's current binlog file name and position (192.168.200.111, server01)
show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 |      474 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
4. Start replication on the slaves (192.168.200.112-114, server02-server04)
stop slave;
CHANGE MASTER TO
  MASTER_HOST='192.168.200.111',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_LOG_FILE='master-bin.000001',
  MASTER_LOG_POS=474;
start slave;
show slave status\G
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
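A small sketch for checking all three slaves at once from server05, using the hostnames and root password from this article; anything other than two "Yes" values (or a non-empty error field) means replication is not healthy:

for h in server02 server03 server04; do
  echo "== $h =="
  ssh root@$h "mysql -uroot -p123456 -e 'show slave status\G'" \
    | egrep 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_.*Error'
done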
5. Set read_only on the three slave servers (server02-server04)
The slaves only serve read traffic. read_only is set dynamically rather than written into the MySQL configuration file because server02 may be promoted to master at any time.
[root@server02 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
[root@server03 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
[root@server04 ~]# mysql -uroot -p123456 -e 'set global read_only=1'
6. Create the monitoring user (server01-server04)
grant all privileges on *.* to 'root'@'192.168.200.%' identified by '123456';
flush privileges;
7. Grant privileges for each host's own hostname
server01
grant all privileges on *.* to 'root'@'server01' identified by '123456';
flush privileges;
server02
grant all privileges on *.* to 'root'@'server02' identified by '123456';
flush privileges;
server03
grant all privileges on *.* to 'root'@'server03' identified by '123456';
flush privileges;
server04
grant all privileges on *.* to 'root'@'server04' identified by '123456';
flush privileges;
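To confirm that the replication user and the monitoring user now exist with the expected host entries, a quick check that can be run on any of the database servers:

mysql -uroot -p123456 -e "select user,host from mysql.user where user in ('root','repl') order by user,host;"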
IX. Configure the MHA Environment
On server05 (192.168.200.115), a sample configuration file ships in the directory extracted from the manager package.
1. Create the MHA working directory and copy the sample configuration file
mkdir /etc/masterha
cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha
2. Edit the app1.cnf configuration file
vim /etc/masterha/app1.cnf

[server default]
# manager working directory
manager_workdir=/var/log/masterha/app1
# manager log file; these two entries are present by default
manager_log=/var/log/masterha/app1/manager.log
# where the master keeps its binlogs, so MHA can locate the master's logs
master_binlog_dir=/var/lib/mysql
# script executed to switch the VIP during an automatic failover
master_ip_failover_script= /usr/local/bin/master_ip_failover
# password of the MySQL root (monitoring) user
password=123456
user=root
# interval, in seconds, between ping checks of the master
ping_interval=1
# where the remote MySQL hosts save binlogs during a switchover
remote_workdir=/tmp
# replication user name and password
repl_password=123456
repl_user=repl

[server1]
hostname=server01
port=3306

[server2]
hostname=server02
# server02 may be promoted to the new master
candidate_master=1
port=3306
check_repl_delay=0

[server3]
hostname=server03
port=3306

[server4]
hostname=server04
port=3306
3. Configure the failover script
vim /usr/local/bin/master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port,
);

my $vip = '192.168.200.100';    # the VIP used for writes
my $key = "1";                  # interface alias suffix used by this non-keepalived switchover method
my $ssh_start_vip = "/sbin/ifconfig eno16777728:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eno16777728:$key down";    # commands that bring the VIP up and down
$ssh_user = "root";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            #my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`;
            #if ( $ping le "90.0%" && $ping gt "0.0%" ) {
            #    $exit_code = 0;
            #}
            #else {
            &stop_vip();
            # updating global catalog, etc
            $exit_code = 0;
            #}
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
      "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

# give the script execute permission
chmod +x /usr/local/bin/master_ip_failover
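Since the script brings the VIP up by running ifconfig over SSH, it is worth testing the interface alias by hand before relying on a failover. A minimal sketch, assuming the eno16777728 interface name used above and that net-tools (which provides ifconfig) is installed on the database hosts:

# bring the VIP up on the current master, verify it, then remove it again
ssh root@server01 "/sbin/ifconfig eno16777728:1 192.168.200.100"
ssh root@server01 "ip a | grep 192.168.200.100"
ssh root@server01 "/sbin/ifconfig eno16777728:1 down"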
4. Disable automatic relay log purging on the slaves (server02-server04)
mysql -uroot -p123456 -e 'set global relay_log_purge=0;'
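With relay_log_purge=0 the relay logs accumulate and need to be removed periodically. MHA Node ships the purge_relay_logs tool for this; a hedged example of running it from cron on each slave (the options shown are assumptions based on the tool's common usage, so check purge_relay_logs --help on your install before copying this):

# /etc/cron.d/purge_relay_logs  (stagger the minute per host so the slaves do not purge at the same moment)
0 4 * * * root /usr/local/bin/purge_relay_logs --user=root --password=123456 --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1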
5. Check MHA SSH connectivity
masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Oct 23 07:05:41 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Oct 23 07:05:41 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:05:41 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:05:41 2019 - [info] Starting SSH connection tests..
Wed Oct 23 07:05:44 2019 - [debug]
Wed Oct 23 07:05:41 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server02(192.168.200.112:22)..
Wed Oct 23 07:05:42 2019 - [debug]   ok.
Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server03(192.168.200.113:22)..
Wed Oct 23 07:05:43 2019 - [debug]   ok.
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server01(192.168.200.111:22) to root@server04(192.168.200.114:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:45 2019 - [debug]
Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server01(192.168.200.111:22)..
Wed Oct 23 07:05:43 2019 - [debug]   ok.
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server02(192.168.200.112:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server03(192.168.200.113:22) to root@server04(192.168.200.114:22)..
Wed Oct 23 07:05:45 2019 - [debug]   ok.
Wed Oct 23 07:05:45 2019 - [debug]
Wed Oct 23 07:05:42 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server01(192.168.200.111:22)..
Wed Oct 23 07:05:43 2019 - [debug]   ok.
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server03(192.168.200.113:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server02(192.168.200.112:22) to root@server04(192.168.200.114:22)..
Wed Oct 23 07:05:45 2019 - [debug]   ok.
Wed Oct 23 07:05:46 2019 - [debug]
Wed Oct 23 07:05:43 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server01(192.168.200.111:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server02(192.168.200.112:22)..
Wed Oct 23 07:05:44 2019 - [debug]   ok.
Wed Oct 23 07:05:44 2019 - [debug]  Connecting via SSH from root@server04(192.168.200.114:22) to root@server03(192.168.200.113:22)..
Wed Oct 23 07:05:45 2019 - [debug]   ok.
Wed Oct 23 07:05:46 2019 - [info] All SSH connection tests passed successfully.

This output indicates the check completed successfully.
6. Check the status of the whole cluster
[root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Oct 23 07:10:22 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Oct 23 07:10:22 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:10:22 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Oct 23 07:10:22 2019 - [info] MHA::MasterMonitor version 0.56.
Wed Oct 23 07:10:23 2019 - [info] Dead Servers:
Wed Oct 23 07:10:23 2019 - [info] Alive Servers:
Wed Oct 23 07:10:23 2019 - [info]   server01(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info]   server02(192.168.200.112:3306)
Wed Oct 23 07:10:23 2019 - [info]   server03(192.168.200.113:3306)
Wed Oct 23 07:10:23 2019 - [info]   server04(192.168.200.114:3306)
Wed Oct 23 07:10:23 2019 - [info] Alive Slaves:
Wed Oct 23 07:10:23 2019 - [info]   server02(192.168.200.112:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Oct 23 07:10:23 2019 - [info]   server03(192.168.200.113:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info]   server04(192.168.200.114:3306)  Version=5.5.64-MariaDB (oldest major version between slaves) log-bin:enabled
Wed Oct 23 07:10:23 2019 - [info]     Replicating from 192.168.200.111(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info] Current Alive Master: server01(192.168.200.111:3306)
Wed Oct 23 07:10:23 2019 - [info] Checking slave configurations..
Wed Oct 23 07:10:23 2019 - [info] Checking replication filtering settings..
Wed Oct 23 07:10:23 2019 - [info]  binlog_do_db= , binlog_ignore_db=
Wed Oct 23 07:10:23 2019 - [info]  Replication filtering check ok.
Wed Oct 23 07:10:23 2019 - [info] Starting SSH connection tests..
Wed Oct 23 07:10:28 2019 - [info] All SSH connection tests passed successfully.
Wed Oct 23 07:10:28 2019 - [info] Checking MHA Node version..
Wed Oct 23 07:10:29 2019 - [info]  Version check ok.
Wed Oct 23 07:10:29 2019 - [info] Checking SSH publickey authentication settings on the current master..
Wed Oct 23 07:10:29 2019 - [info] HealthCheck: SSH to server01 is reachable.
Wed Oct 23 07:10:30 2019 - [info] Master MHA Node version is 0.56.
Wed Oct 23 07:10:30 2019 - [info] Checking recovery script configurations on the current master..
Wed Oct 23 07:10:30 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000001
Wed Oct 23 07:10:30 2019 - [info]   Connecting to root@server01(server01)..
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..   ok.
  Binlog found at /var/lib/mysql, up to master-bin.000001
Wed Oct 23 07:10:30 2019 - [info] Master setting check done.
Wed Oct 23 07:10:30 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Oct 23 07:10:30 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server02 --slave_ip=192.168.200.112 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Oct 23 07:10:30 2019 - [info]   Connecting to root@192.168.200.112(server02:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Oct 23 07:10:31 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server03 --slave_ip=192.168.200.113 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Oct 23 07:10:31 2019 - [info]   Connecting to root@192.168.200.113(server03:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Oct 23 07:10:31 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server04 --slave_ip=192.168.200.114 --slave_port=3306 --workdir=/tmp --target_version=5.5.64-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Oct 23 07:10:31 2019 - [info]   Connecting to root@192.168.200.114(server04:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Oct 23 07:10:31 2019 - [info] Slaves settings check done.
Wed Oct 23 07:10:31 2019 - [info]
server01 (current master)
 +--server02
 +--server03
 +--server04
Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server02..
Wed Oct 23 07:10:31 2019 - [info]  ok.
Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server03..
Wed Oct 23 07:10:31 2019 - [info]  ok.
Wed Oct 23 07:10:31 2019 - [info] Checking replication health on server04..
Wed Oct 23 07:10:31 2019 - [info]  ok.
Wed Oct 23 07:10:31 2019 - [info] Checking master_ip_failover_script status:
Wed Oct 23 07:10:31 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server01 --orig_master_ip=192.168.200.111 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eno16777728:1 down==/sbin/ifconfig eno16777728:1 192.168.200.100===

Checking the Status of the script.. OK
bash: /sbin/ifconfig: No such file or directory
Wed Oct 23 07:10:32 2019 - [info]  OK.
Wed Oct 23 07:10:32 2019 - [warning] shutdown_script is not defined.
Wed Oct 23 07:10:32 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
X. VIP Configuration and Management
1. Check /etc/masterha/app1.cnf and confirm that the following line is correct
[root@server05 ~]# grep -n 'master_ip_failover_script' /etc/masterha/app1.cnf
7:master_ip_failover_script= /usr/local/bin/master_ip_failover
2. Check the failover script and confirm the network interface is configured correctly
[root@server05 ~]# head -13 /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port,
);
my $vip = '192.168.200.100';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eno16777728:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eno16777728:$key down";
$ssh_user = "root";
3. Check the manager status
[root@server05 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).    # not running yet
4. Start manager monitoring
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

# check again
[root@server05 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:6588) is running(0:PING_OK), master:server01
The manager is now monitoring the cluster.
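To watch what the manager is doing, or to stop monitoring cleanly later, the following may help (masterha_stop ships with MHA Manager):

# follow the manager log
tail -f /var/log/masterha/app1/manager.log
# stop monitoring cleanly when needed
masterha_stop --conf=/etc/masterha/app1.cnf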
5. Check the VIP on server01 (192.168.200.111)
If the VIP does not show up on the first host, the ifconfig command may simply be missing; installing the net-tools package solves the problem.
[root@server01 ~]# yum -y install net-tools
[root@server01 ~]# ip a | grep eno16777728
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728
    inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
XI. Simulate a Failure of the Primary Master (192.168.200.111)
server01(192.168.200.111)
[root@server01 ~]# systemctl stop mariadb
[root@server01 ~]# netstat -lnpt | grep :3306
[root@server01 ~]# ip a | grep eno16777728
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728
server03 (192.168.200.113) status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
               Master_Host: 192.168.200.112
               Master_User: repl
               Master_Port: 3306
             Connect_Retry: 60
           Master_Log_File: master-bin.000001
       Read_Master_Log_Pos: 1372
            Relay_Log_File: slave-relay-bin.000002
             Relay_Log_Pos: 530
     Relay_Master_Log_File: master-bin.000001
          Slave_IO_Running: Yes
         Slave_SQL_Running: Yes
server04 (192.168.200.114) status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
               Master_Host: 192.168.200.112
               Master_User: repl
               Master_Port: 3306
             Connect_Retry: 60
           Master_Log_File: master-bin.000001
       Read_Master_Log_Pos: 1372
            Relay_Log_File: slave-relay-bin.000002
             Relay_Log_Pos: 530
     Relay_Master_Log_File: master-bin.000001
          Slave_IO_Running: Yes
         Slave_SQL_Running: Yes
On server05 (192.168.200.115), the monitoring configuration file has changed (the server01 section has been removed) and monitoring has shut down automatically.
[root@server05 ~]# cat /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=123456
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=repl

[server2]
candidate_master=1
check_repl_delay=0
hostname=server02
port=3306

[server3]
hostname=server03
port=3306

[server4]
hostname=server04
port=3306

# monitoring has stopped
[1]+  Done    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
The failover log on server05 (192.168.200.115) records the switchover as follows:
Selected server02 as a new master.
server02: OK: Applying all logs succeeded.
server02: OK: Activated master IP address.
server04: This host has the latest relay log events.
server03: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
server04: OK: Applying all logs succeeded. Slave started, replicating from server02.
server03: OK: Applying all logs succeeded. Slave started, replicating from server02.
server02: Resetting slave info succeeded.
Master failover to server02(192.168.200.112:3306) completed successfully.
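After a completed failover, masterha_manager exits and leaves a completion marker (app1.failover.complete) in the manager working directory; by default it refuses to run another automatic failover shortly afterwards unless that marker is removed or --ignore_last_failover is passed, as it is throughout this article. A hedged sketch for clearing the marker before restarting monitoring:

# remove the last-failover marker in the manager workdir (path follows manager_workdir=/var/log/masterha/app1)
rm -f /var/log/masterha/app1/app1.failover.complete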
XII. Repair the Failed Master and Test VIP Switchback
1. Repair server01 (server02 has already been promoted to the new master and now holds the VIP)
server01(192.168.200.111)
[root@server01 ~]# systemctl start mariadb
[root@server01 ~]# netstat -lnpt | grep :3306
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      6131/mysqld
[root@server01 ~]# mysql -u root -p123456
stop slave;
CHANGE MASTER TO
  MASTER_HOST='192.168.200.112',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456';
start slave;
show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
               Master_Host: 192.168.200.112
               Master_User: repl
               Master_Port: 3306
             Connect_Retry: 60
           Master_Log_File: master-bin.000001
       Read_Master_Log_Pos: 1372
            Relay_Log_File: mariadb-relay-bin.000002
             Relay_Log_Pos: 1208
     Relay_Master_Log_File: master-bin.000001
          Slave_IO_Running: Yes
         Slave_SQL_Running: Yes
On server05 (192.168.200.115), edit the monitoring configuration file and add server01 back
[root@server05 ~]# vim /etc/masterha/app1.cnf
[server1]
hostname=server01
port=3306
Check the cluster status on server05 (192.168.200.115)
[root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
----------------------------------- some output omitted -----------------------------------
Thu Aug 31 22:20:30 2017 - [info] Alive Servers:
Thu Aug 31 22:20:30 2017 - [info]   server01(192.168.200.111:3306)
Thu Aug 31 22:20:30 2017 - [info]   server02(192.168.200.112:3306)
Thu Aug 31 22:20:30 2017 - [info]   server03(192.168.200.113:3306)
Thu Aug 31 22:20:30 2017 - [info]   server04(192.168.200.114:3306)
----------------------------------- some output omitted -----------------------------------
server02 (current master)
 +--server01
 +--server03
 +--server04
----------------------------------- some output omitted -----------------------------------
MySQL Replication Health is OK.
Start monitoring on server05 (192.168.200.115)
[root@server05 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 17031
server02 (192.168.200.112) holds the VIP
[root@server02 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:ec:88:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.112/24 brd 192.168.200.255 scope global eno16777728
       valid_lft forever preferred_lft forever
    inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feec:8830/64 scope link
       valid_lft forever preferred_lft forever
2. Simulate a server02 failure; the VIP switches back to server01
server02(192.168.200.112)
[root@server02 ~]# systemctl stop mariadb
[root@server02 ~]# netstat -lnpt | grep :3306
server01(192.168.200.111)
[root@server01 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:9b:5c:f0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.111/24 brd 192.168.200.255 scope global eno16777728
       valid_lft forever preferred_lft forever
    inet 192.168.200.100/24 brd 192.168.200.255 scope global secondary eno16777728:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe9b:5cf0/64 scope link
       valid_lft forever preferred_lft forever
server03 (192.168.200.113) status:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
               Master_Host: 192.168.200.111
               Master_User: repl
               Master_Port: 3306
             Connect_Retry: 60
           Master_Log_File: master-bin.000002
       Read_Master_Log_Pos: 923
            Relay_Log_File: slave-relay-bin.000002
             Relay_Log_Pos: 530
     Relay_Master_Log_File: master-bin.000002
          Slave_IO_Running: Yes
         Slave_SQL_Running: Yes
server04 (192.168.200.114) status:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
               Master_Host: 192.168.200.111
               Master_User: repl
               Master_Port: 3306
             Connect_Retry: 60
           Master_Log_File: master-bin.000002
       Read_Master_Log_Pos: 923
            Relay_Log_File: slave-relay-bin.000002
             Relay_Log_Pos: 530
     Relay_Master_Log_File: master-bin.000002
          Slave_IO_Running: Yes
         Slave_SQL_Running: Yes
The configuration file on server05 (192.168.200.115) has changed again (the failed server2 section has been removed):
[root@server05 ~]# cat /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=123456
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=repl

[server1]
hostname=server01
port=3306

[server3]
hostname=server03
port=3306

[server4]
hostname=server04
port=3306
Monitoring log on server05 (192.168.200.115):
----------------------------------- some output omitted -----------------------------------
Selected server01 as a new master.
server01: OK: Applying all logs succeeded.
server01: OK: Activated master IP address.
server03: This host has the latest relay log events.
server04: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
server04: OK: Applying all logs succeeded. Slave started, replicating from server01.
server03: OK: Applying all logs succeeded. Slave started, replicating from server01.
server01: Resetting slave info succeeded.
Master failover to server01(192.168.200.111:3306) completed successfully.
Repair the server02 (192.168.200.112) host and point it at the new master:
[root@server02 ~]# systemctl start mariadb
[root@server02 ~]# netstat -lnpt | grep :3306
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      5982/mysqld
[root@server02 ~]# mysql -u root -p123456
stop slave;
CHANGE MASTER TO
  MASTER_HOST='192.168.200.111',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456';
start slave;
show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
               Master_Host: 192.168.200.111
               Master_User: repl
               Master_Port: 3306
             Connect_Retry: 60
           Master_Log_File: master-bin.000002
       Read_Master_Log_Pos: 923
            Relay_Log_File: mariadb-relay-bin.000003
             Relay_Log_Pos: 530
     Relay_Master_Log_File: master-bin.000002
          Slave_IO_Running: Yes
         Slave_SQL_Running: Yes
On server05 (192.168.200.115), edit the monitoring configuration file and add the server2 section back:
[root@server05 ~]# vim /etc/masterha/app1.cnf
[server2]
hostname=server02
candidate_master=1
port=3306
check_repl_delay=0
Check the cluster status on server05 (192.168.200.115):
[root@server05 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
server01 (current master)
 +--server02
 +--server03
 +--server04
----------------------------------- some output omitted -----------------------------------
MySQL Replication Health is OK.
server02 has been repaired, and server01 is once again the master.