Installation environment
MySQL installation (omitted)
Operating system: CentOS 7 (any 64-bit build works) (omitted)
MHA installation:
Download:
mha4mysql-manager:
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
mha4mysql-node:
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
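Install:
Install on the MySQL nodes and on the manager node respectively:
yum localinstall -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
yum localinstall -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
Strictly speaking, the node hosts (the MySQL nodes) only need mha4mysql-node. Note that the node package must be installed before the manager package. For ease of management, however, it is recommended to run both installs above on every node.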
MySQL setup notes
Install MySQL and set up master-slave replication.
Create the replication account:
create user repl@'%' identified by 'repl';
grant replication slave on *.* to repl@'%';
Linux SSH trust
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 644 ~/.ssh/authorized_keys
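The commands above only authorize the key on the local host, while masterha_check_ssh (used later) verifies SSH between the nodes. As a minimal sketch, the public key could be pushed to every node with ssh-copy-id. The helper name print_key_copy_cmds is hypothetical, the host names are taken from the app1 configuration further down, and root is assumed because the configuration sets ssh_user=root. The function only prints the commands (a dry run), so nothing is changed until the output is piped to sh.

```shell
# Print (dry run) one ssh-copy-id command per host given as an argument.
# Pipe the output to `sh` to actually copy the key.
print_key_copy_cmds() {
    for host in "$@"; do
        printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub root@%s\n' "$host"
    done
}

# Example (run on each node):
#   print_key_copy_cmds mha01 mha02 mha03        # review the commands
#   print_key_copy_cmds mha01 mha02 mha03 | sh   # execute them
```

Printing first and executing second keeps the step reviewable, which is useful when the host list differs per environment.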
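MHA configuration
MHA's main configuration lives on the manager node; the node hosts only need to hold a few scripts.
On the manager node, the suggested layout is:
/etc/masterha/masterha_default.conf (global settings)
/etc/masterha/app1.conf (per-cluster configuration; one separate file per cluster, which prepares for running several managers later)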
[mysql@cbssql01 masterha]$ cat masterha_default.conf
[server default]
# MySQL user and password
user=root
password=root
port=3307
# OS user for SSH
ssh_user=root
# replication user
repl_user=repl
repl_password=repl
# monitoring
ping_interval=2
#shutdown_script=""
# scripts called during a switch
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change
The two scripts master_ip_failover and master_ip_online_change need to be customized.
/etc/masterha/app1.conf
[mysql@cbssql01 masterha]$ cat app1.conf
[server default]
# MHA manager working directory
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1

[server1]
hostname=mha01
master_binlog_dir=/data/mysql/mysql3307/logs
# may be promoted to master
candidate_master=1
# keeps the switch from stalling when a slave is lagging at the moment the master fails
check_repl_delay=0

[server2]
hostname=mha02
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server3]
hostname=mha03
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
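Since every server needs its own [serverN] block, a copy-pasted block with a repeated section name is an easy mistake to make. The following is a small sketch of a sanity check; the function name find_duplicate_sections is hypothetical and not part of MHA. It lists any section header that occurs more than once in a configuration file.

```shell
# List section headers (e.g. [server2]) that appear more than once in the file.
# Prints nothing when every section name is unique.
find_duplicate_sections() {
    sed -n 's/^\(\[[^]]*\]\).*/\1/p' "$1" | LC_ALL=C sort | uniq -d
}

# Example:
#   find_duplicate_sections /etc/masterha/app1.conf
```

An empty result means no section name is repeated; it says nothing about whether the values inside each section are correct.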
Update the VIP in the configuration files:
In the files under /etc/masterha, change the value of vip to your own VIP.
[mysql@mha01 masterha]$ grep 192.168.1.200 *
drop_vip.sh:vip="192.168.1.200/32"
init_vip.sh:vip="192.168.1.200/32"
master_ip_failover:my $vip = "192.168.1.200";
master_ip_online_change:my $vip = "192.168.1.200";
drop_vip.sh
vip="192.168.1.200/32"
/sbin/ip addr del $vip dev team0
init_vip.sh
vip="192.168.1.200/32"
/sbin/ip addr add $vip dev team0
master_ip_failover
[mysql@cbssql01 masterha]$ cat /etc/masterha/master_ip_failover
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

# VIP for this group of machines
my $vip = "192.168.1.200";
my $if  = "team0";

my (
  $command,         $ssh_user,        $orig_master_host,
  $orig_master_ip,  $orig_master_port, $new_master_host,
  $new_master_ip,   $new_master_port, $new_master_user,
  $new_master_password
);
GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

# Remove the VIP from the original master, then bind it on the new master
sub add_vip {
  my $output1 =
`ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $orig_master_host /sbin/ip addr del $vip/32 dev $if`;
  my $output2 =
`ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $new_master_host /sbin/ip addr add $vip/32 dev $if`;
}

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print "Creating app user on the new master..\n";
      #FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      &add_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
master_ip_online_change
[mysql@cbssql01 masterha]$ cat /etc/masterha/master_ip_online_change
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;

# VIP definition
my $vip = "192.168.1.200";
my $if  = "team0";

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub drop_vip {
  my $output =
`ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $orig_master_host /sbin/ip addr del $vip/32 dev $if`;

  # kill all connections in MySQL
  #FIXME
}

sub add_vip {
  my $output =
`ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $new_master_host /sbin/ip addr add $vip/32 dev $if`;
}

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();

      # print current_time_us() . " Drpping app user on the orig master..\n";
      print current_time_us() . " drop vip $vip..\n";

      #drop_app_user($orig_master_handler);
      &drop_vip();

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

    # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
    # If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      print current_time_us() . " Add vip $vip on $if..\n";

      # create_app_user($new_master_handler);
      &add_vip();
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}
drop_vip.sh, init_vip.sh, master_ip_failover and master_ip_online_change all need execute permission.
MHA environment checks
On the manager node, confirm that SSH works:
masterha_check_ssh --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf
Confirm that replication is healthy:
masterha_check_repl --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf
The output should end with: MySQL Replication Health is OK. The replication topology is printed as well.
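When the check is run from a script rather than by eye, the success line quoted above can be matched mechanically. This is a minimal sketch; the helper name repl_health_ok is hypothetical, and it relies only on the literal message text given above, not on any exit-code behaviour of masterha_check_repl.

```shell
# Read masterha_check_repl output on stdin and succeed only if the
# "MySQL Replication Health is OK." line is present.
repl_health_ok() {
    grep -q 'MySQL Replication Health is OK\.'
}

# Example:
#   masterha_check_repl --global_conf=/etc/masterha/masterha_default.conf \
#       --conf=/etc/masterha/app1.conf 2>&1 | repl_health_ok \
#       && echo "replication check passed"
```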
At this point the core MHA setup is complete, and the manager is essentially ready to be started.
Starting MHA
nohup masterha_manager --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf &
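Restarting after a failover
After every failover, MHA creates the file app1.failover.complete in the manager working directory. On the next switch MHA finds this file and the switch fails, so the file has to be removed by hand:
rm -f /var/log/masterha/app1/app1.failover.complete
Alternatively, add the --ignore_last_failover option.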
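The cleanup can be wrapped in a small helper that is run before the manager is restarted. This is a sketch under assumptions: the function name clear_failover_marker is hypothetical, and the marker path is assumed to be <manager_workdir>/<app>.failover.complete, with manager_workdir taken from the app1 configuration shown earlier.

```shell
# Remove <workdir>/<app>.failover.complete if it exists and report what was done.
# Runs in a subshell so the variables do not leak into the caller.
clear_failover_marker() (
    workdir=$1
    app=$2
    marker="$workdir/$app.failover.complete"
    if [ -f "$marker" ]; then
        rm -f "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
)

# Example:
#   clear_failover_marker /var/log/masterha/app1 app1
```

Removing the file explicitly keeps the safeguard in place for later failovers, whereas --ignore_last_failover skips the check for that manager run.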
MHA day-to-day maintenance commands
Contents of the MHA packages
[mysql@cbssql01 masterha]$ rpm -ql mha4mysql-manager-0.56-0.el6.noarch
/usr/bin/masterha_check_repl        (check)
/usr/bin/masterha_check_ssh         (check)
/usr/bin/masterha_check_status      (check)
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor    (monitoring)
/usr/bin/masterha_master_switch     (switch)
/usr/bin/masterha_secondary_check   (remote check)
/usr/bin/masterha_stop
/usr/share/man/man1/masterha_check_repl.1.gz
/usr/share/man/man1/masterha_check_ssh.1.gz
/usr/share/man/man1/masterha_check_status.1.gz
/usr/share/man/man1/masterha_conf_host.1.gz
/usr/share/man/man1/masterha_manager.1.gz
/usr/share/man/man1/masterha_master_monitor.1.gz
/usr/share/man/man1/masterha_master_switch.1.gz
/usr/share/man/man1/masterha_secondary_check.1.gz
/usr/share/man/man1/masterha_stop.1.gz
/usr/share/perl5/vendor_perl/MHA/Config.pm
/usr/share/perl5/vendor_perl/MHA/DBHelper.pm
/usr/share/perl5/vendor_perl/MHA/FileStatus.pm
/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm
/usr/share/perl5/vendor_perl/MHA/ManagerConst.pm
/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm
/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm
/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
/usr/share/perl5/vendor_perl/MHA/Server.pm
/usr/share/perl5/vendor_perl/MHA/ServerManager.pm
Brief description of the commands
masterha_check_repl
masterha_check_ssh
masterha_check_status
masterha_conf_host
masterha_manager
masterha_master_monitor
masterha_master_switch
masterha_secondary_check - Checking master availability from additional network routes
masterha_stop
Usage and help for each command can be read with perldoc.
Master failover
masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --master_state=dead --dead_master_host=<hostname of the dead master> --new_master_host=192.168.199.78
Manual online master switch
masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=192.168.199.78 --orig_master_is_new_slave
mha4mysql-node
[mysql@cbssql01 bin]$ rpm -ql mha4mysql-node-0.56-0.el6.noarch
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs           (important)
/usr/share/man/man1/apply_diff_relay_logs.1.gz
/usr/share/man/man1/filter_mysqlbinlog.1.gz
/usr/share/man/man1/purge_relay_logs.1.gz
/usr/share/man/man1/save_binary_logs.1.gz
/usr/share/perl5/vendor_perl/MHA/BinlogHeaderParser.pm
/usr/share/perl5/vendor_perl/MHA/BinlogManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinder.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderElp.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderXid.pm
/usr/share/perl5/vendor_perl/MHA/NodeConst.pm
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
/usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm
save_binary_logs - Concatenating binary or relay logs from the specified file/position to the end of the log. This command is automatically executed from MHA Manager on failover, and manual execution should not be needed normally. (key command)
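How MHA switches without losing data
Online switch without data loss
Failover without data loss
In a failover the original master can no longer be reached, and the goal is for a slave to take over the service automatically.
The action is initiated by masterha_manager. The rough process:
1. masterha_manager starts and confirms that no failover has been performed before and that a candidate master exists.
2. It keeps checking, at the configured interval, whether the master is down. If it is, the switch procedure starts.
3. It runs show slave status on every slave to obtain each slave's replication position.
   a) If the original master host can still be reached, save_binary_logs is called on that host to save the binlog, starting from each slave's replication position, into a file that is sent to the respective slave and applied.
   b) If the original master cannot be reached: version 0.56 added the ability to read the differential binlog from a binlog server and send it to the other slaves to be applied.
   c) If there is no binlog server either, the replication positions of all slaves are compared. If they differ, the most advanced slave is taken as the latest slave, and the missing events are read from its relay log and applied on the other slaves.
   After these steps the data has been filled in, and all slaves can be regarded as identical.
4. A candidate node is chosen according to the configuration, and the other slaves are repointed to it with CHANGE MASTER.
   a) If there are several candidate nodes, one can be chosen by weight.
   b) If no weight is defined, the first one defined, in order, becomes the new master.
5. The script configured as master_ip_failover is called to move the VIP over.
   The stock master_ip_failover does not implement VIP binding; you have to implement it yourself.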
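The comparison in step 3c can be illustrated with a few lines of shell. This is only a sketch of the idea, not MHA's implementation: it assumes that the master log file name and read position of each slave have already been collected from show slave status into "host file position" lines, and the function name pick_latest_slave is hypothetical.

```shell
# stdin: one "host master_log_file read_master_log_pos" triple per line.
# Prints the host that has read furthest: highest log file first,
# then highest position within that file.
pick_latest_slave() {
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

# Example:
#   printf '%s\n' \
#       'mha02 mysql-bin.000012 4500' \
#       'mha03 mysql-bin.000012 12000' | pick_latest_slave
```

The file name is compared as text, which works here because binlog file names in one series share a fixed-width numeric suffix; the position is compared numerically.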
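MHA usage notes
How MHA decides that the master has failed
By default, every 3 seconds MHA pings MySQL once (as mysqladmin ping does) and attempts a connection to the master.
Question:
How is it ensured that the master's data is fully transferred to the slaves?
1. If the master server can be logged in to, SSH to the master server:
mysqlbinlog --start-position=<read_master_log_pos> <master_log_file> | mysql -h <new_master>
2. If the master server cannot be logged in to:
a) If there is a binlog server:
mysqlbinlog --start-position=<read_master_log_pos> <master_log_file> | mysql -h <new_master>
b) If there is no binlog server:
Prerequisite: relay_log_purge = 0
Method: the apply_diff_relay_logs script fills in the missing data. The slave with the largest file and position is set as the master, and the binlog file and position of every slave behind it are compared against it so that the missing events are applied on those slaves.
MHA failover logic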
/etc/masterha/master_ip_failover --orig_master_host=xxx.xxx.xxx.xxx --orig_master_ip=xxx.xxx.xxx.xxx --orig_master_port=xxxx --command=stopssh --ssh_user=root
/etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=xxx.xxx.xxx.xxx --orig_master_ip=xxx.xxx.xxx.xxx --orig_master_port=xxxx --new_master_host=xxx.xxx.xxx.xxx --new_master_ip=xxx.xxx.xxx.xxx --new_master_port=xxxx --new_master_user='xxx' --new_master_password='xxx'