MySQL High Availability with MHA

Posted by 无人久伴 on 2019-11-26 17:16:45

 

Installation environment

MySQL installation (omitted)

Operating system: CentOS 7 (any 64-bit release works) (omitted)

MHA installation:

  Download:

  mha4mysql-manager:

wget  https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

  mha4mysql-node:

wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm

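  Install:

    Install the packages on the MySQL nodes and on the manager node:

yum localinstall -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
yum localinstall -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

  Strictly speaking, the data nodes (MySQL nodes) only need mha4mysql-node. Note that the node package must be installed before the manager package, because the manager package depends on it. To simplify management, though, it is recommended to run both installs on every node.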

 

MySQL setup notes

  Install MySQL and set up master-slave replication.

  Create the replication account:

create user repl@'%' identified by 'repl';
grant replication slave on *.* to repl@'%';

  Linux SSH trust

ssh-keygen -t rsa
cat  ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 644  ~/.ssh/authorized_keys  
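The three commands above only authorize the key on the local machine. MHA needs passwordless SSH between the manager and every MySQL node, so the public key also has to reach the other hosts. A minimal sketch, assuming the hostnames mha01 to mha03 used in this article's app1.conf and the root SSH user; the helper name is made up for illustration, and it only prints the commands so they can be reviewed first:

```shell
# Hypothetical host list; adjust to your own cluster.
MHA_HOSTS="mha01 mha02 mha03"

# Print one ssh-copy-id command per host so the local public key
# ends up in that host's authorized_keys.
print_key_distribution_cmds() {
  ssh_user=$1
  shift
  for host in "$@"; do
    printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub %s@%s\n' "$ssh_user" "$host"
  done
}

# Dry run: show the commands. To execute them, pipe the output to sh:
#   print_key_distribution_cmds root $MHA_HOSTS | sh
print_key_distribution_cmds root $MHA_HOSTS
```

Repeat this on every node, since masterha_check_ssh tests the connections in all directions.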

  

 

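MHA configuration

The main MHA configuration lives on the manager node; the data nodes only need to hold a few scripts.

On the manager node, the suggested layout is:

/etc/masterha/masterha_default.conf  (global settings)

/etc/masterha/app1.conf   (settings for one specific cluster; each cluster gets its own file, which prepares for running several managers later)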

[mysql@cbssql01 masterha]$ cat masterha_default.conf 
[server default]
# MySQL user and password
user=root
password=root
port=3307

# System SSH user
ssh_user=root

# Replication user
repl_user=repl
repl_password=repl


# Monitoring
ping_interval=2
#shutdown_script=""

# Scripts called during a switch
master_ip_failover_script= /etc/masterha/master_ip_failover
master_ip_online_change_script= /etc/masterha/master_ip_online_change

The two scripts master_ip_failover and master_ip_online_change must be customized for your environment.

/etc/masterha/app1.conf

[mysql@cbssql01 masterha]$ cat app1.conf 
[server default]


# MHA manager working directories
manager_workdir = /var/log/masterha/app1
manager_log = /var/log/masterha/app1/app1.log
remote_workdir = /var/log/masterha/app1

[server1]
hostname=mha01
master_binlog_dir = /data/mysql/mysql3307/logs
# This server may be promoted to master
candidate_master = 1
# Ignore replication delay when picking the new master, so a lagging slave does not stall the failover
check_repl_delay = 0

[server2]
hostname=mha02
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server3]
hostname=mha03
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
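Each server block needs its own section name; when a header is repeated after copy and paste, the two blocks would most likely be merged by the INI parser and one host silently dropped. A small sketch for catching that before starting the manager (the function name is illustrative, not part of MHA):

```shell
# Print every section header that appears more than once in an INI-style file.
find_duplicate_sections() {
  grep -E '^[[:space:]]*\[[^]]+\]' "$1" \
    | sed -E 's/^[[:space:]]*(\[[^]]+\]).*/\1/' \
    | sort | uniq -d
}

# Example against the path used in this article:
#   dups=$(find_duplicate_sections /etc/masterha/app1.conf)
#   [ -z "$dups" ] || echo "duplicate sections: $dups"
```

An empty result means every section header is unique.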

 

Updating the VIP in the configuration files:

In the files under /etc/masterha, change the vip value to the VIP of your own environment.

[mysql@mha01 masterha]$ grep 192.168.1.200 *
drop_vip.sh:vip="192.168.1.200/32"
init_vip.sh:vip="192.168.1.200/32"
master_ip_failover:my $vip = "192.168.1.200";
master_ip_online_change:my $vip = "192.168.1.200";

 

 

 drop_vip.sh 

vip="192.168.1.200/32"
/sbin/ip addr del $vip dev team0

init_vip.sh

vip="192.168.1.200/32"
/sbin/ip addr add $vip dev team0
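Running `ip addr add` a second time for an address that is already bound returns an error, which can be confusing when init_vip.sh is rerun. A hedged sketch of an idempotent variant, assuming the same VIP and the team0 interface from this article; `vip_is_listed` and `ensure_vip` are illustrative names:

```shell
# Succeed if the address in $1 (without prefix length) appears in the
# `ip -o -4 addr show` output supplied on stdin.
vip_is_listed() {
  awk -v vip="$1" '
    { split($4, a, "/"); if (a[1] == vip) found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Bind the VIP only when it is not on the interface yet.
ensure_vip() {
  vip=$1
  dev=$2
  if /sbin/ip -o -4 addr show dev "$dev" | vip_is_listed "${vip%/*}"; then
    echo "$vip already bound on $dev"
  else
    /sbin/ip addr add "$vip" dev "$dev"
  fi
}

# Usage on the current master:
#   ensure_vip 192.168.1.200/32 team0
```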

master_ip_failover

[mysql@cbssql01 masterha]$ cat /etc/masterha/master_ip_failover 
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
# VIP for this group of machines
my $vip = "192.168.1.200";
my $if = "team0";
my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

sub add_vip {
    my $output1 = `ssh -o ConnectTimeout=15  -o ConnectionAttempts=3 $orig_master_host /sbin/ip addr del $vip/32 dev $if`;
    my $output2 = `ssh -o ConnectTimeout=15  -o ConnectionAttempts=3 $new_master_host /sbin/ip addr add $vip/32 dev $if`;

}
exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print "Creating app user on the new master..\n";
      #FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      &add_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

master_ip_online_change

[mysql@cbssql01 masterha]$ cat /etc/masterha/master_ip_online_change 
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
# VIP definition
my $vip = "192.168.1.200";
my $if = "team0";

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();
sub drop_vip {
        my $output = `ssh -o ConnectTimeout=15  -o ConnectionAttempts=3 $orig_master_host /sbin/ip addr del $vip/32 dev $if`;
    # Kill all remaining connections inside MySQL
    # FIXME
}
sub add_vip {
        my $output = `ssh -o ConnectTimeout=15  -o ConnectionAttempts=3 $new_master_host /sbin/ip addr add $vip/32 dev $if`;

}


sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
     # print current_time_us() . " Drpping app user on the orig master..\n";
      print current_time_us() . " drop vip $vip..\n";
      #drop_app_user($orig_master_handler);
     &drop_vip();

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      print current_time_us() . "Add vip $vip on $if..\n";
     # create_app_user($new_master_handler);
      &add_vip();
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}

 drop_vip.sh, init_vip.sh, master_ip_failover and master_ip_online_change all need execute permission.

 

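MHA environment checks

On the manager node, verify that SSH works:

masterha_check_ssh --global_conf=/etc/masterha/masterha_default.conf  --conf=/etc/masterha/app1.conf
The last line of output should read: [info] All SSH connection tests passed successfully.

Verify that replication is healthy:

masterha_check_repl --global_conf=/etc/masterha/masterha_default.conf  --conf=/etc/masterha/app1.conf

The output should end with: MySQL Replication Health is OK. The replication topology is printed as well.

At this point the MHA framework is in place and the manager is essentially ready to start.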
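Both checks can be wrapped in one preflight step that looks for the success messages quoted above. This is only a sketch: the function names are made up, and it assumes the two messages appear verbatim in the tools' output.

```shell
GLOBAL_CONF=/etc/masterha/masterha_default.conf
APP_CONF=/etc/masterha/app1.conf

# Succeed only if the text on stdin contains the literal string in $1.
output_has() {
  grep -Fq -- "$1"
}

# Run the SSH check and the replication check; fail on the first one
# whose output lacks the expected success message.
preflight() {
  masterha_check_ssh --global_conf="$GLOBAL_CONF" --conf="$APP_CONF" 2>&1 \
    | output_has "All SSH connection tests passed successfully" || return 1
  masterha_check_repl --global_conf="$GLOBAL_CONF" --conf="$APP_CONF" 2>&1 \
    | output_has "MySQL Replication Health is OK" || return 1
}

# Usage on the manager node:
#   preflight && echo "checks passed, masterha_manager can be started"
```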

 

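Starting MHA

nohup masterha_manager --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf  &

Restarting after a failover
After every failover, MHA creates the file app1.failover.complete in the manager working directory. If the file is still there at the next switch, the switch fails, so it has to be removed by hand:
rm -rf /var/log/masterha/app1/app1.failover.complete
Alternatively, pass the --ignore_last_failover option.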
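The marker file lives in manager_workdir, so its path can be derived from the cluster configuration instead of being typed by hand. A sketch under two assumptions: manager_workdir is set in the per-cluster file (as in this article), and the marker is named after the configuration file's base name (app1.conf gives app1.failover.complete). The function name is illustrative.

```shell
# Print the expected path of the failover marker for a cluster config file.
failover_marker_path() {
  conf=$1
  app=$(basename "$conf")
  app=${app%.*}
  workdir=$(sed -n '/^[[:space:]]*manager_workdir[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*$//;p;}' "$conf" | head -n 1)
  printf '%s/%s.failover.complete\n' "$workdir" "$app"
}

# Usage before restarting the manager:
#   rm -f "$(failover_marker_path /etc/masterha/app1.conf)"
```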
 
On the first start, the VIP is not bound on the master automatically; run init_vip.sh by hand to bind it. After that, a master failover moves the VIP.
Check whether MHA is running:
# masterha_check_status --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf
Stop MHA:
# masterha_stop --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf > /tmp/mha_manager.log 2>&1

 

MHA routine maintenance commands

1. Check that SSH logins work
masterha_check_ssh --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf

2. Check that replication is set up correctly
masterha_check_repl --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf

3. Start MHA
nohup masterha_manager --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf > /tmp/mha_manager.log < /dev/null 2>&1 &
The manager refuses to start while a slave node is down. Adding --ignore_fail_on_start lets it start even with a failed node:
nohup masterha_manager --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --ignore_fail_on_start > /tmp/mha_manager.log < /dev/null 2>&1 &
This also requires ignore_fail=1 in the configuration file.

4. Check the running status
masterha_check_status --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf

5. Stop MHA
masterha_stop --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf

6. Restarting after a failover
After every failover, the file app1.failover.complete is created in the manager working directory. If it is still there at the next switch, the switch fails, so it has to be removed by hand:
rm -rf /var/log/masterha/app1/app1.failover.complete
Alternatively, pass the --ignore_last_failover option.

7. Manual failover
Use this when the master is dead but masterha_manager was not running:
masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=old_ip --master_state=dead --new_master_host=new_ip --ignore_last_failover
 
8. masterha_manager is a program that both monitors the master and performs failover. masterha_master_switch, on the other hand, does not monitor the master; it can be used for master failover and also for an online master switch.

9. Manual online switch
masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=192.168.199.78 --orig_master_is_new_slave
or
masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=192.168.199.78 --orig_master_is_new_slave --running_updates_limit=10000
--orig_master_is_new_slave: with this option the original master becomes a slave of the new master; without it, the original master is not started as a slave.
--running_updates_limit=10000: the switch fails if the candidate master is lagging; with this option, a switch is allowed while the delay stays within this many seconds. How long the switch takes still depends on the size of the relay logs that must be applied during recovery.

Before a manual online switch, the running MHA manager must be stopped.
Schema changes can be rolled out this way: run the DDL on the standby first, usually after stop slave and without writing it to the binary log (set SQL_LOG_BIN = 0). Then do a master/standby switch and run the DDL on the original master. This approach suits adding or dropping indexes; adding a column needs extra care.
An online master switch starts only when all of the following conditions are met:
1. IO threads on all slaves are running.
2. SQL threads on all slaves are running.
3. Seconds_Behind_Master on all slaves is less than or equal to --running_updates_limit seconds.
4. On the master, no update query in the show processlist output has been running for more than --running_updates_limit seconds.
MHA can be stopped with:
masterha_stop --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf
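The first three preconditions can be checked by hand on each slave before attempting the switch. A minimal sketch that reads `SHOW SLAVE STATUS\G` text and compares it with a limit; the function name is illustrative, and MHA performs its own checks regardless:

```shell
# Read `SHOW SLAVE STATUS\G` output on stdin. Succeed only if the IO and SQL
# threads are both running and Seconds_Behind_Master is a number that does
# not exceed the limit given in $1.
slave_ready_for_switch() {
  awk -v limit="$1" '
    $1 == "Slave_IO_Running:"      { io = $2 }
    $1 == "Slave_SQL_Running:"     { sql = $2 }
    $1 == "Seconds_Behind_Master:" { lag = $2 }
    END {
      ok = (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= limit + 0)
      exit (ok ? 0 : 1)
    }'
}

# Usage (credentials omitted; mha02 and port 3307 are the values from this article):
#   mysql -h mha02 -P 3307 -e 'SHOW SLAVE STATUS\G' | slave_ready_for_switch 10000
```

A Seconds_Behind_Master of NULL, which MySQL reports when replication is stopped, is treated as not ready.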
 
 
 

MHA package contents

  MHA ships as two packages, mha4mysql-manager and mha4mysql-node. The manager node needs both; a data node only needs mha4mysql-node.


[mysql@cbssql01 masterha]$ rpm -ql mha4mysql-manager-0.56-0.el6.noarch
/usr/bin/masterha_check_repl    (check)
/usr/bin/masterha_check_ssh    (check)
/usr/bin/masterha_check_status    (check)
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor    (monitoring)
/usr/bin/masterha_master_switch     (switching)
/usr/bin/masterha_secondary_check    (remote check)
/usr/bin/masterha_stop
/usr/share/man/man1/masterha_check_repl.1.gz
/usr/share/man/man1/masterha_check_ssh.1.gz
/usr/share/man/man1/masterha_check_status.1.gz
/usr/share/man/man1/masterha_conf_host.1.gz
/usr/share/man/man1/masterha_manager.1.gz
/usr/share/man/man1/masterha_master_monitor.1.gz
/usr/share/man/man1/masterha_master_switch.1.gz
/usr/share/man/man1/masterha_secondary_check.1.gz
/usr/share/man/man1/masterha_stop.1.gz
/usr/share/perl5/vendor_perl/MHA/Config.pm
/usr/share/perl5/vendor_perl/MHA/DBHelper.pm
/usr/share/perl5/vendor_perl/MHA/FileStatus.pm
/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm
/usr/share/perl5/vendor_perl/MHA/ManagerConst.pm
/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm
/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm
/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
/usr/share/perl5/vendor_perl/MHA/Server.pm
/usr/share/perl5/vendor_perl/MHA/ServerManager.pm

 

 Command overview

masterha_manager - Monitoring MySQL master server availability and do failover if it detects master failure
 
masterha_check_repl
masterha_check_ssh
masterha_check_status
masterha_conf_host
masterha_manager
masterha_master_monitor
masterha_master_switch
masterha_secondary_check - Checking master availability from additional network routes
masterha_stop

Use perldoc to read the usage and help text of each command.

 

Master failover

masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --master_state=dead --dead_master_host=old_ip --new_master_host=192.168.199.78

Manual online master switch

masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=192.168.199.78 --orig_master_is_new_slave

mha4mysql-node

[mysql@cbssql01 bin]$  rpm -ql mha4mysql-node-0.56-0.el6.noarch
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs    (important)
/usr/share/man/man1/apply_diff_relay_logs.1.gz
/usr/share/man/man1/filter_mysqlbinlog.1.gz
/usr/share/man/man1/purge_relay_logs.1.gz
/usr/share/man/man1/save_binary_logs.1.gz
/usr/share/perl5/vendor_perl/MHA/BinlogHeaderParser.pm
/usr/share/perl5/vendor_perl/MHA/BinlogManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinder.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderElp.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderXid.pm
/usr/share/perl5/vendor_perl/MHA/NodeConst.pm
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
/usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm
save_binary_logs - Concatenating binary or relay logs from the specified file/position to the end of the log. This command is automatically executed from MHA Manager on failover, and manual execution should not be needed normally. (key command)
 
 

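How MHA switches without losing data

Online switch without data loss

An online switch is typically used for planned maintenance, when the master is taken offline on schedule. It calls the script defined by master_ip_online_change_script. The script and the surrounding switch procedure cover the following steps:
  1. Change the service account password and remove the VIP.    (master drop vip)
  2. Kill every existing client connection.    (master kill app connection)
  3. Set the original master to read_only = 1 and super_read_only = 1 (read_only alone does nothing against a service account that holds the SUPER privilege). Confirm that every slave is working and has caught up with the master. A switch is refused while there is lag, so plan for rollback (or hold off on the password change). The tolerated delay is set with --running_updates_limit.    (slave and master in sync)
  4. Depending on the options, decide whether the original master becomes a new slave of the new master.
  5. Point all other slaves at the new master.
  6. Check that the new master has the service account (repair if needed).
  7. Have the new master take over the VIP, or register its IP in ZK.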
 
 

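Failover without data loss

In a failover the original master is no longer reachable, and the goal is for a slave to take over service automatically.

  masterha_manager drives this. Roughly:

   1. masterha_manager starts and confirms that no failover has been done before and that a candidate master exists.

   2. It keeps probing the master according to the configured check; once the master is judged dead, the switch procedure begins.

   3. It runs show slave status on every slave to get each slave's replication position.

    a) If the original master host is still reachable, save_binary_logs is called on that host to save the binlog events beyond each slave's position into a file, which is shipped to each slave and applied.

    b) If the original master cannot be reached, version 0.56 added the ability to read the missing binlog events from a binlog server and send them to the slaves to apply.

    c) If there is no binlog server either, MHA compares the positions of all slaves. If they differ, the most advanced slave is taken as the latest slave, and the missing events are read from its relay log and applied on the other slaves.

   After these steps the data has been filled in and all slaves can be considered identical.

   4. A candidate is chosen according to the configuration, and the other slaves are repointed to it with change master.

    a) With several candidates, the choice can follow their weights.

    b) If no weights are defined, the first candidate defined in the configuration becomes the new master.

   5. The script defined by master_ip_failover is called to move the VIP.

  The stock master_ip_failover does not implement VIP binding; you have to add it yourself.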

  

 

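Notes on using MHA

How MHA decides that the master has failed

By default, every 3 seconds MHA pings MySQL once (in the manner of mysqladmin ping) and attempts a connection to the master.

Question:

How do we make sure all of the master's data reaches the slaves?

1. If the master host can still be logged into, ssh to it and run (pseudocode: read_master_log_pos and master_log_file are the values from show slave status, and new_master stands for applying the events on the new master):

mysqlbinlog --start-position=read_master_log_pos master_log_file | new_master

2. If the master host cannot be logged into:

  a) With a binlog_server:

    mysqlbinlog --start-position=read_master_log_pos master_log_file | new_master

  b) Without a binlog_server:

    Prerequisite: relay_log_purge = 0

    Method: the apply_diff_relay_logs script fills in the missing data. The slave with the highest file and position becomes the master, and the file/position of every slave behind it is compared against it so the difference can be applied to that slave.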
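With relay_log_purge = 0 the relay logs are no longer removed automatically, so they have to be cleaned up on a schedule. The node package listed earlier includes purge_relay_logs for this purpose. The sketch below only prints a crontab line; the helper name, the schedule, the log path and the CHANGE_ME password are placeholders, and the options shown should be checked against `perldoc purge_relay_logs` on your version.

```shell
# Print a crontab entry that runs purge_relay_logs once a day.
# $1 = minute, $2 = hour, $3 = MySQL port
purge_cron_line() {
  minute=$1
  hour=$2
  port=$3
  printf '%s %s * * * /usr/bin/purge_relay_logs --user=root --password=CHANGE_ME --port=%s --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1\n' \
    "$minute" "$hour" "$port"
}

# Example for the port used in this article; review the line, then add it with `crontab -e`.
purge_cron_line 0 5 3307
```

Using a different minute or hour on each slave avoids purging relay logs everywhere at the same moment.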

 

 

MHA failover logic

  During a failover MHA invokes the script in this order:

  /etc/masterha/master_ip_failover --orig_master_host=xxx.xxx.xxx.xxx --orig_master_ip=xxx.xxx.xxx.xxx --orig_master_port=xxxx --command=stopssh --ssh_user=root

  /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=xxx.xxx.xxx.xxx --orig_master_ip=xxx.xxx.xxx.xxx --orig_master_port=xxxx --new_master_host=xxx.xxx.xxx.xxx --new_master_ip=xxx.xxx.xxx.xxx --new_master_port=xxxx --new_master_user='xxx' --new_master_password='xxx'

 

 

 

 
 
 

 
