阿里云-云小站(无限量代金券发放中)
【腾讯云】云服务器、云数据库、COS、CDN、短信等热卖云产品特惠抢购

MySQL MHA高可用环境搭建

443次阅读
没有评论

共计 28672 个字符,预计需要花费 72 分钟才能阅读完成。

一、安装 MHA 基本环境
1. 安装 MHA node
(1) 基本环境说明,本文参考互联网文章学习,搭建 MHA 与测试如下。
参考文档:http://www.linuxidc.com/Linux/2016-05/130923.htm
角色                IP 地址            主机名   
=============================================   
Master              192.168.1.121    node1   
Slave              192.168.1.122    node2   
Slave              192.168.1.123    node3   
Monitor host        192.168.1.125    node5

(2) 在 node1,node2,node3,node5 操作:
# vi /etc/hosts
192.168.1.121  node1 
192.168.1.122  node2   
192.168.1.123  node3   
192.168.1.125  node5
安装 MHA node 节点软件包:
# rpm -ivh http://dl.Fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm   
# yum install perl-DBD-MySQL perl-CPAN -y   
# tar xf mha4mysql-node-0.56.tar.gz   
# cd mha4mysql-node-0.56   
# perl Makefile.PL       
# make && make install   
 
2. 安装 MHA Manager
在 node5 管理节点上操作:注:MHA Manager 主机也是需要安装 MHA Node,MHA Manger
# yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y 
# tar xf mha4mysql-manager-0.56.tar.gz   
# cd mha4mysql-manager-0.56   
# perl Makefile.PL   
# make && make install
# 说明:安装的脚本程序都在 /usr/local/bin/ 目录下。
 
3. 节点间配置 SSH 登录无密码验证(MHA 主机之间使用 key 登录)
在 node5(Monitor):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
在 node1(Master):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
在 node2 (slave):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
在 node3 (slave):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5

二、搭建主从复制环境
1. 主从复制环境配置过程
(1) mysql 安装过程略, 但是三节点要创建如下链接
node1(主),node2(主备从),node3(从)
注意:创建如下链接:
ln -s /usr/local/mysql/bin/* /usr/local/bin/
node1 my.cnf
server-id              = 1 
binlog-format          = ROW   
log-bin                = master-bin   
log-bin-index          = master-bin.index   
log-slave-updates      = true   
relay_log_purge        = 0
node2 my.cnf
server-id              = 2 
binlog-format          = ROW   
log-bin                = master-bin   
log-bin-index          = master-bin.index   
log-slave-updates      = true   
relay_log_purge        = 0
node3 my.cnf
binlog-format          = ROW 
log-bin                = mysql-bin   
relay-log              = slave-relay-bin   
relay-log-index        = slave-relay-bin.index   
log-slave-updates      = true   
server-id              = 11   
skip-name-resolve   
relay_log_purge        = 0

(2) 在 node1 (Master)上备份一份完整的数据:
# mysqldump -uroot -p123456 –master-data=2 –single-transaction -R –triggers -A > all.sql
其中 –master-data= 2 代表备份时刻记录 master 的 Binlog 位置和 Position。

(3) 在 node1 (Master)上创建复制用户:
mysql> grant replication slave on *.* to ‘repl’@’192.168.1.%’ identified by ‘123456’;   
mysql> flush privileges;

(4) 查看主库备份时的 binlog 名称和位置,MASTER_LOG_FILE 和 MASTER_LOG_POS:
# head -n 30 all.sql | grep ‘CHANGE MASTER TO’
— CHANGE MASTER TO MASTER_LOG_FILE=’master-bin.000004′, MASTER_LOG_POS=120;

(5) 把备份复制到 192.168.1.122 和 192.168.1.123
# scp all.sql 192.168.1.122:/root/ 
# scp all.sql 192.168.1.123:/root/

(6) 分别在两台服务器上导入备份,执行复制相关命令
在 node2、node3 主机上操作:
# mysql -uroot -p123456 < all.sql
mysql> stop slave;
CHANGE MASTER TO   
MASTER_HOST=’192.168.1.121′,   
MASTER_USER=’repl’,   
MASTER_PASSWORD=’123456′,   
MASTER_LOG_FILE=’master-bin.000004′,   
MASTER_LOG_POS=120;
mysql> start slave; 
mysql> show slave status\G

2. 创建 MHA 管理用户, 在 master 上创建。
mysql> grant all privileges on *.* to ‘root’@’192.168.1.%’ identified  by ‘123456’;   
mysql> flush  privileges;
 
三、配置 Keepalived VIP
vip 配置可以采用两种方式,一种通过 keepalived 的方式管理虚拟 ip 的浮动;一人是通过脚本方式,本文通过 keepalived 方式实现
1. 在 node1(Master) 与 node2(备选主节点)安装 keepalived。
# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz   
# tar xf keepalived-1.2.12.tar.gz   
# cd keepalived-1.2.12   
# ./configure –prefix=/usr/local/keepalived   
# make &&  make install   
# cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/   
# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/   
# mkdir /etc/keepalived   
# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/   
# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

2. 配置 keepalived 的配置文件,在 node1(master)上配置操作如下:
注:keepalived 配置成 backup->backup,即 IP 地址切换后,主起来后 IP 地址不切换,本文监控脚本由 MHA 提供,keepalived 不提供对 mysqld 的监控。
# vi /etc/keepalived/keepalived.conf   
! Configuration File for keepalived
global_defs {
    notification_email {
    abc@163.com   
  }   
  notification_email_from dba@dbserver.com   
  smtp_server 127.0.0.1   
  smtp_connect_timeout 30   
  router_id MySQL-HA   
}
vrrp_instance VI_1 {
    state BACKUP   
    interface eth0   
    virtual_router_id 51   
    priority 150   
    advert_int 1   
    nopreempt
    authentication {
    auth_type PASS   
    auth_pass 1111   
    }
    virtual_ipaddress {
        192.168.1.130   
    }   
}

3. 配置 keepalived 的配置文件,在 node2(备用节点)上配置操作如下:
# vi /etc/keepalived/keepalived.conf   
! Configuration File for keepalived
global_defs {
    notification_email {
    abc@163.com   
  }   
  notification_email_from dba@dbserver.com   
  smtp_server 127.0.0.1   
  smtp_connect_timeout 30   
  router_id MySQL-HA   
}
vrrp_instance VI_1 {
    state BACKUP   
    interface eth0   
    virtual_router_id 51   
    priority 120   
    advert_int 1   
    nopreempt
    authentication {
    auth_type PASS   
    auth_pass 1111   
    }
    virtual_ipaddress {
        192.168.1.130   
    }   
}

4. node1,node2 启动 keepalived 服务
# service keepalived start 
# chkconfig keepalived on

5. node1 查看 VIP 启动情况
[root@node1 ~]# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN   
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00   
    inet 127.0.0.1/8 scope host lo   
    inet6 ::1/128 scope host   
      valid_lft forever preferred_lft forever   
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000   
    link/ether 00:0c:29:4e:53:71 brd ff:ff:ff:ff:ff:ff   
    inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0   
    inet 192.168.1.130/32 scope global eth0   
    inet6 fe80::20c:29ff:fe4e:5371/64 scope link   
      valid_lft forever preferred_lft forever

四、配置 MHA
1. monitor 创建 MHA 的工作目录,并且创建相关配置文件(在软件包解压后的目录里面有样例配置文件)。
# mkdir -p /etc/masterha 
# mkdir -p /var/log/masterha/app1   
# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/
修改 app1.cnf 配置文件,修改后的文件内容如下:
# cat /etc/masterha/app1.cnf
[server default] 
manager_workdir=/var/log/masterha/app1   
manager_log=/var/log/masterha/app1/manager.log   
master_binlog_dir=/usr/local/mysql/data/   
master_ip_failover_script= /usr/local/bin/master_ip_failover   
master_ip_online_change_script= /usr/local/bin/master_ip_online_change   
password=123456   
user=root   
ping_interval=1   
remote_workdir=/tmp   
repl_password=123456   
repl_user=repl   
report_script=/usr/local/bin/send_report   
ssh_user=root 
[server1] 
hostname=192.168.1.121   
port=3306
[server2] 
hostname=192.168.1.122   
port=3306   
candidate_master=1   
check_repl_delay=0
[server3] 
hostname=192.168.1.123   
port=3306
说明:   
master_ip_failover_script= /usr/local/bin/master_ip_failover            #MHA 自动切换执行的脚本,需要修改   
master_ip_online_change_script= /usr/local/bin/master_ip_online_change  #手动切换需要执行的脚本,需要修改   
report_script=/usr/local/bin/send_report                                #切换时发送邮件进行报告,需要修改

2. 设置 relay log 的清除方式(在每个 slave 节点上):
(1)在 node2,node3 从节点上操作:
将 relay_log_purge= 0 加入 my.cnf 配置文件,前面已经配置。
(2) 设置定期清理 relay 脚本(node2,node3 上操作):
# cat purge_relay_log.sh   
#!/bin/bash   
user=root   
passwd=123456   
port=3306   
log_dir=’/data/masterha/log’   
work_dir=’/data’   
purge=’/usr/local/bin/purge_relay_logs’
if [! -d $log_dir] 
then   
  mkdir $log_dir -p   
fi
$purge –user=$user –password=$passwd –disable_relay_log_purge –port=$port –workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
配置定时计划任务
# crontab -e 
0 4 * * * /bin/bash /root/purge_relay_log.sh
 
3. 要想把 keepalived 服务引入 MHA,我们只需要修改切换是触发的脚本文件 master_ip_failover 即可,在该脚本中添加在 master 发生宕机时对 keepalived 的处理。
(1) 编辑脚本 /usr/local/bin/master_ip_failover,修改后如下:
# vi /usr/local/bin/master_ip_failover   
#!/usr/bin/env perl   
use strict;   
use warnings FATAL => ‘all’;
use Getopt::Long;
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,   
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port   
);
my $vip = ‘192.168.1.130’; 
my $ssh_start_vip = “/etc/init.d/keepalived start”;   
my $ssh_stop_vip = “/etc/init.d/keepalived stop”;
GetOptions(
    ‘command=s’          => \$command,   
    ‘ssh_user=s’        => \$ssh_user,   
    ‘orig_master_host=s’ => \$orig_master_host,   
    ‘orig_master_ip=s’  => \$orig_master_ip,   
    ‘orig_master_port=i’ => \$orig_master_port,   
    ‘new_master_host=s’  => \$new_master_host,   
    ‘new_master_ip=s’    => \$new_master_ip,   
    ‘new_master_port=i’  => \$new_master_port,   
);
exit &main(); 
sub main {
    print “\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n”;   
    if ($command eq “stop” || $command eq “stopssh”) {
        my $exit_code = 1;   
        eval {
            print “Disabling the VIP on old master: $orig_master_host \n”;   
            &stop_vip();   
            $exit_code = 0;   
        };   
        if ($@) {
            warn “Got Error: $@\n”;   
            exit $exit_code;   
        }   
        exit $exit_code;   
    }   
    elsif ($command eq “start”) {
        my $exit_code = 10; 
        eval {
            print “Enabling the VIP – $vip on the new master – $new_master_host \n”;   
            &start_vip();   
            $exit_code = 0;   
        };   
        if ($@) {
            warn $@;   
            exit $exit_code;   
        }   
        exit $exit_code;   
    }   
    elsif ($command eq “status”) {
        print “Checking the Status of the script.. OK \n”;   
        exit 0;   
    }   
    else {
        &usage();   
        exit 1;   
    }   
}   
sub start_vip() {   
    `ssh $ssh_user\@$new_master_host \” $ssh_start_vip \”`;   
}   
# A simple system call that disable the VIP on the old_master   
sub stop_vip() {   
    `ssh $ssh_user\@$orig_master_host \” $ssh_stop_vip \”`;   
}
sub usage {
    print   
    “Usage: master_ip_failover –command=start|stop|stopssh|status –orig_master_host=host –orig_master_ip=ip –orig_master_port=port –new_master_host=host –new_master_ip=ip –new_master_port=port\n”;   
}

(2) 编辑脚本 master_ip_online_change,修改后如下:
#!/usr/bin/env perl
#  Copyright (C) 2011 DeNA Co.,Ltd. 
#   
#  This program is free software; you can redistribute it and/or modify   
#  it under the terms of the GNU General Public License as published by   
#  the Free Software Foundation; either version 2 of the License, or   
#  (at your option) any later version.   
#   
#  This program is distributed in the hope that it will be useful,   
#  but WITHOUT ANY WARRANTY; without even the implied warranty of   
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the   
#  GNU General Public License for more details.   
#   
#  You should have received a copy of the GNU General Public License   
#  along with this program; if not, write to the Free Software   
#  Foundation, Inc.,   
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict; 
use warnings FATAL => ‘all’;
use Getopt::Long; 
use MHA::DBHelper;   
use MHA::NodeUtil;   
use Time::HiRes qw(sleep gettimeofday tv_interval);   
use Data::Dumper;
my $_tstart; 
my $_running_interval = 0.1;   
my (
  $command,              $orig_master_is_new_slave, $orig_master_host,   
  $orig_master_ip,      $orig_master_port,        $orig_master_user,   
  $orig_master_password, $orig_master_ssh_user,    $new_master_host,   
  $new_master_ip,        $new_master_port,          $new_master_user,   
  $new_master_password,  $new_master_ssh_user   
);   
my $vip = ‘192.168.1.130/24’;   
my $key = ‘1’;   
my $ssh_start_vip = “/sbin/ifconfig eth0:$key $vip”;   
my $ssh_stop_vip = “/sbin/ifconfig eth0:$key down”;   
my $orig_master_ssh_port = 22;   
my $new_master_ssh_port = 22;   
GetOptions(
  ‘command=s’                => \$command,   
  ‘orig_master_is_new_slave’ => \$orig_master_is_new_slave,   
  ‘orig_master_host=s’      => \$orig_master_host,   
  ‘orig_master_ip=s’        => \$orig_master_ip,   
  ‘orig_master_port=i’      => \$orig_master_port,   
  ‘orig_master_user=s’      => \$orig_master_user,   
  ‘orig_master_password=s’  => \$orig_master_password,   
  ‘orig_master_ssh_user=s’  => \$orig_master_ssh_user,   
  ‘new_master_host=s’        => \$new_master_host,   
  ‘new_master_ip=s’          => \$new_master_ip,   
  ‘new_master_port=i’        => \$new_master_port,   
  ‘new_master_user=s’        => \$new_master_user,   
  ‘new_master_password=s’    => \$new_master_password,   
  ‘new_master_ssh_user=s’    => \$new_master_ssh_user,   
  ‘orig_master_ssh_port=i’    => \$orig_master_ssh_port,   
  ‘new_master_ssh_port=i’    => \$new_master_ssh_port,   
);
exit &main();
sub current_time_us {
  my ($sec, $microsec) = gettimeofday();   
  my $curdate = localtime($sec);   
  return $curdate . ” ” . sprintf(“%06d”, $microsec);   
}
sub sleep_until {
  my $elapsed = tv_interval($_tstart);   
  if ($_running_interval > $elapsed) {
    sleep($_running_interval – $elapsed);   
  }   
}
sub get_threads_util {
  my $dbh                    = shift;   
  my $my_connection_id      = shift;   
  my $running_time_threshold = shift;   
  my $type                  = shift;   
  $running_time_threshold = 0 unless ($running_time_threshold);   
  $type                  = 0 unless ($type);   
  my @threads;
  my $sth = $dbh->prepare(“SHOW PROCESSLIST”); 
  $sth->execute();
  while (my $ref = $sth->fetchrow_hashref() ) {
    my $id        = $ref->{Id};   
    my $user      = $ref->{User};   
    my $host      = $ref->{Host};   
    my $command    = $ref->{Command};   
    my $state      = $ref->{State};   
    my $query_time = $ref->{Time};   
    my $info      = $ref->{Info};   
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);   
    next if ($my_connection_id == $id);   
    next if (defined($query_time) && $query_time < $running_time_threshold );   
    next if (defined($command)    && $command eq “Binlog Dump” );   
    next if (defined($user)      && $user eq “system user” );   
    next   
      if (defined($command)   
      && $command eq “Sleep”   
      && defined($query_time)   
      && $query_time >= 1 );
    if ($type >= 1) {
      next if (defined($command) && $command eq “Sleep” );   
      next if (defined($command) && $command eq “Connect” );   
    }
    if ($type >= 2) {
      next if (defined($info) && $info =~ m/^select/i );   
      next if (defined($info) && $info =~ m/^show/i );   
    }
    push @threads, $ref; 
  }   
  return @threads;   
}
sub main {
  if ($command eq “stop”) {
    ## Gracefully killing connections on the current master   
    # 1. Set read_only= 1 on the new master   
    # 2. DROP USER so that no app user can establish new connections   
    # 3. Set read_only= 1 on the current master   
    # 4. Kill current queries   
    # * Any database access failure will result in script die.   
    my $exit_code = 1;   
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)   
      my $new_master_handler = new MHA::DBHelper();
      # args: hostname, port, user, password, raise_error(die_on_error)_or_not 
      $new_master_handler->connect($new_master_ip, $new_master_port,   
        $new_master_user, $new_master_password, 1 );   
      print current_time_us() . ” Set read_only on the new master.. “;   
      $new_master_handler->enable_read_only();   
      if ($new_master_handler->is_read_only() ) {
        print “ok.\n”;   
      }   
      else {
        die “Failed!\n”;   
      }   
      $new_master_handler->disconnect();
      # Connecting to the orig master, die if any database error happens 
      my $orig_master_handler = new MHA::DBHelper();   
      $orig_master_handler->connect($orig_master_ip, $orig_master_port,   
        $orig_master_user, $orig_master_password, 1 );
      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand 
      $orig_master_handler->disable_log_bin_local();   
      print current_time_us() . ” Drpping app user on the orig master..\n”;   
      #FIXME_xxx_drop_app_user($orig_master_handler);
      ## Waiting for N * 100 milliseconds so that current connections can exit 
      my $time_until_read_only = 15;   
      $_tstart = [gettimeofday];   
      my @threads = get_threads_util($orig_master_handler->{dbh},   
        $orig_master_handler->{connection_id} );   
      while ($time_until_read_only > 0 && $#threads >= 0) {
        if ($time_until_read_only % 5 == 0) {
          printf   
“%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n”,   
            current_time_us(), $#threads + 1, $time_until_read_only * 100;   
          if ($#threads < 5) {
            print Data::Dumper->new([$_] )->Indent(0)->Terse(1)->Dump . “\n”   
              foreach (@threads);   
          }   
        }   
        sleep_until();   
        $_tstart = [gettimeofday];   
        $time_until_read_only–;   
        @threads = get_threads_util($orig_master_handler->{dbh},   
          $orig_master_handler->{connection_id} );   
      }
      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write 
      print current_time_us() . ” Set read_only=1 on the orig master.. “;   
      $orig_master_handler->enable_read_only();   
      if ($orig_master_handler->is_read_only() ) {
        print “ok.\n”;   
      }   
      else {
        die “Failed!\n”;   
      }
      ## Waiting for M * 100 milliseconds so that current update queries can complete 
      my $time_until_kill_threads = 5;   
      @threads = get_threads_util($orig_master_handler->{dbh},   
        $orig_master_handler->{connection_id} );   
      while ($time_until_kill_threads > 0 && $#threads >= 0) {
        if ($time_until_kill_threads % 5 == 0) {
          printf   
“%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n”,   
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;   
          if ($#threads < 5) {
            print Data::Dumper->new([$_] )->Indent(0)->Terse(1)->Dump . “\n”   
              foreach (@threads);   
          }   
        }   
        sleep_until();   
        $_tstart = [gettimeofday];   
        $time_until_kill_threads–;   
        @threads = get_threads_util($orig_master_handler->{dbh},   
          $orig_master_handler->{connection_id} );   
      }
      ## Terminating all threads 
      print current_time_us() . ” Killing all application threads..\n”;   
      $orig_master_handler->kill_threads(@threads) if ($#threads >= 0);   
      print current_time_us() . ” done.\n”;   
      $orig_master_handler->enable_log_bin_local();   
      $orig_master_handler->disconnect();
      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK 
      eval {
      `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \” $ssh_stop_vip \”`;   
        };   
        if ($@) {
            warn $@;   
        }   
      $exit_code = 0;   
    };   
    if ($@) {
      warn “Got Error: $@\n”;   
      exit $exit_code;   
    }   
    exit $exit_code;   
  }   
  elsif ($command eq “start”) {
    ## Activating master ip on the new master   
    # 1. Create app user with write privileges   
    # 2. Moving backup script if needed   
    # 3. Register new master’s ip to the catalog database
# We don’t return error even though activating updatable accounts/ip failed so that we don’t interrupt slaves’ recovery. 
# If exit code is 0 or 10, MHA does not abort   
    my $exit_code = 10;   
    eval {
      my $new_master_handler = new MHA::DBHelper();
      # args: hostname, port, user, password, raise_error_or_not 
      $new_master_handler->connect($new_master_ip, $new_master_port,   
        $new_master_user, $new_master_password, 1 );
      ## Set read_only=0 on the new master 
      $new_master_handler->disable_log_bin_local();   
      print current_time_us() . ” Set read_only=0 on the new master.\n”;   
      $new_master_handler->disable_read_only();
      ## Creating an app user on the new master 
      print current_time_us() . ” Creating app user on the new master..\n”;   
      #FIXME_xxx_create_app_user($new_master_handler);   
      $new_master_handler->enable_log_bin_local();   
      $new_master_handler->disconnect();
      ## Update master ip on the catalog database, etc 
      `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \” $ssh_start_vip \”`;   
      $exit_code = 0;   
    };   
    if ($@) {
      warn “Got Error: $@\n”;   
      exit $exit_code;   
    }   
    exit $exit_code;   
  }   
  elsif ($command eq “status”) {
    # do nothing 
    exit 0;   
  }   
  else {
    &usage();   
    exit 1;   
  }   
}
sub usage {
  print   
“Usage: master_ip_online_change –command=start|stop|status –orig_master_host=host –orig_master_ip=ip –orig_master_port=port –new_master_host=host –new_master_ip=ip –new_master_port=port\n”;   
  die;   
}

(3) 编辑脚本 send_report,修改后如下:
#!/usr/bin/perl
use strict; 
use warnings FATAL => ‘all’;   
use Mail::Sender;   
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded 
my ($dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body);   
my $smtp=’smtp.163.com’;   
my $mail_from=’xxxx’;   
my $mail_user=’xxxxx’;   
my $mail_pass=’xxxxx’;   
my $mail_to=[‘xxxx’,’xxxx’];   
GetOptions(
  ‘orig_master_host=s’ => \$dead_master_host,   
  ‘new_master_host=s’  => \$new_master_host,   
  ‘new_slave_hosts=s’  => \$new_slave_hosts,   
  ‘subject=s’          => \$subject,   
  ‘body=s’            => \$body,   
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {
    my ($smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg) = @_;   
    open my $DEBUG, “> /tmp/monitormail.log”   
        or die “Can’t open the debug      file:$!\n”;   
    my $sender = new Mail::Sender {
        ctype      => ‘text/plain; charset=utf-8’,   
        encoding    => ‘utf-8’,   
        smtp        => $smtp,   
        from        => $mail_from,   
        auth        => ‘LOGIN’,   
        TLS_allowed => ‘0’,   
        authid      => $user,   
        authpwd    => $passwd,   
        to          => $mail_to,   
        subject    => $subject,   
        debug      => $DEBUG   
    };
    $sender->MailMsg(
        {msg  => $msg,   
            debug => $DEBUG   
        }   
    ) or print $Mail::Sender::Error;   
    return 1;   
}
# Do whatever you want here
exit 0;
 
五、MHA 的日常管理
1. 检查 SSH 配置(node5 Monitor 监控节点上操作),如下:
# masterha_check_ssh –conf=/etc/masterha/app1.cnf
Sun May  1 22:05:12 2016 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. 
Sun May  1 22:05:12 2016 – [info] Reading application default configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:05:12 2016 – [info] Reading server configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:05:12 2016 – [info] Starting SSH connection tests..   
Sun May  1 22:05:14 2016 – [debug]   
Sun May  1 22:05:12 2016 – [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.122(192.168.1.122:22)..   
Sun May  1 22:05:13 2016 – [debug]  ok.   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.123(192.168.1.123:22)..   
Sun May  1 22:05:13 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [debug]   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.121(192.168.1.121:22)..   
Sun May  1 22:05:13 2016 – [debug]  ok.   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.123(192.168.1.123:22)..   
Sun May  1 22:05:14 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [debug]   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.121(192.168.1.121:22)..   
Sun May  1 22:05:14 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.122(192.168.1.122:22)..   
Sun May  1 22:05:14 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [info] All SSH connection tests passed successfully.

2. 检查整个复制环境状况(node5 监控节点上操作),如下:
# masterha_check_repl –conf=/etc/masterha/app1.cnf
Sun May  1 22:46:44 2016 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. 
Sun May  1 22:46:44 2016 – [info] Reading application default configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:46:44 2016 – [info] Reading server configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:46:44 2016 – [info] MHA::MasterMonitor version 0.56.   
Sun May  1 22:46:45 2016 – [info] GTID failover mode = 0   
Sun May  1 22:46:45 2016 – [info] Dead Servers:   
Sun May  1 22:46:45 2016 – [info] Alive Servers:   
Sun May  1 22:46:45 2016 – [info]  192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info]  192.168.1.122(192.168.1.122:3306)   
Sun May  1 22:46:45 2016 – [info]  192.168.1.123(192.168.1.123:3306)   
Sun May  1 22:46:45 2016 – [info] Alive Slaves:   
Sun May  1 22:46:45 2016 – [info]  192.168.1.122(192.168.1.122:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled   
Sun May  1 22:46:45 2016 – [info]    Replicating from 192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info]    Primary candidate for the new Master (candidate_master is set)   
Sun May  1 22:46:45 2016 – [info]  192.168.1.123(192.168.1.123:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled   
Sun May  1 22:46:45 2016 – [info]    Replicating from 192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info] Current Alive Master: 192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info] Checking slave configurations..   
Sun May  1 22:46:45 2016 – [info]  read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).   
Sun May  1 22:46:45 2016 – [warning]  relay_log_purge=0 is not set on slave 192.168.1.122(192.168.1.122:3306).   
Sun May  1 22:46:45 2016 – [info]  read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).   
Sun May  1 22:46:45 2016 – [warning]  relay_log_purge=0 is not set on slave 192.168.1.123(192.168.1.123:3306).   
Sun May  1 22:46:45 2016 – [info] Checking replication filtering settings..   
Sun May  1 22:46:45 2016 – [info]  binlog_do_db= , binlog_ignore_db=   
Sun May  1 22:46:45 2016 – [info]  Replication filtering check ok.   
Sun May  1 22:46:45 2016 – [info] GTID (with auto-pos) is not supported   
Sun May  1 22:46:45 2016 – [info] Starting SSH connection tests..   
Sun May  1 22:46:46 2016 – [info] All SSH connection tests passed successfully.   
Sun May  1 22:46:46 2016 – [info] Checking MHA Node version..   
Sun May  1 22:46:47 2016 – [info]  Version check ok.   
Sun May  1 22:46:47 2016 – [info] Checking SSH publickey authentication settings on the current master..   
Sun May  1 22:46:47 2016 – [info] HealthCheck: SSH to 192.168.1.121 is reachable.   
Sun May  1 22:46:47 2016 – [info] Master MHA Node version is 0.56.   
Sun May  1 22:46:47 2016 – [info] Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..   
Sun May  1 22:46:47 2016 – [info]  Executing command: save_binary_logs –command=test –start_pos=4 –binlog_dir=/usr/local/mysql/data/ –output_file=/tmp/save_binary_logs_test –manager_version=0.56 –start_file=master-bin.000008   
Sun May  1 22:46:47 2016 – [info]  Connecting to root@192.168.1.121(192.168.1.121:22)..   
  Creating /tmp if not exists..    ok.   
  Checking output directory is accessible or not..   
  ok.   
  Binlog found at /usr/local/mysql/data/, up to master-bin.000008   
Sun May  1 22:46:48 2016 – [info] Binlog setting check done.   
Sun May  1 22:46:48 2016 – [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..   
Sun May  1 22:46:48 2016 – [info]  Executing command : apply_diff_relay_logs –command=test –slave_user=’root’ –slave_host=192.168.1.122 –slave_ip=192.168.1.122 –slave_port=3306 –workdir=/tmp –target_version=5.6.29-log –manager_version=0.56 –relay_log_info=/usr/local/mysql/data/relay-log.info  –relay_dir=/usr/local/mysql/data/  –slave_pass=xxx   
Sun May  1 22:46:48 2016 – [info]  Connecting to root@192.168.1.122(192.168.1.122:22)..   
  Checking slave recovery environment settings..   
    Opening /usr/local/mysql/data/relay-log.info … ok.   
    Relay log found at /usr/local/mysql/data, up to node2-relay-bin.000002   
    Temporary relay log file is /usr/local/mysql/data/node2-relay-bin.000002   
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.   
done.   
    Testing mysqlbinlog output.. done.   
    Cleaning up test file(s).. done.   
Sun May  1 22:46:48 2016 – [info]  Executing command : apply_diff_relay_logs –command=test –slave_user=’root’ –slave_host=192.168.1.123 –slave_ip=192.168.1.123 –slave_port=3306 –workdir=/tmp –target_version=5.6.29-log –manager_version=0.56 –relay_log_info=/usr/local/mysql/data/relay-log.info  –relay_dir=/usr/local/mysql/data/  –slave_pass=xxx   
Sun May  1 22:46:48 2016 – [info]  Connecting to root@192.168.1.123(192.168.1.123:22)..   
  Checking slave recovery environment settings..   
    Opening /usr/local/mysql/data/relay-log.info … ok.   
    Relay log found at /usr/local/mysql/data, up to slave-relay-bin.000012   
    Temporary relay log file is /usr/local/mysql/data/slave-relay-bin.000012   
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.   
done.   
    Testing mysqlbinlog output.. done.   
    Cleaning up test file(s).. done.   
Sun May  1 22:46:48 2016 – [info] Slaves settings check done.   
Sun May  1 22:46:48 2016 – [info]   
192.168.1.121(192.168.1.121:3306) (current master)   
+–192.168.1.122(192.168.1.122:3306)   
+–192.168.1.123(192.168.1.123:3306)
Sun May  1 22:46:48 2016 – [info] Checking replication health on 192.168.1.122.. 
Sun May  1 22:46:48 2016 – [info]  ok.   
Sun May  1 22:46:48 2016 – [info] Checking replication health on 192.168.1.123..   
Sun May  1 22:46:48 2016 – [info]  ok.   
Sun May  1 22:46:48 2016 – [info] Checking master_ip_failover_script status:   
Sun May  1 22:46:48 2016 – [info]  /usr/local/bin/master_ip_failover –command=status –ssh_user=root –orig_master_host=192.168.1.121 –orig_master_ip=192.168.1.121 –orig_master_port=3306
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
Checking the Status of the script.. OK   
Sun May  1 22:46:48 2016 – [info]  OK.   
Sun May  1 22:46:48 2016 – [warning] shutdown_script is not defined.   
Sun May  1 22:46:48 2016 – [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK. 
[root@node5 masterha]#

3. 开启 MHA Manager 监控 (node5 操作) 如下:
# mkdir -p  /var/log/masterha/app1/ 
# nohup masterha_manager –conf=/etc/masterha/app1.cnf –remove_dead_master_conf –ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
参数说明:
–remove_dead_master_conf    #该参数代表当发生主从切换后,老的主库的 ip 将会从配置文件中移除。
–manger_log                  #日志存放位置   
–ignore_last_failover        #在缺省情况下,如果 MHA 检测到连续发生宕机,会生成 app1.failover.complete 文件,会造成 MHA 管理进程无法启动。

4. 查看 MHA Manager 监控是否正常:
# masterha_check_status –conf=/etc/masterha/app1.cnf 
app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121

5. 查看启动日志 (node5 操作) 如下:
# tail -n20 /var/log/masterha/app1/manager.log

6. 关闭 MHA Manage 监控:
(1) 关闭
# masterha_stop –conf=/etc/masterha/app1.cnf
(2) 启动
# nohup masterha_manager –conf=/etc/masterha/app1.cnf –remove_dead_master_conf –ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

六、MHA Failover 切换
1. 自动 Failover 切换
(1) 模拟 master mysql 关闭 
(2) VIP 将会切换到 node2   
(3) /etc/masterha/app1.cnf 中将原主服务器配置文件清掉。
(4) masterha_manager 监控进程会自动退出关闭,并在 /var/log/masterha/app1 下生成 app1.failover.complete 文件,manager.log 会记录全过程,从服务器会自动从新的主服务器复制。
(5) 原主服务器 mysqld 启动的,需要清掉 /var/log/masterha/app1 下生成 app1.failover.complete 文件,添加 node1 配置文件到 /etc/masterha/app1.cnf,通过 manager.log 中的记录的故障点,重新同步主服务器,成为从节点。

2. 手动 Failover 切换
(1) 先停 MHA Manager 进程。
masterha_stop –conf=/etc/masterha/app1.cnf   
(2) 停掉 master mysqld   
(3) 手动切换,在 Manager 主机上操作如下:
# masterha_master_switch –master_state=dead –conf=/etc/masterha/app1.cnf –dead_master_host=192.168.1.122 –dead_master_port=3306 –new_master_host=192.168.1.121 –new_master_port=3306 –ignore_last_failover
通过观察日志可以观察切换全过程。
(4) 如上节方式恢复节点为从服务器。
3.  正常运行情况下切换 (Master 正在运行)
等补充。
4. 小结
通过对 MMM,MHA 的环境搭建测试,MHA 由于采用复制架构,原理简单,在一些对数据要求比较高的环境,为了保证可靠性,最好与半同步结合使用。

本文永久更新链接地址:http://www.linuxidc.com/Linux/2016-05/130925.htm

正文完
星哥玩云-微信公众号
post-qrcode
 0
星锅
版权声明:本站原创文章,由 星锅 于2022-01-22发表,共计28672字。
转载说明:除特殊说明外本站文章皆由CC-4.0协议发布,转载请注明出处。
【腾讯云】推广者专属福利,新客户无门槛领取总价值高达2860元代金券,每种代金券限量500张,先到先得。
阿里云-最新活动爆款每日限量供应
评论(没有评论)
验证码
【腾讯云】云服务器、云数据库、COS、CDN、短信等云产品特惠热卖中

星哥玩云

星哥玩云
星哥玩云
分享互联网知识
用户数
4
文章数
19350
评论数
4
阅读量
7902866
文章搜索
热门文章
开发者必备神器:阿里云 Qoder CLI 全面解析与上手指南

开发者必备神器:阿里云 Qoder CLI 全面解析与上手指南

开发者必备神器:阿里云 Qoder CLI 全面解析与上手指南 大家好,我是星哥。之前介绍了腾讯云的 Code...
星哥带你玩飞牛NAS-6:抖音视频同步工具,视频下载自动下载保存

星哥带你玩飞牛NAS-6:抖音视频同步工具,视频下载自动下载保存

星哥带你玩飞牛 NAS-6:抖音视频同步工具,视频下载自动下载保存 前言 各位玩 NAS 的朋友好,我是星哥!...
星哥带你玩飞牛NAS-3:安装飞牛NAS后的很有必要的操作

星哥带你玩飞牛NAS-3:安装飞牛NAS后的很有必要的操作

星哥带你玩飞牛 NAS-3:安装飞牛 NAS 后的很有必要的操作 前言 如果你已经有了飞牛 NAS 系统,之前...
我把用了20年的360安全卫士卸载了

我把用了20年的360安全卫士卸载了

我把用了 20 年的 360 安全卫士卸载了 是的,正如标题你看到的。 原因 偷摸安装自家的软件 莫名其妙安装...
再见zabbix!轻量级自建服务器监控神器在Linux 的完整部署指南

再见zabbix!轻量级自建服务器监控神器在Linux 的完整部署指南

再见 zabbix!轻量级自建服务器监控神器在 Linux 的完整部署指南 在日常运维中,服务器监控是绕不开的...
阿里云CDN
阿里云CDN-提高用户访问的响应速度和成功率
随机文章
国产开源公众号AI知识库 Agent:突破未认证号限制,一键搞定自动回复,重构运营效率

国产开源公众号AI知识库 Agent:突破未认证号限制,一键搞定自动回复,重构运营效率

国产开源公众号 AI 知识库 Agent:突破未认证号限制,一键搞定自动回复,重构运营效率 大家好,我是星哥,...
星哥带你玩飞牛NAS-6:抖音视频同步工具,视频下载自动下载保存

星哥带你玩飞牛NAS-6:抖音视频同步工具,视频下载自动下载保存

星哥带你玩飞牛 NAS-6:抖音视频同步工具,视频下载自动下载保存 前言 各位玩 NAS 的朋友好,我是星哥!...
支付宝、淘宝、闲鱼又双叕崩了,Cloudflare也瘫了连监控都挂,根因藏在哪?

支付宝、淘宝、闲鱼又双叕崩了,Cloudflare也瘫了连监控都挂,根因藏在哪?

支付宝、淘宝、闲鱼又双叕崩了,Cloudflare 也瘫了连监控都挂,根因藏在哪? 最近两天的互联网堪称“故障...
零成本上线!用 Hugging Face免费服务器+Docker 快速部署HertzBeat 监控平台

零成本上线!用 Hugging Face免费服务器+Docker 快速部署HertzBeat 监控平台

零成本上线!用 Hugging Face 免费服务器 +Docker 快速部署 HertzBeat 监控平台 ...
星哥带你玩飞牛NAS-2:飞牛配置RAID磁盘阵列

星哥带你玩飞牛NAS-2:飞牛配置RAID磁盘阵列

星哥带你玩飞牛 NAS-2:飞牛配置 RAID 磁盘阵列 前言 大家好,我是星哥之前星哥写了《星哥带你玩飞牛 ...

免费图片视频管理工具让灵感库告别混乱

一言一句话
-「
手气不错
星哥带你玩飞牛NAS-14:解锁公网自由!Lucky功能工具安装使用保姆级教程

星哥带你玩飞牛NAS-14:解锁公网自由!Lucky功能工具安装使用保姆级教程

星哥带你玩飞牛 NAS-14:解锁公网自由!Lucky 功能工具安装使用保姆级教程 作为 NAS 玩家,咱们最...
星哥带你玩飞牛 NAS-9:全能网盘搜索工具 13 种云盘一键搞定!

星哥带你玩飞牛 NAS-9:全能网盘搜索工具 13 种云盘一键搞定!

星哥带你玩飞牛 NAS-9:全能网盘搜索工具 13 种云盘一键搞定! 前言 作为 NAS 玩家,你是否总被这些...
开源MoneyPrinterTurbo 利用AI大模型,一键生成高清短视频!

开源MoneyPrinterTurbo 利用AI大模型,一键生成高清短视频!

  开源 MoneyPrinterTurbo 利用 AI 大模型,一键生成高清短视频! 在短视频内容...
如何免费使用强大的Nano Banana Pro?附赠邪修的用法

如何免费使用强大的Nano Banana Pro?附赠邪修的用法

如何免费使用强大的 Nano Banana Pro?附赠邪修的用法 前言 大家好,我是星哥,今天来介绍谷歌的 ...
星哥带你玩飞牛NAS-12:开源笔记的进化之路,效率玩家的新选择

星哥带你玩飞牛NAS-12:开源笔记的进化之路,效率玩家的新选择

星哥带你玩飞牛 NAS-12:开源笔记的进化之路,效率玩家的新选择 前言 如何高效管理知识与笔记,已经成为技术...