阿里云-云小站(无限量代金券发放中)
【腾讯云】云服务器、云数据库、COS、CDN、短信等热卖云产品特惠抢购

MySQL MHA高可用环境搭建

157次阅读
没有评论

共计 28672 个字符,预计需要花费 72 分钟才能阅读完成。

一、安装 MHA 基本环境
1. 安装 MHA node
(1) 基本环境说明,本文参考互联网文章学习,搭建 MHA 与测试如下。
参考文档:http://www.linuxidc.com/Linux/2016-05/130923.htm
角色                IP 地址            主机名   
=============================================   
Master              192.168.1.121    node1   
Slave              192.168.1.122    node2   
Slave              192.168.1.123    node3   
Monitor host        192.168.1.125    node5

(2) 在 node1,node2,node3,node5 操作:
# vi /etc/hosts
192.168.1.121  node1 
192.168.1.122  node2   
192.168.1.123  node3   
192.168.1.125  node5
安装 MHA node 节点软件包:
# rpm -ivh http://dl.Fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm   
# yum install perl-DBD-MySQL perl-CPAN -y   
# tar xf mha4mysql-node-0.56.tar.gz   
# cd mha4mysql-node-0.56   
# perl Makefile.PL       
# make && make install   
 
2. 安装 MHA Manager
在 node5 管理节点上操作:注:MHA Manager 主机也是需要安装 MHA Node,MHA Manger
# yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y 
# tar xf mha4mysql-manager-0.56.tar.gz   
# cd mha4mysql-manager-0.56   
# perl Makefile.PL   
# make && make install
# 说明:安装的脚本程序都在 /usr/local/bin/ 目录下。
 
3. 节点间配置 SSH 登录无密码验证(MHA 主机之间使用 key 登录)
在 node5(Monitor):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
在 node1(Master):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
在 node2 (slave):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
在 node3 (slave):
# ssh-keygen -t rsa 
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5

二、搭建主从复制环境
1. 主从复制环境配置过程
(1) mysql 安装过程略, 但是三节点要创建如下链接
node1(主),node2(主备从),node3(从)
注意:创建如下链接:
ln -s /usr/local/mysql/bin/* /usr/local/bin/
node1 my.cnf
server-id              = 1 
binlog-format          = ROW   
log-bin                = master-bin   
log-bin-index          = master-bin.index   
log-slave-updates      = true   
relay_log_purge        = 0
node2 my.cnf
server-id              = 2 
binlog-format          = ROW   
log-bin                = master-bin   
log-bin-index          = master-bin.index   
log-slave-updates      = true   
relay_log_purge        = 0
node3 my.cnf
binlog-format          = ROW 
log-bin                = mysql-bin   
relay-log              = slave-relay-bin   
relay-log-index        = slave-relay-bin.index   
log-slave-updates      = true   
server-id              = 11   
skip-name-resolve   
relay_log_purge        = 0

(2) 在 node1 (Master)上备份一份完整的数据:
# mysqldump -uroot -p123456 –master-data=2 –single-transaction -R –triggers -A > all.sql
其中 –master-data= 2 代表备份时刻记录 master 的 Binlog 位置和 Position。

(3) 在 node1 (Master)上创建复制用户:
mysql> grant replication slave on *.* to ‘repl’@’192.168.1.%’ identified by ‘123456’;   
mysql> flush privileges;

(4) 查看主库备份时的 binlog 名称和位置,MASTER_LOG_FILE 和 MASTER_LOG_POS:
# head -n 30 all.sql | grep ‘CHANGE MASTER TO’
— CHANGE MASTER TO MASTER_LOG_FILE=’master-bin.000004′, MASTER_LOG_POS=120;

(5) 把备份复制到 192.168.1.122 和 192.168.1.123
# scp all.sql 192.168.1.122:/root/ 
# scp all.sql 192.168.1.123:/root/

(6) 分别在两台服务器上导入备份,执行复制相关命令
在 node2、node3 主机上操作:
# mysql -uroot -p123456 < all.sql
mysql> stop slave;
CHANGE MASTER TO   
MASTER_HOST=’192.168.1.121′,   
MASTER_USER=’repl’,   
MASTER_PASSWORD=’123456′,   
MASTER_LOG_FILE=’master-bin.000004′,   
MASTER_LOG_POS=120;
mysql> start slave; 
mysql> show slave status\G

2. 创建 MHA 管理用户, 在 master 上创建。
mysql> grant all privileges on *.* to ‘root’@’192.168.1.%’ identified  by ‘123456’;   
mysql> flush  privileges;
 
三、配置 Keepalived VIP
vip 配置可以采用两种方式,一种通过 keepalived 的方式管理虚拟 ip 的浮动;一人是通过脚本方式,本文通过 keepalived 方式实现
1. 在 node1(Master) 与 node2(备选主节点)安装 keepalived。
# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz   
# tar xf keepalived-1.2.12.tar.gz   
# cd keepalived-1.2.12   
# ./configure –prefix=/usr/local/keepalived   
# make &&  make install   
# cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/   
# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/   
# mkdir /etc/keepalived   
# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/   
# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

2. 配置 keepalived 的配置文件,在 node1(master)上配置操作如下:
注:keepalived 配置成 backup->backup,即 IP 地址切换后,主起来后 IP 地址不切换,本文监控脚本由 MHA 提供,keepalived 不提供对 mysqld 的监控。
# vi /etc/keepalived/keepalived.conf   
! Configuration File for keepalived
global_defs {
    notification_email {
    abc@163.com   
  }   
  notification_email_from dba@dbserver.com   
  smtp_server 127.0.0.1   
  smtp_connect_timeout 30   
  router_id MySQL-HA   
}
vrrp_instance VI_1 {
    state BACKUP   
    interface eth0   
    virtual_router_id 51   
    priority 150   
    advert_int 1   
    nopreempt
    authentication {
    auth_type PASS   
    auth_pass 1111   
    }
    virtual_ipaddress {
        192.168.1.130   
    }   
}

3. 配置 keepalived 的配置文件,在 node2(备用节点)上配置操作如下:
# vi /etc/keepalived/keepalived.conf   
! Configuration File for keepalived
global_defs {
    notification_email {
    abc@163.com   
  }   
  notification_email_from dba@dbserver.com   
  smtp_server 127.0.0.1   
  smtp_connect_timeout 30   
  router_id MySQL-HA   
}
vrrp_instance VI_1 {
    state BACKUP   
    interface eth0   
    virtual_router_id 51   
    priority 120   
    advert_int 1   
    nopreempt
    authentication {
    auth_type PASS   
    auth_pass 1111   
    }
    virtual_ipaddress {
        192.168.1.130   
    }   
}

4. node1,node2 启动 keepalived 服务
# service keepalived start 
# chkconfig keepalived on

5. node1 查看 VIP 启动情况
[root@node1 ~]# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN   
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00   
    inet 127.0.0.1/8 scope host lo   
    inet6 ::1/128 scope host   
      valid_lft forever preferred_lft forever   
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000   
    link/ether 00:0c:29:4e:53:71 brd ff:ff:ff:ff:ff:ff   
    inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0   
    inet 192.168.1.130/32 scope global eth0   
    inet6 fe80::20c:29ff:fe4e:5371/64 scope link   
      valid_lft forever preferred_lft forever

四、配置 MHA
1. monitor 创建 MHA 的工作目录,并且创建相关配置文件(在软件包解压后的目录里面有样例配置文件)。
# mkdir -p /etc/masterha 
# mkdir -p /var/log/masterha/app1   
# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/
修改 app1.cnf 配置文件,修改后的文件内容如下:
# cat /etc/masterha/app1.cnf
[server default] 
manager_workdir=/var/log/masterha/app1   
manager_log=/var/log/masterha/app1/manager.log   
master_binlog_dir=/usr/local/mysql/data/   
master_ip_failover_script= /usr/local/bin/master_ip_failover   
master_ip_online_change_script= /usr/local/bin/master_ip_online_change   
password=123456   
user=root   
ping_interval=1   
remote_workdir=/tmp   
repl_password=123456   
repl_user=repl   
report_script=/usr/local/bin/send_report   
ssh_user=root 
[server1] 
hostname=192.168.1.121   
port=3306
[server2] 
hostname=192.168.1.122   
port=3306   
candidate_master=1   
check_repl_delay=0
[server3] 
hostname=192.168.1.123   
port=3306
说明:   
master_ip_failover_script= /usr/local/bin/master_ip_failover            #MHA 自动切换执行的脚本,需要修改   
master_ip_online_change_script= /usr/local/bin/master_ip_online_change  #手动切换需要执行的脚本,需要修改   
report_script=/usr/local/bin/send_report                                #切换时发送邮件进行报告,需要修改

2. 设置 relay log 的清除方式(在每个 slave 节点上):
(1)在 node2,node3 从节点上操作:
将 relay_log_purge= 0 加入 my.cnf 配置文件,前面已经配置。
(2) 设置定期清理 relay 脚本(node2,node3 上操作):
# cat purge_relay_log.sh   
#!/bin/bash   
user=root   
passwd=123456   
port=3306   
log_dir=’/data/masterha/log’   
work_dir=’/data’   
purge=’/usr/local/bin/purge_relay_logs’
if [! -d $log_dir] 
then   
  mkdir $log_dir -p   
fi
$purge –user=$user –password=$passwd –disable_relay_log_purge –port=$port –workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
配置定时计划任务
# crontab -e 
0 4 * * * /bin/bash /root/purge_relay_log.sh
 
3. 要想把 keepalived 服务引入 MHA,我们只需要修改切换是触发的脚本文件 master_ip_failover 即可,在该脚本中添加在 master 发生宕机时对 keepalived 的处理。
(1) 编辑脚本 /usr/local/bin/master_ip_failover,修改后如下:
# vi /usr/local/bin/master_ip_failover   
#!/usr/bin/env perl   
use strict;   
use warnings FATAL => ‘all’;
use Getopt::Long;
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,   
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port   
);
my $vip = ‘192.168.1.130’; 
my $ssh_start_vip = “/etc/init.d/keepalived start”;   
my $ssh_stop_vip = “/etc/init.d/keepalived stop”;
GetOptions(
    ‘command=s’          => \$command,   
    ‘ssh_user=s’        => \$ssh_user,   
    ‘orig_master_host=s’ => \$orig_master_host,   
    ‘orig_master_ip=s’  => \$orig_master_ip,   
    ‘orig_master_port=i’ => \$orig_master_port,   
    ‘new_master_host=s’  => \$new_master_host,   
    ‘new_master_ip=s’    => \$new_master_ip,   
    ‘new_master_port=i’  => \$new_master_port,   
);
exit &main(); 
sub main {
    print “\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n”;   
    if ($command eq “stop” || $command eq “stopssh”) {
        my $exit_code = 1;   
        eval {
            print “Disabling the VIP on old master: $orig_master_host \n”;   
            &stop_vip();   
            $exit_code = 0;   
        };   
        if ($@) {
            warn “Got Error: $@\n”;   
            exit $exit_code;   
        }   
        exit $exit_code;   
    }   
    elsif ($command eq “start”) {
        my $exit_code = 10; 
        eval {
            print “Enabling the VIP – $vip on the new master – $new_master_host \n”;   
            &start_vip();   
            $exit_code = 0;   
        };   
        if ($@) {
            warn $@;   
            exit $exit_code;   
        }   
        exit $exit_code;   
    }   
    elsif ($command eq “status”) {
        print “Checking the Status of the script.. OK \n”;   
        exit 0;   
    }   
    else {
        &usage();   
        exit 1;   
    }   
}   
sub start_vip() {   
    `ssh $ssh_user\@$new_master_host \” $ssh_start_vip \”`;   
}   
# A simple system call that disable the VIP on the old_master   
sub stop_vip() {   
    `ssh $ssh_user\@$orig_master_host \” $ssh_stop_vip \”`;   
}
sub usage {
    print   
    “Usage: master_ip_failover –command=start|stop|stopssh|status –orig_master_host=host –orig_master_ip=ip –orig_master_port=port –new_master_host=host –new_master_ip=ip –new_master_port=port\n”;   
}

(2) 编辑脚本 master_ip_online_change,修改后如下:
#!/usr/bin/env perl
#  Copyright (C) 2011 DeNA Co.,Ltd. 
#   
#  This program is free software; you can redistribute it and/or modify   
#  it under the terms of the GNU General Public License as published by   
#  the Free Software Foundation; either version 2 of the License, or   
#  (at your option) any later version.   
#   
#  This program is distributed in the hope that it will be useful,   
#  but WITHOUT ANY WARRANTY; without even the implied warranty of   
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the   
#  GNU General Public License for more details.   
#   
#  You should have received a copy of the GNU General Public License   
#  along with this program; if not, write to the Free Software   
#  Foundation, Inc.,   
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict; 
use warnings FATAL => ‘all’;
use Getopt::Long; 
use MHA::DBHelper;   
use MHA::NodeUtil;   
use Time::HiRes qw(sleep gettimeofday tv_interval);   
use Data::Dumper;
my $_tstart; 
my $_running_interval = 0.1;   
my (
  $command,              $orig_master_is_new_slave, $orig_master_host,   
  $orig_master_ip,      $orig_master_port,        $orig_master_user,   
  $orig_master_password, $orig_master_ssh_user,    $new_master_host,   
  $new_master_ip,        $new_master_port,          $new_master_user,   
  $new_master_password,  $new_master_ssh_user   
);   
my $vip = ‘192.168.1.130/24’;   
my $key = ‘1’;   
my $ssh_start_vip = “/sbin/ifconfig eth0:$key $vip”;   
my $ssh_stop_vip = “/sbin/ifconfig eth0:$key down”;   
my $orig_master_ssh_port = 22;   
my $new_master_ssh_port = 22;   
GetOptions(
  ‘command=s’                => \$command,   
  ‘orig_master_is_new_slave’ => \$orig_master_is_new_slave,   
  ‘orig_master_host=s’      => \$orig_master_host,   
  ‘orig_master_ip=s’        => \$orig_master_ip,   
  ‘orig_master_port=i’      => \$orig_master_port,   
  ‘orig_master_user=s’      => \$orig_master_user,   
  ‘orig_master_password=s’  => \$orig_master_password,   
  ‘orig_master_ssh_user=s’  => \$orig_master_ssh_user,   
  ‘new_master_host=s’        => \$new_master_host,   
  ‘new_master_ip=s’          => \$new_master_ip,   
  ‘new_master_port=i’        => \$new_master_port,   
  ‘new_master_user=s’        => \$new_master_user,   
  ‘new_master_password=s’    => \$new_master_password,   
  ‘new_master_ssh_user=s’    => \$new_master_ssh_user,   
  ‘orig_master_ssh_port=i’    => \$orig_master_ssh_port,   
  ‘new_master_ssh_port=i’    => \$new_master_ssh_port,   
);
exit &main();
sub current_time_us {
  my ($sec, $microsec) = gettimeofday();   
  my $curdate = localtime($sec);   
  return $curdate . ” ” . sprintf(“%06d”, $microsec);   
}
sub sleep_until {
  my $elapsed = tv_interval($_tstart);   
  if ($_running_interval > $elapsed) {
    sleep($_running_interval – $elapsed);   
  }   
}
sub get_threads_util {
  my $dbh                    = shift;   
  my $my_connection_id      = shift;   
  my $running_time_threshold = shift;   
  my $type                  = shift;   
  $running_time_threshold = 0 unless ($running_time_threshold);   
  $type                  = 0 unless ($type);   
  my @threads;
  my $sth = $dbh->prepare(“SHOW PROCESSLIST”); 
  $sth->execute();
  while (my $ref = $sth->fetchrow_hashref() ) {
    my $id        = $ref->{Id};   
    my $user      = $ref->{User};   
    my $host      = $ref->{Host};   
    my $command    = $ref->{Command};   
    my $state      = $ref->{State};   
    my $query_time = $ref->{Time};   
    my $info      = $ref->{Info};   
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);   
    next if ($my_connection_id == $id);   
    next if (defined($query_time) && $query_time < $running_time_threshold );   
    next if (defined($command)    && $command eq “Binlog Dump” );   
    next if (defined($user)      && $user eq “system user” );   
    next   
      if (defined($command)   
      && $command eq “Sleep”   
      && defined($query_time)   
      && $query_time >= 1 );
    if ($type >= 1) {
      next if (defined($command) && $command eq “Sleep” );   
      next if (defined($command) && $command eq “Connect” );   
    }
    if ($type >= 2) {
      next if (defined($info) && $info =~ m/^select/i );   
      next if (defined($info) && $info =~ m/^show/i );   
    }
    push @threads, $ref; 
  }   
  return @threads;   
}
sub main {
  if ($command eq “stop”) {
    ## Gracefully killing connections on the current master   
    # 1. Set read_only= 1 on the new master   
    # 2. DROP USER so that no app user can establish new connections   
    # 3. Set read_only= 1 on the current master   
    # 4. Kill current queries   
    # * Any database access failure will result in script die.   
    my $exit_code = 1;   
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)   
      my $new_master_handler = new MHA::DBHelper();
      # args: hostname, port, user, password, raise_error(die_on_error)_or_not 
      $new_master_handler->connect($new_master_ip, $new_master_port,   
        $new_master_user, $new_master_password, 1 );   
      print current_time_us() . ” Set read_only on the new master.. “;   
      $new_master_handler->enable_read_only();   
      if ($new_master_handler->is_read_only() ) {
        print “ok.\n”;   
      }   
      else {
        die “Failed!\n”;   
      }   
      $new_master_handler->disconnect();
      # Connecting to the orig master, die if any database error happens 
      my $orig_master_handler = new MHA::DBHelper();   
      $orig_master_handler->connect($orig_master_ip, $orig_master_port,   
        $orig_master_user, $orig_master_password, 1 );
      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand 
      $orig_master_handler->disable_log_bin_local();   
      print current_time_us() . ” Drpping app user on the orig master..\n”;   
      #FIXME_xxx_drop_app_user($orig_master_handler);
      ## Waiting for N * 100 milliseconds so that current connections can exit 
      my $time_until_read_only = 15;   
      $_tstart = [gettimeofday];   
      my @threads = get_threads_util($orig_master_handler->{dbh},   
        $orig_master_handler->{connection_id} );   
      while ($time_until_read_only > 0 && $#threads >= 0) {
        if ($time_until_read_only % 5 == 0) {
          printf   
“%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n”,   
            current_time_us(), $#threads + 1, $time_until_read_only * 100;   
          if ($#threads < 5) {
            print Data::Dumper->new([$_] )->Indent(0)->Terse(1)->Dump . “\n”   
              foreach (@threads);   
          }   
        }   
        sleep_until();   
        $_tstart = [gettimeofday];   
        $time_until_read_only–;   
        @threads = get_threads_util($orig_master_handler->{dbh},   
          $orig_master_handler->{connection_id} );   
      }
      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write 
      print current_time_us() . ” Set read_only=1 on the orig master.. “;   
      $orig_master_handler->enable_read_only();   
      if ($orig_master_handler->is_read_only() ) {
        print “ok.\n”;   
      }   
      else {
        die “Failed!\n”;   
      }
      ## Waiting for M * 100 milliseconds so that current update queries can complete 
      my $time_until_kill_threads = 5;   
      @threads = get_threads_util($orig_master_handler->{dbh},   
        $orig_master_handler->{connection_id} );   
      while ($time_until_kill_threads > 0 && $#threads >= 0) {
        if ($time_until_kill_threads % 5 == 0) {
          printf   
“%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n”,   
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;   
          if ($#threads < 5) {
            print Data::Dumper->new([$_] )->Indent(0)->Terse(1)->Dump . “\n”   
              foreach (@threads);   
          }   
        }   
        sleep_until();   
        $_tstart = [gettimeofday];   
        $time_until_kill_threads–;   
        @threads = get_threads_util($orig_master_handler->{dbh},   
          $orig_master_handler->{connection_id} );   
      }
      ## Terminating all threads 
      print current_time_us() . ” Killing all application threads..\n”;   
      $orig_master_handler->kill_threads(@threads) if ($#threads >= 0);   
      print current_time_us() . ” done.\n”;   
      $orig_master_handler->enable_log_bin_local();   
      $orig_master_handler->disconnect();
      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK 
      eval {
      `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \” $ssh_stop_vip \”`;   
        };   
        if ($@) {
            warn $@;   
        }   
      $exit_code = 0;   
    };   
    if ($@) {
      warn “Got Error: $@\n”;   
      exit $exit_code;   
    }   
    exit $exit_code;   
  }   
  elsif ($command eq “start”) {
    ## Activating master ip on the new master   
    # 1. Create app user with write privileges   
    # 2. Moving backup script if needed   
    # 3. Register new master’s ip to the catalog database
# We don’t return error even though activating updatable accounts/ip failed so that we don’t interrupt slaves’ recovery. 
# If exit code is 0 or 10, MHA does not abort   
    my $exit_code = 10;   
    eval {
      my $new_master_handler = new MHA::DBHelper();
      # args: hostname, port, user, password, raise_error_or_not 
      $new_master_handler->connect($new_master_ip, $new_master_port,   
        $new_master_user, $new_master_password, 1 );
      ## Set read_only=0 on the new master 
      $new_master_handler->disable_log_bin_local();   
      print current_time_us() . ” Set read_only=0 on the new master.\n”;   
      $new_master_handler->disable_read_only();
      ## Creating an app user on the new master 
      print current_time_us() . ” Creating app user on the new master..\n”;   
      #FIXME_xxx_create_app_user($new_master_handler);   
      $new_master_handler->enable_log_bin_local();   
      $new_master_handler->disconnect();
      ## Update master ip on the catalog database, etc 
      `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \” $ssh_start_vip \”`;   
      $exit_code = 0;   
    };   
    if ($@) {
      warn “Got Error: $@\n”;   
      exit $exit_code;   
    }   
    exit $exit_code;   
  }   
  elsif ($command eq “status”) {
    # do nothing 
    exit 0;   
  }   
  else {
    &usage();   
    exit 1;   
  }   
}
sub usage {
  print   
“Usage: master_ip_online_change –command=start|stop|status –orig_master_host=host –orig_master_ip=ip –orig_master_port=port –new_master_host=host –new_master_ip=ip –new_master_port=port\n”;   
  die;   
}

(3) 编辑脚本 send_report,修改后如下:
#!/usr/bin/perl
use strict; 
use warnings FATAL => ‘all’;   
use Mail::Sender;   
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded 
my ($dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body);   
my $smtp=’smtp.163.com’;   
my $mail_from=’xxxx’;   
my $mail_user=’xxxxx’;   
my $mail_pass=’xxxxx’;   
my $mail_to=[‘xxxx’,’xxxx’];   
GetOptions(
  ‘orig_master_host=s’ => \$dead_master_host,   
  ‘new_master_host=s’  => \$new_master_host,   
  ‘new_slave_hosts=s’  => \$new_slave_hosts,   
  ‘subject=s’          => \$subject,   
  ‘body=s’            => \$body,   
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {
    my ($smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg) = @_;   
    open my $DEBUG, “> /tmp/monitormail.log”   
        or die “Can’t open the debug      file:$!\n”;   
    my $sender = new Mail::Sender {
        ctype      => ‘text/plain; charset=utf-8’,   
        encoding    => ‘utf-8’,   
        smtp        => $smtp,   
        from        => $mail_from,   
        auth        => ‘LOGIN’,   
        TLS_allowed => ‘0’,   
        authid      => $user,   
        authpwd    => $passwd,   
        to          => $mail_to,   
        subject    => $subject,   
        debug      => $DEBUG   
    };
    $sender->MailMsg(
        {msg  => $msg,   
            debug => $DEBUG   
        }   
    ) or print $Mail::Sender::Error;   
    return 1;   
}
# Do whatever you want here
exit 0;
 
五、MHA 的日常管理
1. 检查 SSH 配置(node5 Monitor 监控节点上操作),如下:
# masterha_check_ssh –conf=/etc/masterha/app1.cnf
Sun May  1 22:05:12 2016 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. 
Sun May  1 22:05:12 2016 – [info] Reading application default configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:05:12 2016 – [info] Reading server configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:05:12 2016 – [info] Starting SSH connection tests..   
Sun May  1 22:05:14 2016 – [debug]   
Sun May  1 22:05:12 2016 – [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.122(192.168.1.122:22)..   
Sun May  1 22:05:13 2016 – [debug]  ok.   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.123(192.168.1.123:22)..   
Sun May  1 22:05:13 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [debug]   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.121(192.168.1.121:22)..   
Sun May  1 22:05:13 2016 – [debug]  ok.   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.123(192.168.1.123:22)..   
Sun May  1 22:05:14 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [debug]   
Sun May  1 22:05:13 2016 – [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.121(192.168.1.121:22)..   
Sun May  1 22:05:14 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.122(192.168.1.122:22)..   
Sun May  1 22:05:14 2016 – [debug]  ok.   
Sun May  1 22:05:14 2016 – [info] All SSH connection tests passed successfully.

2. 检查整个复制环境状况(node5 监控节点上操作),如下:
# masterha_check_repl –conf=/etc/masterha/app1.cnf
Sun May  1 22:46:44 2016 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. 
Sun May  1 22:46:44 2016 – [info] Reading application default configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:46:44 2016 – [info] Reading server configuration from /etc/masterha/app1.cnf..   
Sun May  1 22:46:44 2016 – [info] MHA::MasterMonitor version 0.56.   
Sun May  1 22:46:45 2016 – [info] GTID failover mode = 0   
Sun May  1 22:46:45 2016 – [info] Dead Servers:   
Sun May  1 22:46:45 2016 – [info] Alive Servers:   
Sun May  1 22:46:45 2016 – [info]  192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info]  192.168.1.122(192.168.1.122:3306)   
Sun May  1 22:46:45 2016 – [info]  192.168.1.123(192.168.1.123:3306)   
Sun May  1 22:46:45 2016 – [info] Alive Slaves:   
Sun May  1 22:46:45 2016 – [info]  192.168.1.122(192.168.1.122:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled   
Sun May  1 22:46:45 2016 – [info]    Replicating from 192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info]    Primary candidate for the new Master (candidate_master is set)   
Sun May  1 22:46:45 2016 – [info]  192.168.1.123(192.168.1.123:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled   
Sun May  1 22:46:45 2016 – [info]    Replicating from 192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info] Current Alive Master: 192.168.1.121(192.168.1.121:3306)   
Sun May  1 22:46:45 2016 – [info] Checking slave configurations..   
Sun May  1 22:46:45 2016 – [info]  read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).   
Sun May  1 22:46:45 2016 – [warning]  relay_log_purge=0 is not set on slave 192.168.1.122(192.168.1.122:3306).   
Sun May  1 22:46:45 2016 – [info]  read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).   
Sun May  1 22:46:45 2016 – [warning]  relay_log_purge=0 is not set on slave 192.168.1.123(192.168.1.123:3306).   
Sun May  1 22:46:45 2016 – [info] Checking replication filtering settings..   
Sun May  1 22:46:45 2016 – [info]  binlog_do_db= , binlog_ignore_db=   
Sun May  1 22:46:45 2016 – [info]  Replication filtering check ok.   
Sun May  1 22:46:45 2016 – [info] GTID (with auto-pos) is not supported   
Sun May  1 22:46:45 2016 – [info] Starting SSH connection tests..   
Sun May  1 22:46:46 2016 – [info] All SSH connection tests passed successfully.   
Sun May  1 22:46:46 2016 – [info] Checking MHA Node version..   
Sun May  1 22:46:47 2016 – [info]  Version check ok.   
Sun May  1 22:46:47 2016 – [info] Checking SSH publickey authentication settings on the current master..   
Sun May  1 22:46:47 2016 – [info] HealthCheck: SSH to 192.168.1.121 is reachable.   
Sun May  1 22:46:47 2016 – [info] Master MHA Node version is 0.56.   
Sun May  1 22:46:47 2016 – [info] Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..   
Sun May  1 22:46:47 2016 – [info]  Executing command: save_binary_logs –command=test –start_pos=4 –binlog_dir=/usr/local/mysql/data/ –output_file=/tmp/save_binary_logs_test –manager_version=0.56 –start_file=master-bin.000008   
Sun May  1 22:46:47 2016 – [info]  Connecting to root@192.168.1.121(192.168.1.121:22)..   
  Creating /tmp if not exists..    ok.   
  Checking output directory is accessible or not..   
  ok.   
  Binlog found at /usr/local/mysql/data/, up to master-bin.000008   
Sun May  1 22:46:48 2016 – [info] Binlog setting check done.   
Sun May  1 22:46:48 2016 – [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..   
Sun May  1 22:46:48 2016 – [info]  Executing command : apply_diff_relay_logs –command=test –slave_user=’root’ –slave_host=192.168.1.122 –slave_ip=192.168.1.122 –slave_port=3306 –workdir=/tmp –target_version=5.6.29-log –manager_version=0.56 –relay_log_info=/usr/local/mysql/data/relay-log.info  –relay_dir=/usr/local/mysql/data/  –slave_pass=xxx   
Sun May  1 22:46:48 2016 – [info]  Connecting to root@192.168.1.122(192.168.1.122:22)..   
  Checking slave recovery environment settings..   
    Opening /usr/local/mysql/data/relay-log.info … ok.   
    Relay log found at /usr/local/mysql/data, up to node2-relay-bin.000002   
    Temporary relay log file is /usr/local/mysql/data/node2-relay-bin.000002   
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.   
done.   
    Testing mysqlbinlog output.. done.   
    Cleaning up test file(s).. done.   
Sun May  1 22:46:48 2016 – [info]  Executing command : apply_diff_relay_logs –command=test –slave_user=’root’ –slave_host=192.168.1.123 –slave_ip=192.168.1.123 –slave_port=3306 –workdir=/tmp –target_version=5.6.29-log –manager_version=0.56 –relay_log_info=/usr/local/mysql/data/relay-log.info  –relay_dir=/usr/local/mysql/data/  –slave_pass=xxx   
Sun May  1 22:46:48 2016 – [info]  Connecting to root@192.168.1.123(192.168.1.123:22)..   
  Checking slave recovery environment settings..   
    Opening /usr/local/mysql/data/relay-log.info … ok.   
    Relay log found at /usr/local/mysql/data, up to slave-relay-bin.000012   
    Temporary relay log file is /usr/local/mysql/data/slave-relay-bin.000012   
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.   
done.   
    Testing mysqlbinlog output.. done.   
    Cleaning up test file(s).. done.   
Sun May  1 22:46:48 2016 – [info] Slaves settings check done.   
Sun May  1 22:46:48 2016 – [info]   
192.168.1.121(192.168.1.121:3306) (current master)   
+–192.168.1.122(192.168.1.122:3306)   
+–192.168.1.123(192.168.1.123:3306)
Sun May  1 22:46:48 2016 – [info] Checking replication health on 192.168.1.122.. 
Sun May  1 22:46:48 2016 – [info]  ok.   
Sun May  1 22:46:48 2016 – [info] Checking replication health on 192.168.1.123..   
Sun May  1 22:46:48 2016 – [info]  ok.   
Sun May  1 22:46:48 2016 – [info] Checking master_ip_failover_script status:   
Sun May  1 22:46:48 2016 – [info]  /usr/local/bin/master_ip_failover –command=status –ssh_user=root –orig_master_host=192.168.1.121 –orig_master_ip=192.168.1.121 –orig_master_port=3306
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
Checking the Status of the script.. OK   
Sun May  1 22:46:48 2016 – [info]  OK.   
Sun May  1 22:46:48 2016 – [warning] shutdown_script is not defined.   
Sun May  1 22:46:48 2016 – [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK. 
[root@node5 masterha]#

3. 开启 MHA Manager 监控 (node5 操作) 如下:
# mkdir -p  /var/log/masterha/app1/ 
# nohup masterha_manager –conf=/etc/masterha/app1.cnf –remove_dead_master_conf –ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
参数说明:
–remove_dead_master_conf    #该参数代表当发生主从切换后,老的主库的 ip 将会从配置文件中移除。
–manger_log                  #日志存放位置   
–ignore_last_failover        #在缺省情况下,如果 MHA 检测到连续发生宕机,会生成 app1.failover.complete 文件,会造成 MHA 管理进程无法启动。

4. 查看 MHA Manager 监控是否正常:
# masterha_check_status –conf=/etc/masterha/app1.cnf 
app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121

5. 查看启动日志 (node5 操作) 如下:
# tail -n20 /var/log/masterha/app1/manager.log

6. 关闭 MHA Manage 监控:
(1) 关闭
# masterha_stop –conf=/etc/masterha/app1.cnf
(2) 启动
# nohup masterha_manager –conf=/etc/masterha/app1.cnf –remove_dead_master_conf –ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

六、MHA Failover 切换
1. 自动 Failover 切换
(1) 模拟 master mysql 关闭 
(2) VIP 将会切换到 node2   
(3) /etc/masterha/app1.cnf 中将原主服务器配置文件清掉。
(4) masterha_manager 监控进程会自动退出关闭,并在 /var/log/masterha/app1 下生成 app1.failover.complete 文件,manager.log 会记录全过程,从服务器会自动从新的主服务器复制。
(5) 原主服务器 mysqld 启动的,需要清掉 /var/log/masterha/app1 下生成 app1.failover.complete 文件,添加 node1 配置文件到 /etc/masterha/app1.cnf,通过 manager.log 中的记录的故障点,重新同步主服务器,成为从节点。

2. 手动 Failover 切换
(1) 先停 MHA Manager 进程。
masterha_stop –conf=/etc/masterha/app1.cnf   
(2) 停掉 master mysqld   
(3) 手动切换,在 Manager 主机上操作如下:
# masterha_master_switch –master_state=dead –conf=/etc/masterha/app1.cnf –dead_master_host=192.168.1.122 –dead_master_port=3306 –new_master_host=192.168.1.121 –new_master_port=3306 –ignore_last_failover
通过观察日志可以观察切换全过程。
(4) 如上节方式恢复节点为从服务器。
3.  正常运行情况下切换 (Master 正在运行)
等补充。
4. 小结
通过对 MMM,MHA 的环境搭建测试,MHA 由于采用复制架构,原理简单,在一些对数据要求比较高的环境,为了保证可靠性,最好与半同步结合使用。

本文永久更新链接地址:http://www.linuxidc.com/Linux/2016-05/130925.htm

正文完
星哥说事-微信公众号
post-qrcode
 0
星锅
版权声明:本站原创文章,由 星锅 于2022-01-22发表,共计28672字。
转载说明:除特殊说明外本站文章皆由CC-4.0协议发布,转载请注明出处。
【腾讯云】推广者专属福利,新客户无门槛领取总价值高达2860元代金券,每种代金券限量500张,先到先得。
阿里云-最新活动爆款每日限量供应
评论(没有评论)
验证码
【腾讯云】云服务器、云数据库、COS、CDN、短信等云产品特惠热卖中