MySQL High Availability with MHA: Deployment and Principles


MHA (Master High Availability) is a relatively mature MySQL high-availability solution. It can complete an automatic database failover within 0-30 seconds, and provided the master server itself has not gone down (e.g. only mysqld crashed), it can essentially guarantee data consistency.

MHA consists of two components: the MHA Manager (management node) and the MHA Node (data node). The MHA Manager can be deployed on a dedicated machine, where it can manage multiple master-slave clusters, or on one of the slaves. The MHA Node runs on every MySQL server. The MHA Manager periodically probes the master of the cluster; when the master fails, it promotes the slave with the most recent data to master and then repoints all the other slaves to the new master.

During automatic failover, MHA tries to save the master's binary logs so as to lose as little data as possible, but this is not always feasible. For example, if the master's hardware fails or it cannot be reached over SSH, MHA cannot save the binary logs, and the failover completes with the most recent data lost. Combining MHA with the semi-synchronous replication introduced in MySQL 5.5 reduces this risk.
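As a sketch of that mitigation (not part of MHA itself): the statements below enable the stock semi-synchronous replication plugins that ship with MySQL 5.5+. Writing them to files first keeps the step reviewable; apply the first file on the master and the second on each slave. The credentials and the 1000 ms timeout are illustrative values, not from this article.

```shell
# Generate the SQL for enabling semi-synchronous replication (MySQL 5.5+).
# rpl_semi_sync_* are the stock plugin/variable names shipped with MySQL.
cat > /tmp/semisync_master.sql <<'EOF'
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- ms to wait before falling back to async
EOF
cat > /tmp/semisync_slave.sql <<'EOF'
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
EOF
# Apply with, e.g.:  mysql -uroot -p < /tmp/semisync_master.sql
echo "wrote /tmp/semisync_master.sql and /tmp/semisync_slave.sql"
```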

The MHA software consists of two tool sets, the Manager tools and the Node tools:

MHA Manager:

1. masterha_check_ssh: checks MHA's SSH configuration

2. masterha_check_repl: checks the MySQL replication status

3. masterha_manager: starts MHA

4. masterha_check_status: checks the current MHA running status

5. masterha_master_monitor: detects whether the master is down

6. masterha_master_switch: controls failover (automatic or manual)

7. masterha_conf_host: adds or removes configured server entries

8. masterha_stop: stops MHA

MHA Node:

save_binary_logs: saves or copies the master's binary logs

apply_diff_relay_logs: identifies relay-log differences and applies the missing events to the other slaves

filter_mysqlbinlog: removes unnecessary ROLLBACK events (MHA no longer uses this tool)

purge_relay_logs: purges relay logs (without blocking the SQL thread)

In addition, the following scripts need to be customized:

1. master_ip_failover: manages the VIP during automatic failover

2. master_ip_online_change: manages the VIP during a manual (online) switchover

3. masterha_secondary_check: when the MHA Manager detects that the master is unavailable, this script double-checks from other hosts, reducing the risk of a false failover.

4. send_report: sends an alert when a failover occurs.

Cluster information

Role               IP Address        Server ID   Type
Master             192.168.244.10    1           write
Candidate master   192.168.244.20    2           read
Slave              192.168.244.30    3           read
Monitor host       192.168.244.40                cluster monitor

Note: all hosts run RHEL 6.7.

The master serves writes, while the candidate master and the slave serve reads. If the master goes down, the candidate master is promoted to the new master and the slave is repointed to it.

I. Install the MHA Node on all nodes

    1. On the MySQL servers, install the Perl module (DBD::mysql) required by the MHA Node

      # yum install perl-DBD-MySQL -y

    2. Install the MHA Node on all nodes

      Download from: https://code.google.com/p/mysql-master-ha/wiki/Downloads?tm=2

      Since this site is blocked in mainland China, the files have also been uploaded to a personal network drive for anyone who needs them: http://pan.baidu.com/s/1boS31vT

      # tar xvf mha4mysql-node-0.56.tar.gz

      # cd mha4mysql-node-0.56

      # perl Makefile.PL 

Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 6.
BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 6.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4.
BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 349.

    The errors show that a required dependency is missing.

     # yum install perl-ExtUtils-MakeMaker -y

     # perl Makefile.PL 

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
Can't locate CPAN.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/AutoInstall.pm line 277.

    # yum install perl-CPAN -y

    # perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node

     # make 

     # make install

    The MHA Node is now installed; the following scripts have been generated in /usr/local/bin:

# ll /usr/local/bin/
total 44
-r-xr-xr-x 1 root root 16367 Jul 20 07:00 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jul 20 07:00 filter_mysqlbinlog
-r-xr-xr-x 1 root root  8261 Jul 20 07:00 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jul 20 07:00 save_binary_logs

II. Deploy the MHA Manager on the Monitor host

     # tar xvf mha4mysql-manager-0.56.tar.gz 

     # cd mha4mysql-manager-0.56

     # perl Makefile.PL

 

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...missing.
- Config::Tiny          ...missing.
- Log::Dispatch         ...missing.
- Parallel::ForkManager ...missing.
- MHA::NodeConst        ...missing.
==> Auto-install the 5 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type make.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Warning: prerequisite Config::Tiny 0 not found.
Warning: prerequisite Log::Dispatch 0 not found.
Warning: prerequisite MHA::NodeConst 0 not found.
Warning: prerequisite Parallel::ForkManager 0 not found.
Warning: prerequisite Time::HiRes 0 not found.
Writing Makefile for mha4mysql::manager

     # make

     # make install

    When finished, the following files are added to /usr/local/bin:

# ll /usr/local/bin/
total 40
-r-xr-xr-x 1 root root 1991 Jul 20 00:50 masterha_check_repl
-r-xr-xr-x 1 root root 1775 Jul 20 00:50 masterha_check_ssh
-r-xr-xr-x 1 root root 1861 Jul 20 00:50 masterha_check_status
-r-xr-xr-x 1 root root 3197 Jul 20 00:50 masterha_conf_host
-r-xr-xr-x 1 root root 2513 Jul 20 00:50 masterha_manager
-r-xr-xr-x 1 root root 2161 Jul 20 00:50 masterha_master_monitor
-r-xr-xr-x 1 root root 2369 Jul 20 00:50 masterha_master_switch
-r-xr-xr-x 1 root root 5167 Jul 20 00:50 masterha_secondary_check
-r-xr-xr-x 1 root root 1735 Jul 20 00:50 masterha_stop

III. Set up passwordless SSH authentication

    1. On the manager, set up passwordless login to all Node hosts

      # ssh-keygen

      Press Enter through all the prompts

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

    2. On the Master (192.168.244.10)

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

    3. On the Candidate master (192.168.244.20)

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

     4. On the Slave (192.168.244.30)

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

IV. Set up the master-slave replication environment

     1. Take a backup on the Master

     # mysqldump --master-data=2 --single-transaction -R --triggers -A > all.sql

     Here, -R backs up stored routines, --triggers backs up triggers, and -A means all databases.

     2. Create the replication user on the Master

mysql> grant replication slave on *.* to 'repl'@'192.168.244.%' identified by 'repl';
Query OK, 0 rows affected (0.09 sec)

    3. Check the CHANGE MASTER statement in the backup file all.sql

      # head -n 30 all.sql

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=120;

     4. Copy the backup file to the Candidate master and the Slave

     # scp all.sql 192.168.244.20:/root/

     # scp all.sql 192.168.244.30:/root/

     5. Build the slave on the Candidate master

     # mysql < all.sql 

     Set up the replication coordinates

mysql> CHANGE MASTER TO
    -> MASTER_HOST='192.168.244.10',
    -> MASTER_USER='repl',
    -> MASTER_PASSWORD='repl',
    -> MASTER_LOG_FILE='mysql-bin.000002',
    -> MASTER_LOG_POS=120;
Query OK, 0 rows affected, 2 warnings (0.19 sec)

mysql> start slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show slave status\G

       6. Build the slave on the Slave host in the same way

       7. Set the slave servers to read-only

mysql> set global read_only=1;
Query OK, 0 rows affected (0.04 sec)

       8. Create the monitoring user on the Master

mysql> grant all privileges on *.* to 'monitor'@'%' identified by 'monitor123';
Query OK, 0 rows affected (0.07 sec)

V. Configure MHA

     1. On the Monitor host (192.168.244.40), create the MHA working directory and the configuration file

     # mkdir -p /etc/masterha

     # vim /etc/masterha/app1.cnf

[server default]
manager_log=/masterha/app1/manager.log          // manager log file
manager_workdir=/masterha/app1           // manager working directory
master_binlog_dir=/var/lib/mysql                  // where the master stores its binlogs, so MHA can find them
master_ip_failover_script= /usr/local/bin/master_ip_failover    // script invoked during automatic failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change  // script invoked during a manual switchover
user=monitor               // monitoring user
password=monitor123         // monitoring user's password
ping_interval=1         // interval, in seconds, between pings to the master; the default is 3. After three failed attempts MHA starts an automatic failover
remote_workdir=/tmp     // where binlogs are saved on the remote MySQL hosts during a switchover
repl_user=repl          // replication user
repl_password=repl    // replication user's password
report_script=/usr/local/bin/send_report    // alert script invoked after a switchover
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 192.168.244.20 -s 192.168.244.30 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306  // if the manager loses contact with the master, it checks whether the other two slaves can still connect to master_ip on port 3306
shutdown_script=""      // script to power off the failed host after failover (mainly to prevent split-brain)
ssh_user=root           // SSH login user

[server1]
hostname=192.168.244.10
port=3306

[server2]
hostname=192.168.244.20
port=3306
candidate_master=1   // mark as candidate master: on failover this slave is promoted to master, even if it is not the most up-to-date slave in the cluster
check_repl_delay=0   // by default, a slave that is more than 100M of relay logs behind the master is never chosen as the new master, because recovering it would take a long time; check_repl_delay=0 makes MHA ignore replication delay when choosing. This is very useful together with candidate_master=1, since it guarantees the candidate can become the new master during a switchover

      Notes:

      1> Before using this file, be sure to remove the trailing annotations: MHA does not treat the text after // as comments.

      2> The configuration sets the master_ip_failover_script and secondary_check_script options; the corresponding files are available in the network drive linked above.
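Since MHA does not understand the // annotations, they must not survive into the real file. A minimal sketch of stripping them with sed, assuming no value itself contains // (a URL in a value would be mangled); MHA_CONF is a hypothetical override variable:

```shell
# Strip trailing "// ..." annotations from the MHA config before use.
# Caution: this would also truncate any value containing //, such as a URL.
conf=${MHA_CONF:-/etc/masterha/app1.cnf}
if [ -f "$conf" ]; then
  sed -i 's|[[:space:]]*//.*$||' "$conf"
fi
```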

      2. Set the relay log purge mode (on every slave)

mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)

      During a failover, MHA relies on the relay logs to recover the other slaves, so automatic relay log purging must be turned OFF and the relay logs purged manually instead.

      By default, a slave's relay logs are deleted automatically once the SQL thread has executed them. In an MHA setup, however, those relay logs may be needed to recover other slaves, which is why automatic purging must be disabled; instead, periodically purge by hand the relay logs the SQL thread has finished applying.

      On an ext3 filesystem, deleting a large file takes a noticeable amount of time, which can cause significant replication delay; that is why large files on Linux are usually deleted via a hard link.
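The hard-link trick can be demonstrated locally (the file names below are a throwaway example, not real relay logs):

```shell
# Demonstrate why a hard link makes deleting a big file cheap: unlinking one
# name is fast, and the data blocks are freed only when the last link goes.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/relay-bin.000001" bs=1024 count=16 2>/dev/null
ln "$workdir/relay-bin.000001" "$workdir/relay-bin.000001.hardlink"
rm "$workdir/relay-bin.000001"            # fast: blocks still referenced by the link
ls "$workdir"                             # the hardlink still holds the data
rm "$workdir/relay-bin.000001.hardlink"   # only now are the blocks actually freed
rmdir "$workdir"
```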

      3. Set up a script to purge the relay logs periodically

        The MHA Node includes the purge_relay_logs script. It creates hard links for the relay logs, executes set global relay_log_purge=1, waits a few seconds for the SQL thread to switch to a new relay log, and then executes set global relay_log_purge=0.

        Usage:

        # purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/

20170424 20:27:46: purge_relay_logs script started.
 Found relay_log.info: /var/lib/mysql/relay-log.info
 Opening /var/lib/mysql/mysqld-relay-bin.000001 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000002 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000003 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000004 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000005 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000006 ..
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
20170424 20:27:50: All relay log purging operations succeeded.

        Options:

        --user: MySQL user name

        --password: the MySQL user's password

        --host: MySQL server address

        --workdir: where to create the hard links of the relay logs; the default is /var/tmp. Since creating a hard link across filesystems fails, specify a location on the same partition as the relay logs.

        --disable_relay_log_purge: by default the script exits immediately if relay_log_purge=1. With this option, the script first sets relay_log_purge to 1, purges the relay logs, and then sets the variable back to 0.

        Set up a crontab entry to purge the relay logs periodically

        MHA invokes the mysqlbinlog command directly during a switchover, so the path to mysqlbinlog must be available in the environment.

        # vim /etc/cron.d/purge_relay_logs

0 4 * * * /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge_relay_logs.log 2>&1

      Note: ideally, schedule this job at a different time on each slave server.
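For instance (the 04:00/04:20 times are arbitrary), the two slaves in this setup could stagger the job like this:

```
# crontab on 192.168.244.20
0 4 * * * /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge_relay_logs.log 2>&1

# crontab on 192.168.244.30
20 4 * * * /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge_relay_logs.log 2>&1
```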

      4. Add the mysqlbinlog path to the PATH environment variable
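The article's continuation for this step is not reproduced here; the sketch below shows the idea. MHA runs mysqlbinlog over non-interactive SSH, where the login profile may not be sourced, so the binary has to be resolvable from the default PATH. /usr/local/mysql/bin is an assumed install location; adjust it to your layout.

```shell
# Make mysqlbinlog resolvable in non-interactive SSH sessions.
MYSQL_BIN_DIR=/usr/local/mysql/bin
case ":$PATH:" in
  *":$MYSQL_BIN_DIR:"*) ;;                  # already on PATH
  *) export PATH="$PATH:$MYSQL_BIN_DIR" ;;
esac
echo "$PATH"
```

Persist the export in a file that non-interactive shells read (e.g. /etc/bashrc on RHEL), or simply symlink the binary into a standard location: ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog.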


VI. Check the SSH configuration

       Run on the Monitor host:

       # masterha_check_ssh --conf=/etc/masterha/app1.cnf

Wed Jul 20 14:33:36 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 20 14:33:36 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:33:36 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:33:36 2016 - [info] Starting SSH connection tests..
Wed Jul 20 14:33:51 2016 - [debug] 
Wed Jul 20 14:33:36 2016 - [debug]  Connecting via SSH from root@192.168.244.10(192.168.244.10:22) to root@192.168.244.20(192.168.244.20:22)..
Wed Jul 20 14:33:48 2016 - [debug]   ok.
Wed Jul 20 14:33:48 2016 - [debug]  Connecting via SSH from root@192.168.244.10(192.168.244.10:22) to root@192.168.244.30(192.168.244.30:22)..
Wed Jul 20 14:33:50 2016 - [debug]   ok.
Wed Jul 20 14:33:55 2016 - [debug] 
Wed Jul 20 14:33:37 2016 - [debug]  Connecting via SSH from root@192.168.244.30(192.168.244.30:22) to root@192.168.244.10(192.168.244.10:22)..
Wed Jul 20 14:33:49 2016 - [debug]   ok.
Wed Jul 20 14:33:49 2016 - [debug]  Connecting via SSH from root@192.168.244.30(192.168.244.30:22) to root@192.168.244.20(192.168.244.20:22)..
Wed Jul 20 14:33:54 2016 - [debug]   ok.
Wed Jul 20 14:33:55 2016 - [debug] 
Wed Jul 20 14:33:36 2016 - [debug]  Connecting via SSH from root@192.168.244.20(192.168.244.20:22) to root@192.168.244.10(192.168.244.10:22)..
Wed Jul 20 14:33:49 2016 - [debug]   ok.
Wed Jul 20 14:33:49 2016 - [debug]  Connecting via SSH from root@192.168.244.20(192.168.244.20:22) to root@192.168.244.30(192.168.244.30:22)..
Wed Jul 20 14:33:54 2016 - [debug]   ok.
Wed Jul 20 14:33:55 2016 - [info] All SSH connection tests passed successfully 

VII. Check the status of the whole cluster

     Run on the Monitor host:

     # masterha_check_repl --conf=/etc/masterha/app1.cnf

Wed Jul 20 14:44:30 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 20 14:44:30 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:44:30 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:44:30 2016 - [info] MHA::MasterMonitor version 0.56.
Wed Jul 20 14:44:31 2016 - [info] GTID failover mode = 0
Wed Jul 20 14:44:31 2016 - [info] Dead Servers:
Wed Jul 20 14:44:31 2016 - [info] Alive Servers:
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.20(192.168.244.20:3306)
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.30(192.168.244.30:3306)
Wed Jul 20 14:44:31 2016 - [info] Alive Slaves:
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.20(192.168.244.20:3306)  Version=5.6.31 (oldest major version between slaves) log-bin:disabled
Wed Jul 20 14:44:31 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.30(192.168.244.30:3306)  Version=5.6.31 (oldest major version between slaves) log-bin:disabled
Wed Jul 20 14:44:31 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info] Checking slave configurations..
Wed Jul 20 14:44:31 2016 - [warning]  log-bin is not set on slave 192.168.244.20(192.168.244.20:3306). This host cannot be a master.
Wed Jul 20 14:44:31 2016 - [warning]  log-bin is not set on slave 192.168.244.30(192.168.244.30:3306). This host cannot be a master.
Wed Jul 20 14:44:31 2016 - [info] Checking replication filtering settings..
Wed Jul 20 14:44:31 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jul 20 14:44:31 2016 - [info]  Replication filtering check ok.
Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln361] None of slaves can be master. Check failover configuration file or log-bin settings in my.cnf
Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/local/bin/masterha_check_repl line 48.
Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Wed Jul 20 14:44:31 2016 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!


    The error is clear: neither the Candidate master nor the Slave has log-bin enabled, and without it they cannot be promoted to master later.

    After enabling log-bin, run the check again:

Wed Jul 20 15:49:58 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 20 15:49:58 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jul 20 15:49:58 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jul 20 15:49:58 2016 - [info] MHA::MasterMonitor version 0.56.
Wed Jul 20 15:49:59 2016 - [info] GTID failover mode = 0
Wed Jul 20 15:49:59 2016 - [info] Dead Servers:
Wed Jul 20 15:49:59 2016 - [info] Alive Servers:
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.20(192.168.244.20:3306)
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.30(192.168.244.30:3306)
Wed Jul 20 15:49:59 2016 - [info] Alive Slaves:
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.20(192.168.244.20:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Wed Jul 20 15:49:59 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.30(192.168.244.30:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Wed Jul 20 15:49:59 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info] Checking slave configurations..
Wed Jul 20 15:49:59 2016 - [info] Checking replication filtering settings..
Wed Jul 20 15:49:59 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jul 20 15:49:59 2016 - [info]  Replication filtering check ok.
Wed Jul 20 15:49:59 2016 - [info] GTID (with auto-pos) is not supported
Wed Jul 20 15:49:59 2016 - [info] Starting SSH connection tests..
Wed Jul 20 15:50:17 2016 - [info] All SSH connection tests passed successfully.
Wed Jul 20 15:50:17 2016 - [info] Checking MHA Node version..
Wed Jul 20 15:50:18 2016 - [info]  Version check ok.
Wed Jul 20 15:50:18 2016 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jul 20 15:50:20 2016 - [info] HealthCheck: SSH to 192.168.244.10 is reachable.
Wed Jul 20 15:50:21 2016 - [info] Master MHA Node version is 0.56.
Wed Jul 20 15:50:21 2016 - [info] Checking recovery script configurations on 192.168.244.10(192.168.244.10:3306)..
Wed Jul 20 15:50:21 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysqld-bin.000002 
Wed Jul 20 15:50:21 2016 - [info]   Connecting to root@192.168.244.10(192.168.244.10:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysqld-bin.000002
Wed Jul 20 15:50:23 2016 - [info] Binlog setting check done.
Wed Jul 20 15:50:23 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jul 20 15:50:23 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.244.20 --slave_ip=192.168.244.20 --slave_port=3306 --workdir=/tmp --target_version=5.6.31-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jul 20 15:50:23 2016 - [info]   Connecting to root@192.168.244.20(192.168.244.20:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 20 15:50:28 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.244.30 --slave_ip=192.168.244.30 --slave_port=3306 --workdir=/tmp --target_version=5.6.31-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jul 20 15:50:28 2016 - [info]   Connecting to root@192.168.244.30(192.168.244.30:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000008
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000008
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 20 15:50:32 2016 - [info] Slaves settings check done.
Wed Jul 20 15:50:32 2016 - [info] 
192.168.244.10(192.168.244.10:3306) (current master)
 +--192.168.244.20(192.168.244.20:3306)
 +--192.168.244.30(192.168.244.30:3306)

Wed Jul 20 15:50:32 2016 - [info] Checking replication health on 192.168.244.20..
Wed Jul 20 15:50:32 2016 - [info]  ok.
Wed Jul 20 15:50:32 2016 - [info] Checking replication health on 192.168.244.30..
Wed Jul 20 15:50:32 2016 - [info]  ok.
Wed Jul 20 15:50:32 2016 - [info] Checking master_ip_failover_script status:
Wed Jul 20 15:50:32 2016 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 
Wed Jul 20 15:50:32 2016 - [info]  OK.
Wed Jul 20 15:50:32 2016 - [warning] shutdown_script is not defined.
Wed Jul 20 15:50:32 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.


    The check passes.

VIII. Check the MHA Manager status

# masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 is stopped(2:NOT_RUNNING). 

        When the manager is running, this prints "PING_OK"; "NOT_RUNNING" means MHA monitoring has not been started yet.

IX. Start MHA Manager monitoring

      # nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /masterha/app1/manager.log 2>&1 &

      Here,

      remove_dead_master_conf: after a master switchover, the old master's entry is removed from the configuration file.

      ignore_last_failover: by default, after a switchover MHA creates the file app1.failover.complete under /masterha/app1. If that file exists and less than 8 hours have passed since the last switchover, MHA refuses to trigger another one unless the file is removed manually with rm -rf /masterha/app1/app1.failover.complete. This option makes MHA ignore the file left by the previous switchover.
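Equivalently, if you prefer not to pass --ignore_last_failover, clear the marker files (paths follow the app1.cnf above; app1.failover.error is the marker MHA leaves after a failed failover) before restarting the manager:

```shell
# Remove the markers a previous failover left behind so a new one is allowed.
rm -f /masterha/app1/app1.failover.complete
rm -f /masterha/app1/app1.failover.error
```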

     Check that MHA Manager monitoring is running

# masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 (pid:1873) is running(0:PING_OK), master:192.168.244.10

X. Stop MHA Manager monitoring

# masterha_stop --conf=/etc/masterha/app1.cnf 
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /masterha/app1/manager.log 2>&1

With MHA itself configured, the next step is to set up the VIP.

XI. VIP configuration

There are two ways to manage the VIP: with Keepalived, or manually in the failover script.

Managing the VIP with keepalived is prone to split-brain: if the network between master and slave fails, the slave grabs the VIP while the master still holds it, so both databases own the VIP and an IP conflict results. On an unreliable network, keepalived is therefore not recommended.

The second approach, managing the VIP manually in the script, is also the more common one in production, so readers not interested in keepalived can skip straight to it.

1. Managing the VIP with keepalived

1> Install keepalived

    Since a Candidate master is defined here, keepalived is installed only on the Master and the Candidate master.

    Without a Candidate master, with two slaves of equal standing, keepalived would need to be installed on both slaves.

    # wget http://www.keepalived.org/software/keepalived-1.2.24.tar.gz

    # tar xvf keepalived-1.2.24.tar.gz

    # cd keepalived-1.2.24

    # ./configure --prefix=/usr/local/keepalived

    # make

    # make install

    # cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/rc.d/init.d/

    # cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/

    # mkdir /etc/keepalived

    # cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/

    # cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

2> Give keepalived its own log file (optional)

    By default, keepalived logs to /var/log/messages

    # vim /etc/sysconfig/keepalived

KEEPALIVED_OPTIONS="-D -d -S 0"

    Configure syslog

    # vim /etc/rsyslog.conf

    Add the following line:

local0.*           /var/log/keepalived.log

    # service rsyslog restart 

3> Configure keepalived

    Edit on the Master

    # vim /etc/keepalived/keepalived.conf

global_defs {
   notification_email {
     slowtech@qq.com
   }
   notification_email_from root@localhost.localdomain
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.244.188/24
    }
}

    For a detailed description of the keepalived parameters, see:

    Building a highly available MyCAT load-balancing cluster with LVS+Keepalived:  http://www.linuxidc.com/Linux/2016-03/129231.htm

    Copy the configuration file to the Candidate master

    # scp /etc/keepalived/keepalived.conf 192.168.244.20:/etc/keepalived/

    On it, just change priority in the configuration file to 90

    Note: why is keepalived set to backup mode here?

    In master-backup mode, if the master host goes down, the VIP fails over to the slave automatically; but once the old master is repaired and keepalived restarts, it takes the VIP back, and this happens even with nopreempt (no preemption) set. In backup-backup mode, when the repaired old master starts keepalived again, it does not preempt the new master's VIP, even if the old master's priority is higher.

4> Start keepalived

    Start it on the Master first

    # service keepalived start

env: /etc/init.d/keepalived: Permission denied

    # chmod +x /etc/init.d/keepalived

    # service keepalived start

    Check the address binding

    # ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:c6:47:04 brd ff:ff:ff:ff:ff:ff
    inet 192.168.244.10/24 brd 192.168.244.255 scope global eth0
    inet 192.168.244.188/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fec6:4704/64 scope link 
       valid_lft forever preferred_lft forever


    The VIP (192.168.244.188) is now bound to the Master's eth0 interface.

    Start keepalived on the Candidate master

    # service keepalived start

5> Integrate keepalived with MHA

    Edit /usr/local/bin/master_ip_failover

    Compared with the original sample file, the changes are at lines 93-95:

  1 #!/usr/bin/env perl
  2 
  3 #  Copyright (C) 2011 DeNA Co.,Ltd.
  4 #
  5 #  This program is free software; you can redistribute it and/or modify
  6 #  it under the terms of the GNU General Public License as published by
  7 #  the Free Software Foundation; either version 2 of the License, or
  8 #  (at your option) any later version.
  9 #
 10 #  This program is distributed in the hope that it will be useful,
 11 #  but WITHOUT ANY WARRANTY; without even the implied warranty of
 12 #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 13 #  GNU General Public License for more details.
 14 #
 15 #  You should have received a copy of the GNU General Public License
 16 #  along with this program; if not, write to the Free Software
 17 #  Foundation, Inc.,
 18 #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 19 
 20 ## Note: This is a sample script and is not complete. Modify the script based on your environment.
 21 
 22 use strict;
 23 use warnings FATAL => 'all';
 24 
 25 use Getopt::Long;
 26 use MHA::DBHelper;
 27 my (
 28   $command,        $ssh_user,         $orig_master_host,
 29   $orig_master_ip, $orig_master_port, $new_master_host,
 30   $new_master_ip,  $new_master_port,  $new_master_user,
 31   $new_master_password
 32 );
 33 
 34 GetOptions(
 35   'command=s'             => \$command,
 36   'ssh_user=s'            => \$ssh_user,
 37   'orig_master_host=s'    => \$orig_master_host,
 38   'orig_master_ip=s'      => \$orig_master_ip,
 39   'orig_master_port=i'    => \$orig_master_port,
 40   'new_master_host=s'     => \$new_master_host,
 41   'new_master_ip=s'       => \$new_master_ip,
 42   'new_master_port=i'     => \$new_master_port,
 43   'new_master_user=s'     => \$new_master_user,
 44   'new_master_password=s' => \$new_master_password,
 45 );
 46 
 47 exit &main();
 48 
 49 sub main {
 50   if ( $command eq "stop" || $command eq "stopssh" ) {
 51 
 52     # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
 53     # If you manage master ip address at global catalog database,
 54     # invalidate orig_master_ip here.
 55     my $exit_code = 1;
 56     eval {
 57 
 58       # updating global catalog, etc
 59       $exit_code = 0;
 60     };
 61     if ($@) {
 62       warn "Got Error: $@\n";
 63       exit $exit_code;
 64     }
 65     exit $exit_code;
 66   }
 67   elsif ( $command eq "start" ) {
 68 
 69     # all arguments are passed.
 70     # If you manage master ip address at global catalog database,
 71     # activate new_master_ip here.
 72     # You can also grant write access (create user, set read_only=0, etc) here.
 73     my $exit_code = 10;
 74     eval {
 75       my $new_master_handler = new MHA::DBHelper();
 76 
 77       # args: hostname, port, user, password, raise_error_or_not
 78       $new_master_handler->connect( $new_master_ip, $new_master_port,
 79         $new_master_user, $new_master_password, 1 );
 80 
 81       ## Set read_only=0 on the new master
 82       $new_master_handler->disable_log_bin_local();
 83       print "Set read_only=0 on the new master.\n";
 84       $new_master_handler->disable_read_only();
 85 
 86       ## Creating an app user on the new master
 87       #print "Creating app user on the new master..\n";
 88       #FIXME_xxx_create_user( $new_master_handler->{dbh} );
 89       $new_master_handler->enable_log_bin_local();
 90       $new_master_handler->disconnect();
 91 
 92       ## Update master ip on the catalog database, etc
 93       my $cmd;
 94       $cmd = 'ssh ' . $ssh_user . '@' . $orig_master_ip . ' service keepalived stop';
 95       system($cmd);
 96       $exit_code = 0;
 97     };
 98     if ($@) {
 99       warn $@;
100 
101       # If you want to continue failover, exit 10.
102       exit $exit_code;
103     }
104     exit $exit_code;
105   }
106   elsif ( $command eq "status" ) {
107 
108     # do nothing
109     exit 0;
110   }
111   else {
112     &usage();
113     exit 1;
114   }
115 }
116 
117 sub usage {
118   print
119 "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
120 }

2. Managing the VIP with the script

  Edit /usr/local/bin/master_ip_failover

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

my $vip = '192.168.244.188';
my $key = '2';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip/24";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I eth0 -c 1";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print "Creating app user on the new master..\n";
      #FIXME_xxx_create_user($new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip() {
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
  `ssh $ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip() {
  return 0 unless ($ssh_user);
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
 

In a real production environment, this script-based approach to managing the VIP is the recommended one, since it effectively prevents split-brain.

At this point, the MHA high-availability environment is essentially complete.

For common MHA operations, including automatic failover, manual failover, and online switchover, see another post:

Steps and principles of MHA online switchover    http://www.linuxidc.com/Linux/2017-05/144085.htm

Summary:

1. The master_ip_failover, master_ip_online_change, and send_report scripts can each be debugged standalone:

/usr/local/bin/master_ip_online_change --command=stop --orig_master_ip=192.168.244.10 --orig_master_host=192.168.244.10 --orig_master_port=3306 --orig_master_user=monitor --orig_master_password=monitor123 --orig_master_ssh_user=root --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user=monitor --new_master_password=monitor123 --new_master_ssh_user=root
/usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123'

 2. For the master_ip_failover, master_ip_online_change, and send_report scripts, the official project only provides samples; the switchover logic has to be written yourself.

    Many people are not familiar with Perl and feel at a loss here. In fact, these scripts can simply call out to scripts written in other languages, such as Python or shell.

    For example:

[root@node4 ~]# cat test.pl
#!/usr/bin/perl
use strict;
my $cmd = "python /root/test.py";
system($cmd);

[root@node4 ~]# cat test.py
#!/usr/bin/python
print "hello,python"

[root@node4 ~]# perl test.pl
hello,python
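MHA treats these scripts as successful only when they exit 0, so any wrapper must propagate the inner script's exit status. A minimal sketch, where `run_inner` is a stand-in for invoking the real Python or shell worker:

```shell
#!/bin/sh
# Stand-in for calling the real worker (e.g. `python /root/test.py`);
# replace the function body with the actual command in your environment.
run_inner() {
  echo "hello,python"
  return 0
}

run_inner
rc=$?                        # capture the inner command's exit code
echo "inner exit code: $rc"  # 0 means success as far as MHA is concerned
```

In the real wrapper, finish with `exit $rc` so the inner status is what MHA actually sees.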
 


Permanent link to this article: http://www.linuxidc.com/Linux/2017-05/144086.htm


Part 1: Install MHA node on all nodes

    1. On the MySQL servers, install the Perl module MHA node depends on (DBD::mysql)

      # yum install perl-DBD-MySQL -y

    2. Install MHA node on all the nodes

      Download URL: https://code.google.com/p/mysql-master-ha/wiki/Downloads?tm=2

      Since this URL is blocked in mainland China, the files have also been uploaded to a personal net disk at http://pan.baidu.com/s/1boS31vT for anyone who needs them.

      # tar xvf mha4mysql-node-0.56.tar.gz

      # cd mha4mysql-node-0.56

      # perl Makefile.PL 

Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 6.
BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 6.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4.
BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 349.

    The errors show that the required dependency packages are missing.

     # yum install perl-ExtUtils-MakeMaker -y

     # perl Makefile.PL 

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
Can't locate CPAN.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/AutoInstall.pm line 277.

    # yum install perl-CPAN -y

    # perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node

     # make 

     # make install

    At this point the MHA node installation is complete, and the following scripts have been generated under /usr/local/bin:

# ll /usr/local/bin/
total 44
-r-xr-xr-x 1 root root 16367 Jul 20 07:00 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jul 20 07:00 filter_mysqlbinlog
-r-xr-xr-x 1 root root  8261 Jul 20 07:00 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jul 20 07:00 save_binary_logs

Part 2: Deploy MHA Manager on the Monitor host

     # tar xvf mha4mysql-manager-0.56.tar.gz 

     # cd mha4mysql-manager-0.56

     # perl Makefile.PL

 

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...missing.
- Config::Tiny          ...missing.
- Log::Dispatch         ...missing.
- Parallel::ForkManager ...missing.
- MHA::NodeConst        ...missing.
==> Auto-install the 5 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type make.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Warning: prerequisite Config::Tiny 0 not found.
Warning: prerequisite Log::Dispatch 0 not found.
Warning: prerequisite MHA::NodeConst 0 not found.
Warning: prerequisite Parallel::ForkManager 0 not found.
Warning: prerequisite Time::HiRes 0 not found.
Writing Makefile for mha4mysql::manager

     # make

     # make install

    After this completes, the following new files appear under /usr/local/bin:

# ll /usr/local/bin/
total 40
-r-xr-xr-x 1 root root 1991 Jul 20 00:50 masterha_check_repl
-r-xr-xr-x 1 root root 1775 Jul 20 00:50 masterha_check_ssh
-r-xr-xr-x 1 root root 1861 Jul 20 00:50 masterha_check_status
-r-xr-xr-x 1 root root 3197 Jul 20 00:50 masterha_conf_host
-r-xr-xr-x 1 root root 2513 Jul 20 00:50 masterha_manager
-r-xr-xr-x 1 root root 2161 Jul 20 00:50 masterha_master_monitor
-r-xr-xr-x 1 root root 2369 Jul 20 00:50 masterha_master_switch
-r-xr-xr-x 1 root root 5167 Jul 20 00:50 masterha_secondary_check
-r-xr-xr-x 1 root root 1735 Jul 20 00:50 masterha_stop

Part 3: Configure passwordless SSH login

    1. On the manager, set up passwordless access to all the node hosts

      # ssh-keygen

      Press Enter through all the prompts

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

    2. On the Master (192.168.244.10)

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

    3. On the Candidate master (192.168.244.20)

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

     4. On the Slave (192.168.244.30)

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20
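The key exchange above can be sanity-checked non-interactively before running MHA's own checks. A sketch that only prints the check commands for the three data nodes (host list taken from the cluster table; drop the `echo` to actually run them from the manager):

```shell
#!/bin/sh
# Hosts from the cluster table; adjust to your environment.
HOSTS="192.168.244.10 192.168.244.20 192.168.244.30"
for h in $HOSTS; do
  # BatchMode=yes makes ssh fail instead of prompting for a password,
  # which is exactly the behaviour a passwordless setup must satisfy.
  cmd="ssh -o BatchMode=yes -o ConnectTimeout=5 root@$h hostname"
  echo "$cmd"
done
```

If any of these commands prompts or fails when executed, the corresponding ssh-copy-id step needs to be repeated.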

Part 4: Set up the replication environment

     1. Take a backup on the Master

     # mysqldump --master-data=2 --single-transaction -R --triggers -A > all.sql

     Here, -R backs up stored routines, --triggers backs up triggers, and -A dumps all databases.

     2. Create the replication user on the Master

mysql> grant replication slave on *.* to 'repl'@'192.168.244.%' identified by 'repl';
Query OK, 0 rows affected (0.09 sec)

    3. Check the CHANGE MASTER statement recorded in the backup file all.sql

      # head -n 30 all.sql

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=120;
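The coordinates can also be pulled out with grep instead of scanning the head output. A sketch against a stand-in dump file in /tmp (a real run would point at the actual all.sql):

```shell
#!/bin/sh
# Write a stand-in all.sql containing the comment mysqldump emits
# with --master-data=2 (a real run uses the actual backup file).
printf "%s\n" "-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=120;" > /tmp/all.sql

# The replication coordinates sit on the first CHANGE MASTER line of the dump.
coords=$(grep -m 1 "CHANGE MASTER TO" /tmp/all.sql)
echo "$coords"
```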

     4. Copy the backup file to the Candidate master and the Slave

     # scp all.sql 192.168.244.20:/root/

     # scp all.sql 192.168.244.30:/root/

     5. Build the slave on the Candidate master

     # mysql < all.sql 

     Set the replication parameters

mysql> CHANGE MASTER TO
    -> MASTER_HOST='192.168.244.10',
    -> MASTER_USER='repl',
    -> MASTER_PASSWORD='repl',
    -> MASTER_LOG_FILE='mysql-bin.000002',
    -> MASTER_LOG_POS=120;
Query OK, 0 rows affected, 2 warnings (0.19 sec)

mysql> start slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show slave status\G

       6. Build the slave on the Slave host in the same way

       7. Set the slave servers to read only

mysql> set global read_only=1;
Query OK, 0 rows affected (0.04 sec)

       8. Create the monitoring user on the Master

mysql> grant all privileges on *.* to 'monitor'@'%' identified by 'monitor123';
Query OK, 0 rows affected (0.07 sec)

Part 5: Configure MHA

     1. On the Monitor host (192.168.244.40), create the MHA working directory and the configuration file

     # mkdir -p /etc/masterha

     # vim /etc/masterha/app1.cnf

[server default]
manager_log=/masterha/app1/manager.log          // manager log file
manager_workdir=/masterha/app1                  // manager working directory
master_binlog_dir=/var/lib/mysql                // where the master keeps its binlogs, so MHA can find them
master_ip_failover_script= /usr/local/bin/master_ip_failover    // switchover script used during automatic failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change  // switchover script used during manual (online) switchover
user=monitor               // monitoring user
password=monitor123        // monitoring user's password
ping_interval=1            // interval (seconds) between pings to the master; the default is 3, and failover is triggered after three failed attempts
remote_workdir=/tmp        // where binlogs are saved on the remote mysql hosts during a switchover
repl_user=repl             // replication user in the replication setup
repl_password=repl         // replication user's password
report_script=/usr/local/bin/send_report    // alert script run after a switchover
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 192.168.244.20 -s 192.168.244.30 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306  // if the manager loses contact with the master, it checks whether the two slaves can still connect to master_ip on port 3306
shutdown_script=""      // script to power off the failed host after a failure (mainly to prevent split-brain)
ssh_user=root           // ssh login user

[server1]
hostname=192.168.244.10
port=3306

[server2]
hostname=192.168.244.20
port=3306
candidate_master=1   // mark as candidate master: on failover this slave is promoted to master even if it is not the most up-to-date slave in the cluster
check_repl_delay=0   // by default, a slave more than 100MB of relay logs behind the master is never chosen as the new master, because catching it up takes too long; check_repl_delay=0 makes MHA ignore replication delay when picking a new master. This is useful together with candidate_master=1, since it guarantees the candidate can become the new master during a switchover

      Note:

      1> When editing this file for real use, the trailing annotations must be removed; MHA does not recognize them as comments.

      2> The configuration sets the master_ip_failover_script and secondary_check_script options; the corresponding files are available in the Baidu net disk share mentioned above.
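With app1.cnf in place, the setup is normally validated with the manager's check tools before monitoring is started. A sketch that only prints the commands to run on the Monitor host (the --conf flag is the standard way to point these tools at the config file):

```shell
#!/bin/sh
CONF=/etc/masterha/app1.cnf
# masterha_check_ssh verifies the passwordless SSH mesh;
# masterha_check_repl verifies the replication topology.
for tool in masterha_check_ssh masterha_check_repl; do
  echo "/usr/local/bin/$tool --conf=$CONF"
done
```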

      2. Set the relay log purge mode (on every Slave)

mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)

      During a switchover, slave recovery depends on information in the relay logs, so automatic relay log purging must be turned off here and the relay logs purged manually instead.

      By default, a slave's relay logs are deleted automatically once the SQL thread has applied them. In an MHA environment, however, those relay logs may be needed to recover other slaves, so automatic purging has to be disabled; relay logs already applied by the SQL thread are then cleaned up manually on a schedule.

      On an ext3 filesystem, deleting a large file takes a noticeable amount of time and can cause serious replication delay, so on Linux large files are usually deleted via a hard link.
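The hard-link trick works because removing one name of a file is cheap, while the data stays on disk as long as another link references the inode; only deleting the last link pays the full cost. A minimal sketch in a temporary directory:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
echo "relay data" > "$tmpdir/relay-bin.000001"

# Create a second name for the same inode, then drop the original name.
ln "$tmpdir/relay-bin.000001" "$tmpdir/relay-bin.000001.link"
rm "$tmpdir/relay-bin.000001"        # fast: inode still referenced by .link

data=$(cat "$tmpdir/relay-bin.000001.link")
echo "$data"                         # the contents survive the first rm

rm "$tmpdir/relay-bin.000001.link"   # the real release of disk space happens here
rmdir "$tmpdir"
```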

      3. Set up periodic relay log cleanup

        The MHA node package includes the purge_relay_logs script. It creates hard links for the relay logs, executes set global relay_log_purge=1, waits a few seconds so that the SQL thread can switch to a new relay log, and then executes set global relay_log_purge=0.

        Usage:

        # purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/

20170424 20:27:46: purge_relay_logs script started.
 Found relay_log.info: /var/lib/mysql/relay-log.info
 Opening /var/lib/mysql/mysqld-relay-bin.000001 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000002 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000003 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000004 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000005 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000006 ..
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
20170424 20:27:50: All relay log purging operations succeeded.

        where:

        --user: MySQL user name

        --password: password for that MySQL user

        --host: MySQL server address

        --workdir: where to create the hard links to the relay logs; the default is /var/tmp. Creating a hard link across filesystems fails, so the location should be set explicitly to one on the same partition as the relay logs.

        --disable_relay_log_purge: by default the script exits immediately if relay_log_purge=1. With this option, the script first sets relay_log_purge to 1, purges the relay logs, and then sets the variable back to 0.

        Set up a crontab to purge the relay logs periodically.

        MHA calls the mysqlbinlog command directly during a switchover, so the path to mysqlbinlog must be present in the environment.

        # vim /etc/cron.d/purge_relay_logs

0 4 * * * /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge_relay_logs.log 2>&1

      Note: it is best to run this scheduled job at a different time on each slave server.
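One way to stagger the job is to give each slave's /etc/cron.d/purge_relay_logs a different start time (the offsets below are arbitrary examples, and note that entries under /etc/cron.d take a user field after the time spec):

```
# on 192.168.244.20
0 4 * * * root /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge_relay_logs.log 2>&1

# on 192.168.244.30
20 4 * * * root /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 --disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge_relay_logs.log 2>&1
```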

      4. Add the path to mysqlbinlog to the environment variables

For more details, continue to the next page: http://www.linuxidc.com/Linux/2017-05/144086p2.htm

Copyright: original article on this site, published by 星锅 on 2022-01-22, 41448 characters in total.
Reposting: unless otherwise noted, articles on this site are released under the CC 4.0 license; please credit the source when reposting.