在 MySQL 数据库中,什么是集群?什么是高可用?最早我理解的是只要有了读写分离的组从集群架构,就是拥有高可用的架构了。现在想想,当初的想法是多么的天真。MySQL 集群和 MySQL 的高可用是两个概念,我们不能把他们混为一谈。
集群是指你多台 MySQL 数据库实例,有一个可以提供写,多个提供读、或者有多个可以写,多个可以读。比如我们配置了读写分离的一主多从的集群架构,或者双主多从的集群架构。此时,我们只是有了 MySQL 的集群,在读写性能上有锁提升,但是此时并不是一个高可用的 MySQL 架构。
高可用是指在任意一个时刻,我们的 MySQL 都可以提供读写的服务,不会因为某一个 MySQL 数据库实例宕机而导致 MySQL 数据不能正常读写,此时才是高可用。从某种意义上来说,高可用要在集群的基础上才可以实现,也就是说,要想高可用,必须先把集群搭建起来。有了 MySQL 的集群,才可以谈论 MySQL 的高可用,没有 MySQL 的集群,谈 MySQL 的高可用犹如空中楼阁。
拿我们前面说的双主多从的 MySQL 集群架构来说,两个主在同一个时间点,若只有一个主对外提供写的服务,此时如果其中一个主宕机,另外一个主可以顶替原先的主对外提供写的服务,同时其他的从可以把同步的数据源切换到新的主上面来,这样的一个集群就是一个高可用的架构。
有了 MySQL 的集群,并不一定就有 MySQL 的高可用。但是如果有了 MySQL 的高可用,那么此时它一定是一个 MySQL 集群。高可用需要在集群的基础上来配合其他组件才能实现。
MMM 高可用架构
什么是 MMM
所谓的 MMM 只是:Multi-Master Replication Manager for MySQL,取其中的三个 M 开头的单词简写。它是 mysql 多主复制管理器,基于 perl 实现,关于 mysql 主主复制配置的监控、故障转移和管理的一套可伸缩的脚本套件(在任何时候只有一个节点可以被写入),MMM 也能对从服务器进行读负载均衡,所以可以用它来在一组用于复制的服务器启动虚拟 ip,除此之外,它还有实现数 据备份、节点之间重新同步功能的脚本。
MySQL 本身没有提供 replication failover 的解决方案,通过 MMM 方案 能实现服务器的故障转移,从而实现 mysql 的高可用。MMM 不仅能提供浮动 IP 的功能,如果当前的主服务器挂掉后,会将你后端的从服务器自动转向新的主服务器进行同步复制,不用手工更改同步配置。
–cap-add NET_ADMIN:默认容器运行的时候,没有增加额外的 Linux 的功能,这里我们要增加上 NET_ADMIN 的功能,否则我们不能再启动后的容器内使用 ifconfig 命令增加虚拟 IP。否则会有如下错误提示:
1 2 3 4 5
root@test:/etc/network# ifconfig eth0:0 192.168.1.100 up SIOCSIFADDR: Operation not permitted SIOCSIFFLAGS: Operation not permitted SIOCSIFFLAGS: Operation not permitted root@test:/etc/network#
➜ ~ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 5207089da9a5 mysql:5.7.31 "docker-entrypoint.s…" About a minute ago Up About a minute 33060/tcp, 0.0.0.0: 33022->3306/tcp mysql-ha-mmm-slave2 1c8ff5960272 mysql:5.7.31 "docker-entrypoint.s…" About a minute ago Up About a minute 33060/tcp, 0.0.0.0: 33021->3306/tcp mysql-ha-mmm-master2 d8d646ba25f1 mysql:5.7.31 "docker-entrypoint.s…" About a minute ago Up About a minute 33060/tcp, 0.0.0.0: 33012->3306/tcp mysql-ha-mmm-slave1 d3d75d9cd930 mysql:5.7.31 "docker-entrypoint.s…" About a minute ago Up About a minute 33060/tcp, 0.0.0.0: 33011->3306/tcp mysql-ha-mmm-master1 ➜ ~
上面我们启动的四个 MySQL 数据库实例,我们还需要一个监控节点,下面启动一个监控节点,用于安装 MMM 的监控服务。
为了和其他环境一致,这里我们也使用上面 MySQL 的镜像。如下是启动一个新的 MySQL 容器,这个容器,我们不在挂载 MySQL 的 my.cnf 配置文件了,因为这个 MySQL 数据库我们不会使用,我们只是在这个容器中安装 MMM 的监控服务,我们此时把这个 MySQL 数据库容器当成一个 Debian 版本的虚拟机来使用。
echo " deb http://mirrors.aliyun.com/debian/ buster main non-free contrib deb http://mirrors.aliyun.com/debian-security buster/updates main deb http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib deb http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib
deb-src http://mirrors.aliyun.com/debian-security buster/updates main deb-src http://mirrors.aliyun.com/debian/ buster main non-free contrib deb-src http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib deb-src http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib " > etc/apt/sources.list
Valid commands are: help \- show this message ping \- ping monitor show - show status checks \[<host>|all \[<check>|all\]\] - show checks status set_online <host> \- set host <host> online set_offline <host> \- set host <host> offline mode - print current mode. set_active - switch into active mode. set_manual - switch into manual mode. set_passive - switch into passive mode. move_role \[--force\] <role> <host> \- move exclusive role <role> to host <host> (Only use --force if you know what you are doing!) set_ip <ip> <host> \- set role with ip <ip> to host <host>
root@monitor:/#
常用的维护节点状态的命令如下:
1 2 3 4 5 6 7
mmm_control show mmm_control checks all mmm\_control set\_online master1 mmm\_control set\_online master2 mmm\_control set\_online slave1 mmm\_control set\_online slave2
root@monitor:/etc/mysql-mmm# mmm\_control set\_online master1; OK: State of 'master1' changed to ONLINE. Now you can wait some time and check its new roles! root@monitor:/etc/mysql-mmm# mmm\_control set\_online master2 OK: State of 'master2' changed to ONLINE. Now you can wait some time and check its new roles! root@monitor:/etc/mysql-mmm# mmm\_control set\_online slave1 OK: State of 'slave1' changed to ONLINE. Now you can wait some time and check its new roles! root@monitor:/etc/mysql-mmm# mmm\_control set\_online slave2 OK: State of 'slave2' changed to ONLINE. Now you can wait some time and check its new roles! root@monitor:/etc/mysql-mmm# mmm_control show master1(172.20.0.11) master/ONLINE. Roles: reader(172.20.0.111), writer(172.20.0.100) master2(172.20.0.21) master/ONLINE. Roles: reader(172.20.0.211) slave1(172.20.0.12) slave/ONLINE. Roles: reader(172.20.0.122) slave2(172.20.0.22) slave/ONLINE. Roles: reader(172.20.0.222)
root@monitor:/etc/mysql-mmm# tail -f var/log/mysql-mmm/mmm_mond.log 2021/02/22 23:26:27 INFO Waiting for network connection... 2021/02/22 23:26:27 INFO Spawning checker 'ping_ip'... 2021/02/22 23:26:27 INFO Shutting down checker 'ping_ip'... 2021/02/22 23:26:27 INFO Network connection is available. 2021/02/22 23:26:27 FATAL Child exited with exitcode 255, restarting after 10 second sleep
root@monitor:/etc/mysql-mmm# etc/init.d/mysql-mmm-monitor start Daemon bin: '/usr/sbin/mmm_mond' Daemon pid: '/var/run/mmm_mond.pid' Starting MMM Monitor daemon: 2021/02/22 23:30:17 INFO STARTING... 2021/02/22 23:30:17 DEBUG Created pid file '/var/run/mmm_mond.pid' with pid 5330 2021/02/22 23:30:17 INFO Waiting for network connection... 2021/02/22 23:30:17 INFO Spawning checker 'ping_ip'... 2021/02/22 23:30:17 DEBUG IP '172.20.0.1' is reachable: OK 2021/02/22 23:30:17 INFO Shutting down checker 'ping_ip'... 2021/02/22 23:30:17 INFO Network connection is available. Use of uninitialized value $old_state in string ne at usr/share/perl5/MMM/Monitor/Agent.pm line 42. 2021/02/22 23:30:17 FATAL Child exited with exitcode 255, restarting after 10 second sleep
root@monitor:/# tail -f var/log/mysql-mmm/mmm_mond.log 2021/02/22 23:42:43 INFO Check 'rep_backlog' on 'slave1' is ok! 2021/02/22 23:42:43 INFO Check 'rep_backlog' on 'master2' is ok! 2021/02/22 23:42:43 INFO Check 'rep_threads' on 'master1' is ok! 2021/02/22 23:42:43 INFO Check 'rep_threads' on 'slave2' is ok! 2021/02/22 23:42:43 INFO Check 'rep_threads' on 'slave1' is ok! 2021/02/22 23:42:43 INFO Check 'rep_threads' on 'master2' is ok! 2021/02/22 23:42:43 INFO Check 'ping' on 'master1' is ok! 2021/02/22 23:42:43 INFO Check 'ping' on 'slave2' is ok! 2021/02/22 23:42:43 INFO Check 'ping' on 'slave1' is ok! 2021/02/22 23:42:43 INFO Check 'ping' on 'master2' is ok!
问题排查 2:虚拟 IP 不能再各个节点生成?
启动 MMM 后,发现不能通过 VIP 访问 MySQL 数据库,去各个节点使用 ifconfig -a 或者 ip addr 命令查看发现没有对应的 VIP 生成。查看了各个节点 MMM agent 服务的日志文件/var/log/mysql-mmm/mmm_agent.log ,发现如下错误信息:
1 2 3 4 5
root@master1:/# tail -f var/log/mysql-mmm/mmm_agentd.log 2021/02/23 17:16:19 FATAL Couldn't configure IP '172.20.0.111' on interface 'eth0': undef 2021/02/23 17:16:19 FATAL Couldn't allow writes: undef 2021/02/23 17:16:22 FATAL Couldn't configure IP '172.20.0.111' on interface 'eth0': undef 2021/02/23 17:16:22 FATAL Couldn't allow writes: undef
root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.100 -e 'select @@hostname' mysql: \[Warning\] Using a password on the command line interface can be insecure. + ---+ | @@hostname | + ---+ | master1.mysql | + ---+ root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.111 -e 'select @@hostname' mysql: \[Warning\] Using a password on the command line interface can be insecure. + ---+ | @@hostname | + ---+ | master1.mysql | + ---+ root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.122 -e 'select @@hostname' mysql: \[Warning\] Using a password on the command line interface can be insecure. + ---+ | @@hostname | + ---+ | master2.mysql | + ---+ root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.211 -e 'select @@hostname' mysql: \[Warning\] Using a password on the command line interface can be insecure. + -----+ | @@hostname | + -----+ | slave1.mysql | + -----+ root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.222 -e 'select @@hostname' mysql: \[Warning\] Using a password on the command line interface can be insecure. + -----+ | @@hostname | + -----+ | slave2.mysql | + -----+ root@monitor:/etc/mysql-mmm#
验证 MMM 的高可用
我们把 master1 节点上面的 MySQL 服务停止掉,来验证一下是否会把主从同步的源头从 master1 切换为 master2 上面,并且我们在 monitor 节点上,仍然可以通过 writer vip 去连接到 MySQL 上。
停止 master1 节点上面的 MySQL 服务,在 master1 节点上,执行如下命令:
1 2 3 4 5
root@master1:~/Net-ARP-1.0.11# etc/init.d/mysql stop ............ \[info\] MySQL Community Server 5.7.31 is stopped. root@master1:~/Net-ARP-1.0.11# % ➜ ~
root@monitor:/etc/mysql-mmm# tail -20 /var/log/mysql-mmm/mmm_mond.log 2021/02/23 16:19:49 INFO Check 'mysql' on 'slave2' is ok! 2021/02/23 16:19:49 INFO Check 'mysql' on 'master1' is ok! 2021/02/23 16:19:49 INFO Check 'mysql' on 'slave1' is ok! 2021/02/23 16:19:49 INFO Check 'mysql' on 'master2' is ok! 2021/02/23 18:44:30 WARN Check 'rep_threads' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 172.20.0.11:3306, user = mmm_monitor)! Can't connect to MySQL server on '172.20.0.11' (115) 2021/02/23 18:44:30 WARN Check 'rep_backlog' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 172.20.0.11:3306, user = mmm_monitor)! Can't connect to MySQL server on '172.20.0.11' (115) 2021/02/23 18:44:40 ERROR Check 'mysql' on 'master1' has failed for 10 seconds! Message: ERROR: Connect error (host = 172.20.0.11:3306, user = mmm_monitor)! Can't connect to MySQL server on '172.20.0.11' (115) 2021/02/23 18:44:41 FATAL State of host 'master1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) 2021/02/23 18:44:41 INFO Removing all roles from host 'master1': 2021/02/23 18:44:41 INFO Removed role 'reader(172.20.0.111)' from host 'master1' 2021/02/23 18:44:41 INFO Removed role 'writer(172.20.0.100)' from host 'master1' 2021/02/23 18:44:41 FATAL Can't reach agent on host 'master1' 2021/02/23 18:44:41 ERROR Can't send offline status notification to 'master1' \- killing it! 2021/02/23 18:44:41 FATAL Could not kill host 'master1' \- there may be some duplicate ips now! (There's no binary configured for killing hosts.) 2021/02/23 18:44:41 INFO Orphaned role 'writer(172.20.0.100)' has been assigned to 'master2' 2021/02/23 18:44:41 INFO Orphaned role 'reader(172.20.0.111)' has been assigned to 'slave1' 2021/02/23 18:44:57 ERROR Check 'ping' on 'master1' has failed for 11 seconds! Message: ERROR: Could not ping 172.20.0.11 root@monitor:/etc/mysql-mmm#
root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.100 -e 'select @@hostname' mysql: \[Warning\] Using a password on the command line interface can be insecure. + ---+ | @@hostname | + ---+ | master2.mysql | + ---+ root@monitor:/etc/mysql-mmm#