18.1 Cluster Introduction
When a task is too big for one machine, we hand it to a group of machines. A cluster is, in effect, many computers stacked together to act as one.
Clusters fall into two broad categories by function: high availability and load balancing.
- A high-availability (HA) cluster usually consists of two servers: one serves traffic while the other stands by as a redundant spare. When the active machine goes down, the spare takes over and keeps the service running (guaranteeing availability of the service). Open-source HA software includes heartbeat and keepalived. heartbeat has quite a few bugs on CentOS 6 and is no longer maintained, so keepalived is the recommended choice.
- A load-balancing cluster needs one server to act as the dispatcher, distributing user requests to the backend servers. Apart from the dispatcher, the cluster consists of the servers that actually serve users, and there must be at least two of them. Open-source load balancers include LVS, keepalived, haproxy, and nginx; commercial products include F5 and Netscaler. LVS is a very well-known load balancer. The commercial products are proven in stability and in handling heavy traffic; with open-source software, stability depends largely on the performance of your own servers.
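What the dispatcher does can be illustrated with a toy round-robin loop. This is only a sketch (real dispatchers such as LVS or nginx do this in the kernel or an event loop), and web1/web2 are made-up backend names:

```shell
#!/bin/bash
# Toy round-robin dispatch: rotate incoming requests across a backend pool.
backends=(web1 web2)                 # hypothetical backend servers

pick_backend() {                     # map request number -> backend
    local req=$1
    local idx=$(( (req - 1) % ${#backends[@]} ))
    echo "${backends[$idx]}"
}

for req in 1 2 3 4; do
    echo "request $req -> $(pick_backend "$req")"
done
```

Requests 1 and 3 land on web1, requests 2 and 4 on web2; adding a backend to the array automatically widens the rotation.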
18.2 Introduction to keepalived
Here we use keepalived to build the HA cluster, because heartbeat has problems on CentOS 6 that interfere with the experiment. heartbeat is also slow to switch between master and backup, so keepalived is the recommended tool for HA clusters.
keepalived implements high availability via VRRP (Virtual Router Redundancy Protocol).
In this protocol, several routers with the same function (here, each "router" is one of our servers) form a group, which has one master role and N (N >= 1) backup roles.
The master sends VRRP packets to the backups via multicast. When the backups stop receiving the master's VRRP packets, they conclude the master is down, and a new master is then chosen among the backups according to their priorities.
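The election rule — the highest-priority router still advertising wins — can be sketched as follows. The node names and priorities here are hypothetical, mirroring the 100/90 values used later in this chapter:

```shell
#!/bin/bash
# VRRP-style election sketch: among the nodes still sending adverts,
# the one with the highest priority becomes the new master.
declare -A prio=([node101]=100 [node102]=90 [node103]=80)

elect() {                     # args: nodes that are currently alive
    local best="" best_p=-1 n p
    for n in "$@"; do
        p=${prio[$n]}
        if (( p > best_p )); then
            best_p=$p
            best=$n
        fi
    done
    echo "$best"
}

echo "all alive:   master is $(elect node101 node102 node103)"
echo "101 is down: master is $(elect node102 node103)"
```

While node101 advertises, it keeps the master role; once its adverts stop, node102 (priority 90) takes over ahead of node103 (priority 80).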
Keepalived consists of three modules: core, check, and vrrp. The core module is the heart of keepalived; it starts and maintains the main process and loads and parses the global configuration file. The check module performs health checks, and the vrrp module implements the VRRP protocol.
18.3/4/5 Configuring a High-Availability Cluster with keepalived
Prepare two machines. In this experiment we use 101 and 102, with 101 as the master and 102 as the backup.
Install keepalived on both machines with yum:
[root@localhost: ~]# yum install keepalived
High availability is always built around a specific service, and here we use nginx for the experiment. Why nginx? As covered earlier, nginx can itself act as a load balancer: if nginx goes down, users cannot reach the site even if the web service behind it is still running. That makes it a single point of failure that must not exist, which is a classic case for high availability.
The default configuration file installed by keepalived is /etc/keepalived/keepalived.conf.
We use the configuration below; empty the default file first, then paste in ours:
[root@localhost: ~]# > /etc/keepalived/keepalived.conf
[root@localhost: ~]# vim /etc/keepalived/keepalived.conf
global_defs {                          # global definitions
    notification_email {
        aming@aminglinux.com           # address notified when a problem occurs
    }
    notification_email_from root@aminglinux.com   # sender address for alert mail
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}
vrrp_script chk_nginx {
    script "/usr/local/sbin/check_ng.sh"   # health-check script for the service; shown below
    interval 3                             # run interval, in seconds
}
vrrp_instance VI_1 {
    state MASTER                       # role of this node
    interface ens33
    virtual_router_id 51               # id of the virtual router; must match on master and backup
    priority 100                       # master and backup carry different weights (master higher)
    advert_int 1
    authentication {                   # authentication between the nodes
        auth_type PASS                 # password mode
        auth_pass aminglinux>com
    }
    virtual_ipaddress {                # the virtual (public) IP; master and backup bind the same VIP, and DNS resolves to it
        192.168.127.100
    }
    track_script {
        chk_nginx
    }
}
The script on the master that checks whether nginx is healthy:
#!/bin/bash
# timestamp used when writing the log
d=`date --date today +%Y%m%d_%H:%M:%S`
# count the running nginx processes
n=`ps -C nginx --no-heading|wc -l`
# if the count is 0, start nginx and count again;
# if it is still 0, nginx cannot be started, so stop keepalived
if [ $n -eq "0" ]; then
        systemctl start nginx
        n2=`ps -C nginx --no-heading|wc -l`
        if [ $n2 -eq "0" ]; then
                echo "$d nginx down, keepalived will stop" >> /var/log/check_ng.log
                systemctl stop keepalived   # let the backup take over; prevents split brain
        fi
fi
Save the script as /usr/local/sbin/check_ng.sh and give it 755 permissions:
[root@localhost: ~]# vim /usr/local/sbin/check_ng.sh
[root@localhost: ~]# chmod 755 /usr/local/sbin/check_ng.sh
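The branching in check_ng.sh can be dry-run without touching nginx or keepalived by stubbing out the process count. FAKE_COUNT and the echoed action strings are stand-ins for illustration, not part of the real script:

```shell
#!/bin/bash
# Dry run of the check script's logic with the process count stubbed out.
count_nginx() { echo "$FAKE_COUNT"; }    # stand-in for: ps -C nginx --no-heading | wc -l

check() {
    local n n2
    n=$(count_nginx)
    if [ "$n" -eq 0 ]; then
        echo "start nginx"               # real script: systemctl start nginx
        n2=$(count_nginx)
        if [ "$n2" -eq 0 ]; then
            echo "stop keepalived"       # real script: systemctl stop keepalived
        fi
    else
        echo "nginx ok"
    fi
}

FAKE_COUNT=0 check   # nginx gone and it cannot start -> keepalived gives up the VIP
FAKE_COUNT=5 check   # nginx healthy -> nothing to do
```

Only when nginx is both missing and unable to restart does the script stop keepalived, which releases the VIP so the backup can take over.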
Start the nginx and keepalived services. Note that if nginx is not running, then per the script, keepalived will start it after coming up, and keepalived only stops itself when nginx fails to start.
[root@localhost: ~]# ps aux | grep keep
root 4861 0.0 0.0 112708 980 pts/2 R+ 00:47 0:00 grep --color=auto keep
[root@localhost: ~]# ps aux | grep nginx
root 4863 0.0 0.0 112708 976 pts/2 R+ 00:47 0:00 grep --color=auto nginx
[root@localhost: ~]# systemctl start nginx.service
[root@localhost: ~]# ps aux | grep nginx
root 4895 0.5 0.0 120792 2232 ? Ss 00:47 0:00 nginx: master process /usr/sbin/nginx
nginx 4896 0.5 0.0 123260 3548 ? S 00:47 0:00 nginx: worker process
nginx 4897 0.0 0.0 123260 3548 ? S 00:47 0:00 nginx: worker process
nginx 4898 0.0 0.0 123260 3520 ? S 00:47 0:00 nginx: worker process
nginx 4899 0.0 0.0 123260 3536 ? S 00:47 0:00 nginx: worker process
root 4901 0.0 0.0 112708 976 pts/2 S+ 00:47 0:00 grep --color=auto nginx
[root@localhost: ~]# systemctl start keepalived.service
[root@localhost: ~]# ps aux | grep keep
root 4909 0.0 0.0 118652 1400 ? Ss 00:48 0:00 /usr/sbin/keepalived -D
root 4913 0.0 0.0 112708 980 pts/2 R+ 00:48 0:00 grep --color=auto keep
Stop the nginx service, then check the processes again:
[root@localhost: ~]# ps aux | grep nginx
root 5347 0.0 0.0 120792 2236 ? Ss 01:07 0:00 nginx: master process /usr/sbin/nginx
nginx 5348 0.0 0.0 123260 3552 ? S 01:07 0:00 nginx: worker process
nginx 5349 0.0 0.0 123260 3552 ? S 01:07 0:00 nginx: worker process
nginx 5350 0.0 0.0 123260 3784 ? S 01:07 0:00 nginx: worker process
nginx 5351 0.0 0.0 123260 3552 ? S 01:07 0:00 nginx: worker process
root 5983 0.0 0.0 112712 972 pts/2 S+ 01:22 0:00 grep --color=auto nginx
[root@localhost: ~]# systemctl stop nginx
[root@localhost: ~]# ps aux | grep nginx
root 6026 0.0 0.0 120792 2236 ? Ss 01:22 0:00 nginx: master process /usr/sbin/nginx
nginx 6027 0.0 0.0 123260 3552 ? S 01:22 0:00 nginx: worker process
nginx 6028 0.0 0.0 123260 3552 ? S 01:22 0:00 nginx: worker process
nginx 6029 0.0 0.0 123260 3552 ? S 01:22 0:00 nginx: worker process
nginx 6030 0.0 0.0 123260 3552 ? S 01:22 0:00 nginx: worker process
root 6035 0.0 0.0 112708 976 pts/2 S+ 01:22 0:00 grep --color=auto nginx
Comparing the start times, we really did stop the service, and the script restarted it by itself.
(This didn't work for me at first; the log showed the NIC was misconfigured. On my 101 machine the interface is ens37, not ens33, and after changing interface in keepalived.conf everything worked.)
[root@localhost: ~]# vim /var/log/messages
Sep 6 01:19:17 localhost Keepalived[5812]: Keepalived_vrrp exited with permanent error CONFIG. Terminating
Sep 6 01:19:17 localhost Keepalived[5812]: Stopping
Check whether the VIP has been configured successfully:
[root@localhost: ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 00:0c:29:18:28:6e brd ff:ff:ff:ff:ff:ff
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:18:28:78 brd ff:ff:ff:ff:ff:ff
inet 192.168.127.101/24 brd 192.168.127.255 scope global ens37
valid_lft forever preferred_lft forever
inet 192.168.127.100/32 scope global ens37
valid_lft forever preferred_lft forever
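A script can verify the VIP by grepping the ip addr output. Here the interface section is simulated with sample lines like the ones above; in the lab you would feed it the output of ip addr show ens37 instead:

```shell
#!/bin/bash
# Check for the VIP in `ip addr`-style output (sample lines as shown above).
VIP=192.168.127.100
sample='inet 192.168.127.101/24 brd 192.168.127.255 scope global ens37
inet 192.168.127.100/32 scope global ens37'

if grep -q "inet $VIP/" <<<"$sample"; then
    echo "VIP $VIP is bound"
else
    echo "VIP $VIP is missing"
fi
```

The trailing "/" in the pattern keeps 192.168.127.100 from also matching a longer address such as 192.168.127.1001.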
Next, configure keepalived on the backup (102):
global_defs {
notification_email {
aming@aminglinux.com
}
notification_email_from root@aminglinux.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id LVS_DEVEL
}
vrrp_script chk_nginx {
script "/usr/local/sbin/check_ng.sh"
interval 3
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 90
advert_int 1
authentication {
auth_type PASS
auth_pass aminglinux>com
}
virtual_ipaddress {
192.168.127.100
}
track_script {
chk_nginx
}
}
The only differences from the master's config are state (BACKUP instead of MASTER) and priority (90 instead of 100); the VIP must be identical on both machines.
The health-check script is the same as on the master:
#!/bin/bash
# timestamp used when writing the log
d=`date --date today +%Y%m%d_%H:%M:%S`
# count the running nginx processes
n=`ps -C nginx --no-heading|wc -l`
# if the count is 0, start nginx and count again;
# if it is still 0, nginx cannot be started, so stop keepalived
if [ $n -eq "0" ]; then
        systemctl start nginx
        n2=`ps -C nginx --no-heading|wc -l`
        if [ $n2 -eq "0" ]; then
                echo "$d nginx down, keepalived will stop" >> /var/log/check_ng.log
                systemctl stop keepalived
        fi
fi
Which command the script uses to start nginx depends on how you installed nginx (systemctl works here because we installed with yum). Then start keepalived and nginx on the backup as well.
Now let's test.
Since both nginx instances were installed with yum, edit /usr/share/nginx/html/index.html on each machine so the master and backup pages are distinguishable.
Test 1: stop the nginx service on the master.
nginx will not stay stopped: the keepalived check script detects that nginx is gone and restarts it.
Test 2: add an iptables rule on the master:
[root@localhost: ~]# iptables -I OUTPUT -p vrrp -j DROP
Blocking the VRRP protocol on the master does not switch the resource cleanly: the backup stops hearing the master's adverts and claims the VIP as well, while the master still holds its own copy — a split-brain situation rather than a failover.
Test 3: stop the keepalived service on the master.
[root@localhost: ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 00:0c:29:18:28:6e brd ff:ff:ff:ff:ff:ff
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:18:28:78 brd ff:ff:ff:ff:ff:ff
inet 192.168.127.101/24 brd 192.168.127.255 scope global ens37
valid_lft forever preferred_lft forever
inet 192.168.127.100/32 scope global ens37
valid_lft forever preferred_lft forever
[root@localhost: ~]# systemctl stop keepalived.service
[root@localhost: ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 00:0c:29:18:28:6e brd ff:ff:ff:ff:ff:ff
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:18:28:78 brd ff:ff:ff:ff:ff:ff
inet 192.168.127.101/24 brd 192.168.127.255 scope global ens37
valid_lft forever preferred_lft forever
At this point the VIP has been unbound on the master. Now check the backup:
[root@localhost: ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 00:0c:29:af:9d:b8 brd ff:ff:ff:ff:ff:ff
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:af:9d:c2 brd ff:ff:ff:ff:ff:ff
inet 192.168.127.102/24 brd 192.168.127.255 scope global ens37
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:feaf:9dc2/64 scope link
valid_lft forever preferred_lft forever
[root@localhost: ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 00:0c:29:af:9d:b8 brd ff:ff:ff:ff:ff:ff
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:af:9d:c2 brd ff:ff:ff:ff:ff:ff
inet 192.168.127.102/24 brd 192.168.127.255 scope global ens37
valid_lft forever preferred_lft forever
inet 192.168.127.100/32 scope global ens37
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:feaf:9dc2/64 scope link
valid_lft forever preferred_lft forever
The backup has taken over the VIP; its log shows:
Sep 6 10:00:47 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens37 for 192.168.127.100
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:48 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:53 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:53 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens37 for 192.168.127.100
Sep 6 10:00:53 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:53 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:53 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
Sep 6 10:00:53 localhost Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100
You can see the backup has entered the MASTER state.
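State transitions are easy to pull out of the log with a grep. Sketched here against sample lines like the ones above; on the real host you would grep /var/log/messages itself:

```shell
#!/bin/bash
# Filter VRRP state transitions out of keepalived log lines.
log='Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Transition to MASTER STATE
Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Entering MASTER STATE
Keepalived_vrrp[6794]: Sending gratuitous ARP on ens37 for 192.168.127.100'

grep -E 'Entering (MASTER|BACKUP) STATE' <<<"$log"
# on the real host: grep -E 'Entering (MASTER|BACKUP) STATE' /var/log/messages
```

Only the "Entering ... STATE" lines survive the filter, which makes it quick to reconstruct the failover history of a node.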
Test 4: start the keepalived service on the master again.
The backup immediately returns to the BACKUP state once it sees an advert with a higher priority:
Sep 6 10:05:50 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Received advert with higher priority 100, ours 90
Sep 6 10:05:50 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 6 10:05:50 localhost Keepalived_vrrp[6794]: VRRP_Instance(VI_1) removing protocol VIPs.
Extensions
Source: oschina
Link: https://my.oschina.net/u/3866688/blog/1973382