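What is Keepalived?
Keepalived is a lightweight high-availability solution, used mainly on Linux. Its core function is to provide high availability through the Virtual Router Redundancy Protocol (VRRP), keeping services running and avoiding a single point of failure. Keepalived was originally designed for LVS (Linux Virtual Server) to monitor the state of each service node in a cluster. If a service node becomes abnormal or fails, Keepalived automatically removes it from the cluster, and automatically adds it back once the node recovers.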
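How it works
Keepalived implements high availability through VRRP. VRRP groups several routers with the same function into one group, in which one acts as the master device and the rest act as backup devices. Keepalived's core module starts and maintains the main process, the health-check module monitors the state of the service nodes, and the VRRP module implements the VRRP protocol. When the master fails, a backup takes over its duties so that the service continues.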
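Keepalived core concepts
Virtual Router:
A group of physical routers that appears to the outside as one logical router. The virtual router has a virtual IP address (usually the clients' default gateway) and a virtual MAC address.
Master router:
The router in the group that actually forwards packets. It answers ARP requests for the virtual IP address, binding the virtual MAC address to the virtual IP address.
Backup router:
A router in standby. When the Master fails, a Backup router takes over the Master's duties.
Priority:
Decides which router becomes Master. Values range from 1 to 255 (VRRP reserves 255 for the router that actually owns the virtual IP address), the default is 100, and a larger number means a higher priority.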
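The priority rule can be sketched in a few lines of shell. This is only an illustration of the comparison, not how keepalived is implemented: the function name `elect_master` and the "priority ip" input format are assumptions made for the example, and the tie-break on the higher IP address is the rule VRRP applies when priorities are equal.

```shell
# Pick the Master from "priority ip" lines read on stdin (hypothetical input format).
# Highest priority wins; ties go to the numerically highest IPv4 address.
elect_master() {
    # Expand each IP into four numeric sort keys, sort descending, keep the first row.
    awk '{ split($2, o, "."); print $1, o[1], o[2], o[3], o[4], $2 }' |
        sort -k1,1nr -k2,2nr -k3,3nr -k4,4nr -k5,5nr |
        head -n 1 |
        awk '{ print $6 }'
}

printf '%s\n' "100 10.0.0.20" "50 10.0.0.21"  | elect_master   # prints 10.0.0.20
printf '%s\n' "100 10.0.0.20" "100 10.0.0.21" | elect_master   # prints 10.0.0.21
```

The octets are compared as numbers rather than as text, so 10.0.0.200 ranks above 10.0.0.21.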
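Keepalived workflow
Initialization:
After a router starts, it joins the VRRP group with its configured priority. If the group has no Master, the router with the highest priority becomes Master.
Master election:
If several routers start at the same time, the one with the highest priority becomes Master. If priorities are equal, the interface IP addresses are compared and the router with the larger IP address becomes Master.
Heartbeat mechanism:
The Master periodically sends VRRP advertisement packets (the heartbeat) to announce its state.
Backup routers monitor the Master by listening for these advertisements.
Failure detection and failover:
If a Backup router receives no advertisement from the Master within the timeout, it considers the Master failed. The Backup router with the highest priority becomes the new Master and takes over the virtual IP and MAC addresses.
Preemption:
If preemption is enabled, a Backup router whose priority is higher than the current Master's actively takes over as Master.
If preemption is disabled, a Backup router does not take over from a working Master even when its own priority is higher.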
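The timeout and the preemption rule above can be made concrete with a small sketch. It assumes the VRRPv2 timer definition (Master_Down_Interval = 3 × advertisement interval + skew, with skew = (256 − priority) / 256 seconds); the function names are hypothetical and the code only models the decision, it does not talk to keepalived.

```shell
# Master_Down_Interval in seconds, assuming the VRRPv2 timer definition:
#   3 * advert_int + (256 - priority) / 256
# $1 = advertisement interval in seconds, $2 = local priority
master_down_interval() {
    awk -v adv="$1" -v prio="$2" 'BEGIN { printf "%.3f\n", 3 * adv + (256 - prio) / 256 }'
}

# Return 0 (true) if a Backup should take over from a Master that is still alive.
# $1 = local priority, $2 = current Master priority, $3 = "preempt" or "nopreempt"
should_preempt() {
    [ "$3" = "preempt" ] && [ "$1" -gt "$2" ]
}

master_down_interval 1 100    # prints 3.609
master_down_interval 1 50     # prints 3.805
should_preempt 100 50 preempt   && echo "take over"
should_preempt 100 50 nopreempt || echo "stay Backup"
```

In the status output captured later in this article, a node with priority 100 enters BACKUP at 21:37:18 and MASTER at 21:37:22, which is consistent with a timeout of roughly 3.6 seconds.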
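What is VRRP?
The Virtual Router Redundancy Protocol (VRRP) is a network protocol that makes the default gateway of a network more reliable, so that the failure of a single gateway device does not interrupt the network. VRRP presents several routers as one "virtual router" and fails over between them, which provides high availability.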
Installing and using Keepalived
Environment
IP          OS             Role
10.0.0.20   Ubuntu 22.04   Master node: keepalived + nginx
10.0.0.21   Ubuntu 22.04   Backup node: keepalived + nginx
Install keepalived
Run on both the master and the backup node
[root@master ~]# apt update -y
[root@master ~]# apt install keepalived -y
Edit the keepalived configuration file
If the file does not exist, simply create it.
Master node configuration file
[root@master ~]# cat /etc/keepalived/keepalived.conf
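# Global definitions
global_defs {
    router_id lb01    # name of this keepalived node, unique on the network (the logs in this article were captured with the misspelling route_id)
}
# VRRP instance: configures the VIP and the master/backup roles
# VI_1 is the instance name: keep it the same on both nodes of a pair, and unique within this keepalived configuration
vrrp_instance VI_1 {
    # Initial role: MASTER or BACKUP
    state MASTER
    # Network interface on this system
    interface ens33
    # Must be identical on both nodes of the pair
    virtual_router_id 51
    # Priority: the larger the number, the higher the priority; suggested setting is master > backup, e.g. 100 and 50
    priority 100
    # Heartbeat interval: how often a VRRP advertisement is sent, in seconds
    advert_int 1
    # Authentication between the nodes; the defaults are fine. PASS is a shared plain-text password and does not encrypt the packets
    authentication {
        # Authentication type
        auth_type PASS
        auth_pass 1111
    }
    # Set the VIP
    virtual_ipaddress {
        # VIP
        10.0.0.3
    }
}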
Backup node configuration file
[root@master ~]# cat /etc/keepalived/keepalived.conf
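# Global definitions
global_defs {
    router_id lb02    # name of this keepalived node, unique on the network (the logs in this article were captured with the misspelling route_id)
}
# VRRP instance: configures the VIP and the master/backup roles
# VI_1 is the instance name: keep it the same on both nodes of a pair, and unique within this keepalived configuration
vrrp_instance VI_1 {
    # Initial role: MASTER or BACKUP
    state BACKUP
    # Network interface on this system
    interface ens33
    # Must be identical on both nodes of the pair
    virtual_router_id 51
    # Priority: the larger the number, the higher the priority; suggested setting is master > backup, e.g. 100 and 50
    priority 50
    # Heartbeat interval: how often a VRRP advertisement is sent, in seconds
    advert_int 1
    # Authentication between the nodes; the defaults are fine. PASS is a shared plain-text password and does not encrypt the packets
    authentication {
        # Authentication type
        auth_type PASS
        auth_pass 1111
    }
    # Set the VIP
    virtual_ipaddress {
        # VIP
        10.0.0.3
    }
}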
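The two files differ only in the initial state and the priority. One way to make that explicit is to render the instance block from parameters; the sketch below is an illustration under that assumption, and the function name `render_vrrp_instance` is made up for the example.

```shell
# Print a vrrp_instance block like the ones above.
# $1 = state (MASTER or BACKUP), $2 = priority, $3 = interface, $4 = VIP
render_vrrp_instance() {
    cat <<EOF
vrrp_instance VI_1 {
    state $1
    interface $3
    virtual_router_id 51
    priority $2
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        $4
    }
}
EOF
}

render_vrrp_instance MASTER 100 ens33 10.0.0.3   # block for the master node
render_vrrp_instance BACKUP 50  ens33 10.0.0.3   # block for the backup node
```

Generating both blocks from one template reduces the chance that virtual_router_id, the password or the VIP drift apart between the nodes.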
Start keepalived
Run on both the master and the backup node
[root@master ~]# systemctl start keepalived
# Check that it started successfully
[root@master ~]# systemctl status keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2025-04-05 21:37:18 CST; 7min ago
Main PID: 1528 (keepalived)
Tasks: 2 (limit: 4519)
Memory: 1.9M
CPU: 66ms
CGroup: /system.slice/keepalived.service
├─1528 /usr/sbin/keepalived --dont-fork
└─1529 /usr/sbin/keepalived --dont-fork
Apr 05 21:37:18 master Keepalived[1528]: Command line: '/usr/sbin/keepalived' '--dont-fork'
Apr 05 21:37:18 master Keepalived[1528]: Configuration file /etc/keepalived/keepalived.conf
Apr 05 21:37:18 master Keepalived[1528]: (Line 3) Unknown keyword 'route_id'
Apr 05 21:37:18 master Keepalived[1528]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Apr 05 21:37:18 master Keepalived[1528]: Starting VRRP child process, pid=1529
Apr 05 21:37:18 master systemd[1]: keepalived.service: Got notification message from PID 1529, but reception only permitted for main PID 1528
Apr 05 21:37:18 master Keepalived[1528]: Startup complete
Apr 05 21:37:18 master systemd[1]: Started Keepalive Daemon (LVS and VRRP).
Apr 05 21:37:18 master Keepalived_vrrp[1529]: (VI_1) Entering BACKUP STATE (init)
Apr 05 21:37:22 master Keepalived_vrrp[1529]: (VI_1) Entering MASTER STATE
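The last two log lines show the instance moving from BACKUP to MASTER. When the log is long, the current state can be extracted with a small filter; this is a sketch that relies only on the "Entering ... STATE" wording visible in the output above, and the function name `last_vrrp_state` is hypothetical.

```shell
# Print the most recent VRRP state of an instance from keepalived log lines on stdin.
# $1 = instance name, e.g. VI_1
last_vrrp_state() {
    sed -n "s/.*($1) Entering \([A-Z]*\) STATE.*/\1/p" | tail -n 1
}

last_vrrp_state VI_1 <<'EOF'
Apr 05 21:37:18 master Keepalived_vrrp[1529]: (VI_1) Entering BACKUP STATE (init)
Apr 05 21:37:22 master Keepalived_vrrp[1529]: (VI_1) Entering MASTER STATE
EOF
# prints MASTER
```

On a live node the same filter could be fed from the journal, for example `journalctl -u keepalived --no-pager | last_vrrp_state VI_1`.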
Verification
Running ip addr or hostname -I on the master node shows that it holds the VIP 10.0.0.3.
[root@master ~]# ip addr
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33:
link/ether 00:0c:29:b9:88:49 brd ff:ff:ff:ff:ff:ff
altname enp2s1
inet 10.0.0.20/24 brd 10.0.0.255 scope global ens33
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:feb9:8849/64 scope link
valid_lft forever preferred_lft forever
[root@master ~]# hostname -I
10.0.0.20 10.0.0.3
And the backup node?
[root@master ~]# hostname -I
10.0.0.21
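Checking by eye works for two nodes; for scripting, the same check can be wrapped in a helper. The sketch below assumes the space-separated output format of `hostname -I` shown above, and `has_vip` is a hypothetical name.

```shell
# Return 0 if the VIP ($2) appears as a whole address in a space-separated list ($1).
has_vip() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

has_vip "10.0.0.20 10.0.0.3" 10.0.0.3 && echo "this node holds the VIP"
has_vip "10.0.0.21" 10.0.0.3          || echo "this node does not hold the VIP"
```

On a real node the call would be `has_vip "$(hostname -I)" 10.0.0.3`. Matching whole words avoids treating 10.0.0.30 as a hit for 10.0.0.3.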
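The check shows that the master node has the VIP and the backup node does not. Now let's simulate a failure: when the master node goes down, the backup node should take over and the VIP 10.0.0.3 should appear on it. Let's test this.
On the master node: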
[root@master ~]# systemctl stop keepalived
[root@master ~]# hostname -I
10.0.0.20
On the backup node:
[root@master ~]# hostname -I
10.0.0.21 10.0.0.3
[root@master ~]#
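OK, at this point the keepalived setup is complete.
One question remains: once the master node recovers, which machine holds the VIP? Let's test it.
On the master node: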
[root@master ~]# systemctl start keepalived
[root@master ~]# systemctl status keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2025-04-05 21:59:37 CST; 6s ago
Main PID: 1560 (keepalived)
Tasks: 2 (limit: 4519)
Memory: 1.9M
CPU: 9ms
CGroup: /system.slice/keepalived.service
├─1560 /usr/sbin/keepalived --dont-fork
└─1561 /usr/sbin/keepalived --dont-fork
Apr 05 21:59:37 master Keepalived[1560]: Starting VRRP child process, pid=1561
Apr 05 21:59:37 master systemd[1]: keepalived.service: Got notification message from PID 1561, but reception only permitted for main PID 1560
Apr 05 21:59:37 master Keepalived[1560]: Startup complete
Apr 05 21:59:37 master systemd[1]: Started Keepalive Daemon (LVS and VRRP).
Apr 05 21:59:37 master Keepalived_vrrp[1561]: (VI_1) Entering BACKUP STATE (init)
Apr 05 21:59:38 master Keepalived_vrrp[1561]: (VI_1) received lower priority (50) advert from 10.0.0.21 - discarding
Apr 05 21:59:39 master Keepalived_vrrp[1561]: (VI_1) received lower priority (50) advert from 10.0.0.21 - discarding
Apr 05 21:59:40 master Keepalived_vrrp[1561]: (VI_1) received lower priority (50) advert from 10.0.0.21 - discarding
Apr 05 21:59:41 master Keepalived_vrrp[1561]: (VI_1) received lower priority (50) advert from 10.0.0.21 - discarding
Apr 05 21:59:41 master Keepalived_vrrp[1561]: (VI_1) Entering MASTER STATE
[root@master ~]# hostname -I
10.0.0.20 10.0.0.3
On the backup node:
[root@master ~]# hostname -I
10.0.0.21
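The test above shows that once the master node recovers, the VIP moves back to it. This raises a problem: in production under heavy traffic, every VIP move makes the site briefly unavailable, so we would rather the VIP did not move back. How do we achieve that?
Keepalived non-preemptive mode
Keepalived is preemptive by default: when the master node fails the backup takes over, and when the master recovers it takes the VIP back. Here we do not want the master to take over again after it recovers.
Configuration:
Set the state of both nodes to BACKUP
Add the nopreempt keyword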
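Both requirements can be checked mechanically before restarting the service. The following is a minimal sketch that assumes a configuration file with a single vrrp_instance, as in this article; the function name `check_nopreempt` is hypothetical.

```shell
# Read a keepalived.conf on stdin; succeed only if it contains nopreempt
# and the instance state is BACKUP. Text after '#' is treated as a comment.
check_nopreempt() {
    awk '
        { sub(/#.*/, "") }
        $1 == "state"     { state = $2 }
        $1 == "nopreempt" { np = 1 }
        END { exit !(np && state == "BACKUP") }
    '
}

printf '%s\n' 'vrrp_instance VI_1 {' '  state BACKUP' '  nopreempt' '}' |
    check_nopreempt && echo "non-preemptive mode is configured"
printf '%s\n' 'vrrp_instance VI_1 {' '  state MASTER' '  nopreempt' '}' |
    check_nopreempt || echo "nopreempt needs state BACKUP"
```

On a node it would be run as `check_nopreempt < /etc/keepalived/keepalived.conf`.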
Master node configuration file
[root@master ~]# cat /etc/keepalived/keepalived.conf
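# Global definitions
global_defs {
    router_id lb01    # name of this keepalived node, unique on the network (the logs in this article were captured with the misspelling route_id)
}
# VRRP instance: configures the VIP and the master/backup roles
# VI_1 is the instance name: keep it the same on both nodes of a pair, and unique within this keepalived configuration
vrrp_instance VI_1 {
    # Initial role: MASTER or BACKUP, must be uppercase
    state BACKUP
    # Non-preemptive mode
    nopreempt
    # Network interface on this system
    interface ens33
    # Must be identical on both nodes of the pair
    virtual_router_id 51
    # Priority: the larger the number, the higher the priority; suggested setting is master > backup, e.g. 100 and 50
    priority 100
    # Heartbeat interval: how often a VRRP advertisement is sent, in seconds
    advert_int 1
    # Authentication between the nodes; the defaults are fine. PASS is a shared plain-text password and does not encrypt the packets
    authentication {
        # Authentication type
        auth_type PASS
        auth_pass 1111
    }
    # Set the VIP
    virtual_ipaddress {
        # A label can be used to give the VIP an alias, for example:
        #10.0.0.3 dev eth0 label eth0:0
        10.0.0.3
    }
}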
Backup node configuration file
[root@master ~]# cat /etc/keepalived/keepalived.conf
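# Global definitions
global_defs {
    router_id lb02    # name of this keepalived node, unique on the network (the logs in this article were captured with the misspelling route_id)
}
# VRRP instance: configures the VIP and the master/backup roles
# VI_1 is the instance name: keep it the same on both nodes of a pair, and unique within this keepalived configuration
vrrp_instance VI_1 {
    # Initial role: MASTER or BACKUP
    state BACKUP
    # Non-preemptive mode
    nopreempt
    # Network interface on this system
    interface ens33
    # Must be identical on both nodes of the pair
    virtual_router_id 51
    # Priority: the larger the number, the higher the priority; suggested setting is master > backup, e.g. 100 and 50
    priority 50
    # Heartbeat interval: how often a VRRP advertisement is sent, in seconds
    advert_int 1
    # Authentication between the nodes; the defaults are fine. PASS is a shared plain-text password and does not encrypt the packets
    authentication {
        # Authentication type
        auth_type PASS
        auth_pass 1111
    }
    # Set the VIP
    virtual_ipaddress {
        # VIP
        10.0.0.3
    }
}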
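Verification
Test procedure: stop the keepalived process on the master node and check the IP addresses on both nodes, then start keepalived on the master node again and check the IP addresses on both nodes once more.
# On the master node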
[root@master ~]# systemctl stop keepalived
[root@master ~]# hostname -I
10.0.0.20
# On the backup node
[root@master ~]# hostname -I
10.0.0.21 10.0.0.3
# On the master node
[root@master ~]# systemctl start keepalived
[root@master ~]# systemctl status keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2025-04-05 22:13:14 CST; 4s ago
Main PID: 1586 (keepalived)
Tasks: 2 (limit: 4519)
Memory: 1.9M
CPU: 7ms
CGroup: /system.slice/keepalived.service
├─1586 /usr/sbin/keepalived --dont-fork
└─1587 /usr/sbin/keepalived --dont-fork
Apr 05 22:13:14 master Keepalived[1586]: WARNING - keepalived was built for newer Linux 5.15.27, running on Linux 5.15.0-135-generic #146-Ubuntu SMP Sat Feb 15 17:06:22 UTC 2025
Apr 05 22:13:14 master Keepalived[1586]: Command line: '/usr/sbin/keepalived' '--dont-fork'
Apr 05 22:13:14 master Keepalived[1586]: Configuration file /etc/keepalived/keepalived.conf
Apr 05 22:13:14 master Keepalived[1586]: (Line 3) Unknown keyword 'route_id'
Apr 05 22:13:14 master Keepalived[1586]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Apr 05 22:13:14 master Keepalived[1586]: Starting VRRP child process, pid=1587
Apr 05 22:13:14 master Keepalived[1586]: Startup complete
Apr 05 22:13:14 master systemd[1]: keepalived.service: Got notification message from PID 1587, but reception only permitted for main PID 1586
Apr 05 22:13:14 master systemd[1]: Started Keepalive Daemon (LVS and VRRP).
Apr 05 22:13:14 master Keepalived_vrrp[1587]: (VI_1) Entering BACKUP STATE (init)
[root@master ~]# hostname -I
10.0.0.20
# On the backup node
[root@master ~]# hostname -I
10.0.0.21 10.0.0.3
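This time the VIP did not move away after the master node recovered, so the non-preemptive setup is complete.
Nginx high availability with Keepalived
High-availability architecture diagram:
By default keepalived only switches between master and backup when a server goes down or the network is cut; it does not monitor any particular service. So what should we do?
We can write a script that monitors a service and stops keepalived on the local system when that service goes down. Here we use nginx, but the same approach also works with services such as MySQL.
Steps
Install nginx and keepalived
Install and start nginx and keepalived on both servers. For nginx itself, see the article 《一文搞懂nginx》.
Write the script:
Do this on both servers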
[root@master ~]# cat check_nginx.sh
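#!/bin/bash
# Count listening TCP sockets whose local port is exactly 80
# (a bare "grep 80" would also match 8080 or any other field containing 80)
nginx_count=$(ss -lnt | grep -c ':80 ')
# If nothing is listening on port 80, stop keepalived so that the VIP fails over
if [ "${nginx_count}" -eq 0 ]; then
    systemctl stop keepalived
fi
# Add execute permission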
[root@master ~]# chmod +x check_nginx.sh
[root@master ~]# ll check_nginx.sh
-rwxr-xr-x 1 root root 122 Apr 5 22:24 check_nginx.sh*
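A loose pattern such as `grep 80` is easy to fool, because it matches port 8080 or any other field that happens to contain 80. A stricter variant compares the port column exactly. The sketch below assumes the column layout of `ss -lnt` (a header line, then the local address and port in the fourth column); `port_listening` is a hypothetical helper name, and the sample output is made up for the demonstration.

```shell
# Return 0 if any listening socket in `ss -lnt` output (stdin) uses local port $1.
# Column 4 is "Local Address:Port"; the first line is the header and is skipped.
port_listening() {
    awk -v port="$1" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit !found }
    '
}

# Made-up sample: only 8080 and 22 are listening, nothing on 80.
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      511    0.0.0.0:8080       0.0.0.0:*
LISTEN 0      128    0.0.0.0:22         0.0.0.0:*'

echo "$sample" | grep -c 80                                            # prints 1, a false positive
echo "$sample" | port_listening 80 || echo "nothing listening on port 80"
```

In the health-check script this would be used as `ss -lnt | port_listening 80`.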
Edit the keepalived configuration file so that keepalived calls the script
Master node configuration file
[root@master ~]# cat /etc/keepalived/keepalived.conf
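# Global definitions
global_defs {
    router_id lb01    # name of this keepalived node, unique on the network (the logs in this article were captured with the misspelling route_id)
}
# Define the monitoring script
vrrp_script check_nginx.sh {
    # Path to the script
    script /root/check_nginx.sh
    # How often the script runs, in seconds
    interval 2
    # Weight: adjustment applied to the priority according to the script result
    weight 1
    # User that runs the script
    user root
}
# VRRP instance: configures the VIP and the master/backup roles
# VI_1 is the instance name: keep it the same on both nodes of a pair, and unique within this keepalived configuration
vrrp_instance VI_1 {
    # Initial role: MASTER or BACKUP, must be uppercase
    state BACKUP
    # Non-preemptive mode
    nopreempt
    # Network interface on this system
    interface ens33
    # Must be identical on both nodes of the pair
    virtual_router_id 51
    # Priority: the larger the number, the higher the priority; suggested setting is master > backup, e.g. 100 and 50
    priority 100
    # Heartbeat interval: how often a VRRP advertisement is sent, in seconds
    advert_int 1
    # Authentication between the nodes; the defaults are fine. PASS is a shared plain-text password and does not encrypt the packets
    authentication {
        # Authentication type
        auth_type PASS
        auth_pass 1111
    }
    # Set the VIP
    virtual_ipaddress {
        # A label can be used to give the VIP an alias, for example:
        #10.0.0.3 dev eth0 label eth0:0
        10.0.0.3
    }
    # This VRRP instance uses the check_nginx.sh script
    track_script {
        check_nginx.sh
    }
}
# Restart keepalived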
[root@master ~]# systemctl restart keepalived
[root@master ~]# systemctl status keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2025-04-05 22:30:41 CST; 6s ago
Main PID: 1630 (keepalived)
Tasks: 2 (limit: 4519)
Memory: 2.9M
CPU: 46ms
CGroup: /system.slice/keepalived.service
├─1630 /usr/sbin/keepalived --dont-fork
└─1631 /usr/sbin/keepalived --dont-fork
Apr 05 22:30:41 master Keepalived[1630]: (Line 3) Unknown keyword 'route_id'
Apr 05 22:30:41 master Keepalived[1630]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Apr 05 22:30:41 master Keepalived[1630]: Starting VRRP child process, pid=1631
Apr 05 22:30:41 master Keepalived[1630]: Startup complete
Apr 05 22:30:41 master Keepalived_vrrp[1631]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Apr 05 22:30:41 master systemd[1]: keepalived.service: Got notification message from PID 1631, but reception only permitted for main PID 1630
Apr 05 22:30:41 master systemd[1]: Started Keepalive Daemon (LVS and VRRP).
Apr 05 22:30:41 master Keepalived_vrrp[1631]: (VI_1) Entering BACKUP STATE (init)
Apr 05 22:30:42 master Keepalived_vrrp[1631]: VRRP_Script(check_nginx.sh) succeeded
Apr 05 22:30:42 master Keepalived_vrrp[1631]: (VI_1) Changing effective priority from 100 to 101
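The last log line shows the effect of `weight 1`: the script succeeded, so the effective priority rose from 100 to 101. The sketch below models that adjustment for a single tracked script as I understand keepalived's weight rule (a positive weight is added while the script succeeds, a negative weight is subtracted while it fails); treat it as an illustration, and note that `effective_priority` is a hypothetical name.

```shell
# Effective priority after one tracked script result.
# $1 = configured priority, $2 = weight, $3 = script result ("ok" or "fail")
effective_priority() {
    if [ "$2" -gt 0 ] && [ "$3" = "ok" ]; then
        echo $(( $1 + $2 ))     # positive weight: added while the script succeeds
    elif [ "$2" -lt 0 ] && [ "$3" = "fail" ]; then
        echo $(( $1 + $2 ))     # negative weight: subtracted while the script fails
    else
        echo "$1"               # otherwise the configured priority applies
    fi
}

effective_priority 100 1 ok      # prints 101, as in the log above
effective_priority 50 1 ok       # prints 51
effective_priority 100 1 fail    # prints 100
```

With a weight of 1 on both nodes the ordering between 100 and 50 never changes, so in this setup the failover comes from the script stopping keepalived, not from the weight.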
Backup node configuration file
[root@master ~]# cat /etc/keepalived/keepalived.conf
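# Global definitions
global_defs {
    router_id lb02    # name of this keepalived node, unique on the network (the logs in this article were captured with the misspelling route_id)
}
# Define the monitoring script
vrrp_script check_nginx.sh {
    # Path to the script
    script /root/check_nginx.sh
    # How often the script runs, in seconds
    interval 2
    # Weight: adjustment applied to the priority according to the script result
    weight 1
    # User that runs the script
    user root
}
# VRRP instance: configures the VIP and the master/backup roles
# VI_1 is the instance name: keep it the same on both nodes of a pair, and unique within this keepalived configuration
vrrp_instance VI_1 {
    # Initial role: MASTER or BACKUP, must be uppercase
    state BACKUP
    # Non-preemptive mode
    nopreempt
    # Network interface on this system
    interface ens33
    # Must be identical on both nodes of the pair
    virtual_router_id 51
    # Priority: the larger the number, the higher the priority; suggested setting is master > backup, e.g. 100 and 50
    priority 50
    # Heartbeat interval: how often a VRRP advertisement is sent, in seconds
    advert_int 1
    # Authentication between the nodes; the defaults are fine. PASS is a shared plain-text password and does not encrypt the packets
    authentication {
        # Authentication type
        auth_type PASS
        auth_pass 1111
    }
    # Set the VIP
    virtual_ipaddress {
        # A label can be used to give the VIP an alias, for example:
        #10.0.0.3 dev eth0 label eth0:0
        10.0.0.3
    }
    # This VRRP instance uses the check_nginx.sh script
    track_script {
        check_nginx.sh
    }
}
# Restart keepalived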
[root@master ~]# systemctl restart keepalived
[root@master ~]# systemctl status keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2025-04-05 22:32:35 CST; 19s ago
Main PID: 17167 (keepalived)
Tasks: 2 (limit: 4519)
Memory: 2.1M
CPU: 110ms
CGroup: /system.slice/keepalived.service
├─17167 /usr/sbin/keepalived --dont-fork
└─17168 /usr/sbin/keepalived --dont-fork
Apr 05 22:32:35 master Keepalived[17167]: (Line 3) Unknown keyword 'route_id'
Apr 05 22:32:35 master Keepalived[17167]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Apr 05 22:32:35 master Keepalived[17167]: Starting VRRP child process, pid=17168
Apr 05 22:32:35 master Keepalived_vrrp[17168]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Apr 05 22:32:35 master Keepalived[17167]: Startup complete
Apr 05 22:32:35 master systemd[1]: keepalived.service: Got notification message from PID 17168, but reception only permitted for main PID 17167
Apr 05 22:32:35 master systemd[1]: Started Keepalive Daemon (LVS and VRRP).
Apr 05 22:32:35 master Keepalived_vrrp[17168]: (VI_1) Entering BACKUP STATE (init)
Apr 05 22:32:35 master Keepalived_vrrp[17168]: VRRP_Script(check_nginx.sh) succeeded
Apr 05 22:32:35 master Keepalived_vrrp[17168]: (VI_1) Changing effective priority from 50 to 51
Testing
To test, stop nginx and then check where the VIP is; this is not demonstrated here. The command to stop nginx is systemctl stop nginx.
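For readers who want to walk through the test, the steps could look like the sketch below. It was not run as part of this article, so it defaults to a dry run that only prints each command; the helper names `run` and `failover_test` and the `DRY_RUN` variable are assumptions made for the example. Set `DRY_RUN=0` on the node that holds the VIP to actually execute the steps.

```shell
# Print each step (DRY_RUN=1, the default here) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

failover_test() {
    run systemctl stop nginx         # simulate an nginx failure on the node holding the VIP
    run sleep 3                      # let at least one 2-second check interval pass
    run hostname -I                  # the VIP 10.0.0.3 should no longer be listed on this node
    run systemctl start nginx        # bring nginx back
    run systemctl start keepalived   # the check script stopped keepalived, so start it again
}

failover_test
```

Running `hostname -I` on the other node after the sleep should show 10.0.0.3 there.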