NFS HA Architecture Deployment (NFS + Pacemaker + Corosync + DRBD)

Submitted by 孤街醉人 on 2021-01-11 09:59:17


 
Environment: 2 KVM virtual machines
OS: CentOS 7.6
Kernel: Linux 3.10.0-957.21.3.el7.x86_64
IP address     Hostname
192.9.200.25   centos7v6a    Node 1
192.9.200.26   centos7v6b    Node 2
Both servers have a disk of the same size, /dev/vdb.
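The steps below assume each VM already has its hostname set to match the table above; if not, set it first:

# hostnamectl set-hostname centos7v6a    // on node 1
# hostnamectl set-hostname centos7v6b    // on node 2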
 
 
1. Install DRBD (Distributed Replicated Block Device)
 
 

Download the source from the official site:

# curl -O https://www.linbit.com/downloads/drbd/9.0/drbd-9.0.18-1.tar.gz

# curl -O https://www.linbit.com/downloads/drbd/utils/drbd-utils-9.10.0.tar.gz

 

Extract, compile, and install:

# tar xf drbd-utils-9.10.0.tar.gz

# cd drbd-utils-9.10.0/

# yum -y install flex po4a

# ./configure --prefix=/opt/drbd-utils --without-83support --without-84support --with-pacemaker --with-rgmanager --with-bashcompletion --with-initdir

# make && make install
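Because the install prefix is /opt/drbd-utils, the userland tools land outside the default PATH. A convenience sketch (assuming the default sbindir under that prefix):

# echo 'export PATH=$PATH:/opt/drbd-utils/sbin' > /etc/profile.d/drbd.sh
# source /etc/profile.d/drbd.sh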

 

# tar xf drbd-9.0.18-1.tar.gz

# cd drbd-9.0.18-1/

# yum -y install kernel-devel

# make KDIR=/usr/src/kernels/3.10.0-957.21.3.el7.x86_64

# make install

 

# modprobe drbd //load the drbd module

# lsmod | grep drbd //check that the module is loaded

# echo drbd > /etc/modules-load.d/drbd.conf //load the module at boot
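Optionally, confirm that the loaded module really is the freshly built 9.0.18 one:

# modinfo drbd | grep -E '^(filename|version)'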

 
2. Configuration
 
Configure the hosts file:
 

# cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.9.200.25 centos7v6a

192.9.200.26 centos7v6b

 
Configure the drbd.conf file:
 

# cd /opt/drbd-utils/etc

 

# ls

bash_completion.d drbd.conf drbd.d xen

 

# cat drbd.conf

# You can find an example in /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";

include "drbd.d/*.res";

 

# cat global_common.conf
global {
    usage-count no;
}

common {
    startup {
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }

    disk {
        resync-rate 10M;
    }

    net {
        protocol C;
        timeout 60;
        connect-int 10;
        ping-int 10;
    }
}

 

# cat r0.res
resource r0 {
    on centos7v6a {
        device    /dev/drbd1;
        disk      /dev/vdb1;
        address   192.9.200.25:7789;
        meta-disk internal;
    }

    on centos7v6b {
        device    /dev/drbd1;
        disk      /dev/vdb1;
        address   192.9.200.26:7789;
        meta-disk internal;
    }
}
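Note that r0 references /dev/vdb1, which the steps above never create. A minimal partitioning sketch (run on both nodes; assumes the whole of /dev/vdb becomes a single partition):

# parted -s /dev/vdb mklabel msdos mkpart primary 0% 100%
# lsblk /dev/vdb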

Firewall settings:
firewall-cmd --permanent --add-port=7789/tcp
firewall-cmd --reload 
 
Apply the same configuration on node 1 and node 2.
 
Next, synchronize the block data:
 

# drbdadm create-md r0 //create the DRBD resource metadata

# drbdadm up r0 //bring up resource r0

# drbdadm primary --force r0 //promote node 1 (this node) to Primary

 

Start drbd.service on both nodes:

systemctl start drbd.service

systemctl enable drbd.service

The start blocks while waiting for the peer, so after starting the first node, start the second one right away.

 

Check the state of r0:

# drbdadm dstate r0

UpToDate/UpToDate
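While the initial synchronization is still running, progress can be followed with drbdadm status (DRBD 9 no longer reports detailed state in /proc/drbd):

# drbdadm status r0    // shows role, disk state, and per-peer replication progress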

Create the DRBD filesystem (run the following on node 1 only):
mkfs.xfs /dev/drbd1
Note: a DRBD device can only be used on the Primary side. To guard against accidents, both nodes come up in the Secondary state after a reboot; to use the DRBD device you must promote one side to Primary manually. While the Primary is healthy, mounting the DRBD device on the Secondary fails with the error below:
 

# mount /dev/drbd1 /share

mount: /dev/drbd1 is write-protected, mounting read-only

mount: mounting /dev/drbd1 on /share failed: wrong media type

 
To mount on the other node:

1. Demote DRBD on the current primary:
umount /share
drbdadm secondary all

2. Promote and mount on the other node:
drbdadm primary all
mount /dev/drbd1 /share
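At any point you can confirm which role the local node currently holds:

# drbdadm role r0    // prints the local role of resource r0, e.g. Primary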
 
Test
 

Node 1:

cp anaconda-ks.cfg /share/ks.cfg

umount /share

drbdadm secondary all

 

Node 2:

drbdadm primary all

mount /dev/drbd1 /share

Check that the file was replicated:

#ls /share/

ks.cfg

Note: you must umount first, otherwise demotion fails:
# drbdadm secondary all
r0: State change failed: (-12) Device is held open by someone
additional info from kernel:
/dev/drbd1 opened by mount (pid 13506) at 2019-07-01 08:48:03.408
Command 'drbdsetup secondary r0' terminated with exit code 11
 
 
3. Install Pacemaker

Node 1: 192.9.200.25  centos7v6a
Node 2: 192.9.200.26  centos7v6b

Before installing pacemaker, set up passwordless SSH login between the nodes.
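A minimal sketch of that, run as root on each node (assuming default key paths):

# ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# ssh-copy-id root@centos7v6a
# ssh-copy-id root@centos7v6b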
 
Installation
 
[ALL] # yum install pacemaker pcs resource-agents

# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success

Installing pacemaker automatically installs corosync as a dependency.
 
 
Create the cluster
 

Start the pcsd service:

[ALL] # systemctl start pcsd.service

[ALL] # systemctl enable pcsd.service

 

Set up the authentication pcs requires. Installing pacemaker automatically creates the hacluster user; give that user a password:

[ALL] # echo 123456789 | passwd --stdin hacluster

[ONE] # pcs cluster auth node1 node2 -u hacluster -p 123456789 --force

 

Configure corosync

Run on one node of the cluster:
 

[root@centos7v6a ~]# pcs cluster auth centos7v6a centos7v6b

Username: hacluster

Password:

centos7v6b: Authorized

centos7v6a: Authorized

 
On the same node, use pcs cluster setup to generate and synchronize the corosync configuration. Create the cluster and specify its nodes; the cluster name may not exceed 15 characters.
[ONE] pcs cluster setup --force --name nfscluster centos7v6a centos7v6b
The pcs tool generates the corosync configuration file automatically, as shown below:
 

# cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: nfscluster
    secauth: off
    transport: udpu
}

nodelist {
    node {
        ring0_addr: centos7v6a
        nodeid: 1
    }

    node {
        ring0_addr: centos7v6b
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

If you do not use pcs to generate corosync.conf, the same configuration must be written manually on every node.
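In that case, writing the file on one node and copying it to the other is enough, for example:

# scp /etc/corosync/corosync.conf centos7v6b:/etc/corosync/corosync.conf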
 
 
Start the cluster
[ONE] # pcs cluster start --all
This command is equivalent to running, on each node:
# systemctl start corosync.service
# systemctl start pacemaker.service
Note: pcs cluster start nodename (or --all) starts a single node (or all nodes).
 
 
Verify the corosync installation
First, use corosync-cfgtool to check that cluster communication is healthy:
 

# corosync-cfgtool -s

Printing ring status.

Local node ID 2

RING ID 0

id = 192.9.200.26

status = ring 0 active with no faults

Next, check the membership and quorum APIs:
 

# corosync-cmapctl |grep member

runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.9.200.25)

runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.1.status (str) = joined

runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.9.200.26)

runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.2.status (str) = joined

# pcs status corosync

Membership information

----------------------

Nodeid Votes Name

1 1 centos7v6a

2 1 centos7v6b (local)

 
Verify the Pacemaker installation

1. Check that the required processes are running:
 

# ps axf| grep -v grep| grep pacemaker

16006 ? Ss 0:00 /usr/sbin/pacemakerd -f

16007 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib

16008 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd

16009 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd

16010 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd

16011 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine

16012 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd

2. Check pcs status:
 

# pcs status

Cluster name: nfscluster

 

WARNINGS:

No stonith devices and stonith-enabled is not false

 

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Tue Jul 2 13:17:54 2019

Last change: Tue Jul 2 13:03:27 2019 by hacluster via crmd on centos7v6a

 

2 nodes configured

0 resources configured

 

Online: [ centos7v6a centos7v6b ]

 

No resources

 

 

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

 
3. Finally, make sure corosync and pacemaker logged no startup errors (messages related to STONITH not being configured are normal at this point):
# journalctl -b | grep -i error
 
 
Create an Active/Passive cluster
 
 
Check the cluster state
1. View the current cluster state


Show the current cluster state in concise form:

# pcs status

Show the current cluster state as XML:

# pcs cluster cib

2. Before changing the cluster, check that the current configuration is valid:
 

# crm_verify -L -V

error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

These errors appear because STONITH is enabled by default. Disable it:
 

pcs property set stonith-enabled=false

crm_verify -L

No errors are reported now, so the configuration is valid.
 
Add a resource

Create a floating IP of 192.9.200.27 and have its availability checked every 30 seconds:
 

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.9.200.27 cidr_netmask=24 op monitor interval=30s

The string ocf:heartbeat:IPaddr2 tells Pacemaker three things about the resource you want to add:
  • The first field (ocf in this case) is the standard to which the resource script conforms and where to find it.
  • The second field (heartbeat in this case) is standard-specific; for OCF resources, it tells the cluster which OCF namespace the resource script is in.
  • The third field (IPaddr2 in this case) is the name of the resource script.
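To see every parameter a given agent accepts, ask pcs for the agent's metadata:

# pcs resource describe ocf:heartbeat:IPaddr2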
List the available resource standards (ocf is one of them):
 

pcs resource standards

lsb

ocf

service

systemd

Confirm the resource providers (the heartbeat part of ocf:heartbeat:IPaddr2):
 

# pcs resource providers

heartbeat

linbit

openstack

pacemaker

Finally, list the agents available under ocf:heartbeat, which includes IPaddr2:
 

[root@centos7v6b ~]# pcs resource agents ocf:heartbeat

aliyun-vpc-move-ip

apache

aws-vpc-move-ip

awseip

awsvip

azure-lb

clvm

conntrackd

……

Verify that the resource was added successfully:
 

[root@centos7v6b ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Tue Jul 2 13:52:32 2019

Last change: Tue Jul 2 13:44:59 2019 by root via cibadmin on centos7v6a

 

2 nodes configured

1 resource configured

 

Online: [ centos7v6a centos7v6b ]

 

Full list of resources:

 

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6a

 

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

Create the NFS cluster

Create the NfsShare resource (nfsserver agent):
 

# pcs resource create NfsShare ocf:heartbeat:nfsserver nfs_ip=192.9.200.27
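Note that the nfsserver agent starts the NFS server, but the export itself still has to be defined. One option is a plain /etc/exports entry on both nodes (a sketch, assuming clients live on 192.9.200.0/24; the ocf:heartbeat:exportfs agent is an alternative):

# echo '/share 192.9.200.0/24(rw,sync,no_root_squash)' >> /etc/exports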

Keep the VIP and nfs-server.service on the same server:
 

[root@centos7v6a ~]# pcs constraint colocation add NfsShare with ClusterIP INFINITY

[root@centos7v6a ~]# pcs constraint

Location Constraints:

Ordering Constraints:

Colocation Constraints:

NfsShare with ClusterIP (score:INFINITY)

Ticket Constraints:

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6b (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Tue Jul 2 16:45:03 2019

Last change: Tue Jul 2 16:44:48 2019 by root via cibadmin on centos7v6a

 

2 nodes configured

2 resources configured

 

Online: [ centos7v6a centos7v6b ]

 

Full list of resources:

 

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6a

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6a

 

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

Set the resource start order:
 

[root@centos7v6a ~]# pcs constraint order ClusterIP then NfsShare

Adding ClusterIP NfsShare (kind: Mandatory) (Options: first-action=start then-action=start)

[root@centos7v6a ~]# pcs constraint

Location Constraints:

Ordering Constraints:

start ClusterIP then start NfsShare (kind:Mandatory)

Colocation Constraints:

NfsShare with ClusterIP (score:INFINITY)

Ticket Constraints:

Set which server the resource prefers to run on (a higher value means higher priority):
# pcs constraint location NfsShare prefers centos7v6a=50
View the resulting placement scores:
crm_simulate -sL
 
 
Create the DRBD device cluster resources
 
 

[root@centos7v6a ~]# pcs cluster cib drbd_cfg //generate the drbd_cfg file; this step is required

[root@centos7v6a ~]# pcs -f drbd_cfg resource create NfsData ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s //with -f, the change is saved into drbd_cfg instead of the live CIB

[root@centos7v6a ~]# pcs -f drbd_cfg resource master NfsDataClone NfsData master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

[root@centos7v6a ~]# pcs -f drbd_cfg resource show

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6b

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6b

Master/Slave Set: NfsDataClone [NfsData]

Stopped: [ centos7v6a centos7v6b ]

Once you are satisfied with all the changes, push the drbd_cfg file into the live CIB to commit them all at once.

[root@centos7v6a ~]# pcs cluster cib-push drbd_cfg --config

CIB updated

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6b (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 09:30:02 2019

Last change: Wed Jul 3 09:29:57 2019 by root via cibadmin on centos7v6a

 

2 nodes configured

4 resources configured

 

Online: [ centos7v6a centos7v6b ]

 

Full list of resources:

 

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6b

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6b

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6a ]

Slaves: [ centos7v6b ]

 

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

 
Note: the active DRBD node must be the server holding the VIP, otherwise problems will occur.
 
 
Create the filesystem cluster resource (mounts the shared device on the primary node automatically)


Ensure drbd1 is used only on the primary node:
 

[root@centos7v6a ~]# pcs cluster cib fs_cfg

[root@centos7v6a ~]# pcs -f fs_cfg resource create NfsFS Filesystem device="/dev/drbd1" directory="/share" fstype="xfs"

Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')

[root@centos7v6a ~]# pcs -f fs_cfg constraint colocation add NfsFS with NfsDataClone INFINITY with-rsc-role=Master

[root@centos7v6a ~]# pcs -f fs_cfg constraint order promote NfsDataClone then start NfsFS

Adding NfsDataClone NfsFS (kind: Mandatory) (Options: first-action=promote then-action=start)

Keep the NFS service and the shared directory on the same server:
 

[root@centos7v6a ~]# pcs -f fs_cfg constraint colocation add NfsShare with NfsFS INFINITY

[root@centos7v6a ~]# pcs -f fs_cfg constraint order NfsFS then NfsShare

Adding NfsFS NfsShare (kind: Mandatory) (Options: first-action=start then-action=start)

Apply the configuration:
 

[root@centos7v6a ~]# pcs cluster cib-push fs_cfg --config

CIB updated

 
Test cluster failover


First, check the pacemaker status:
 

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6b (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 10:03:43 2019

Last change: Wed Jul 3 10:03:29 2019 by root via cibadmin on centos7v6a

 

2 nodes configured

5 resources configured

 

Online: [ centos7v6a centos7v6b ]

 

Full list of resources:

 

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6b

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6b

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6b ]

Slaves: [ centos7v6a ]

NfsFS (ocf::heartbeat:Filesystem): Started centos7v6b

 

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

 
Method 1:
Use pcs cluster stop nodename to stop all cluster services on that node, thereby simulating a failover.
 
 

[root@centos7v6a ~]# pcs cluster stop centos7v6b //now stop the Master, centos7v6b

centos7v6b: Stopping Cluster (pacemaker)...

centos7v6b: Stopping Cluster (corosync)...

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 10:19:51 2019

Last change: Wed Jul 3 10:03:29 2019 by root via cibadmin on centos7v6a

 

2 nodes configured

5 resources configured

 

Online: [ centos7v6a ]

OFFLINE: [ centos7v6b ]

 

Full list of resources:

 

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6a

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6a

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6a ]

Stopped: [ centos7v6b ]

NfsFS (ocf::heartbeat:Filesystem): Started centos7v6a

 

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

[root@centos7v6a ~]# df -Th

Filesystem Type Size Used Avail Use% Mounted on

/dev/mapper/centos-root xfs 47G 1.8G 46G 4% /

devtmpfs devtmpfs 908M 0 908M 0% /dev

tmpfs tmpfs 920M 54M 866M 6% /dev/shm

tmpfs tmpfs 920M 8.6M 911M 1% /run

tmpfs tmpfs 920M 0 920M 0% /sys/fs/cgroup

/dev/vda1 xfs 1014M 187M 828M 19% /boot

tmpfs tmpfs 184M 0 184M 0% /run/user/0

/dev/drbd1 xfs 2.0G 33M 2.0G 2% /share

Failover complete. Use pcs cluster start centos7v6b to bring the cluster back.
 
 
Method 2:

Put a node into standby mode. A node in this state keeps running corosync and pacemaker but is not allowed to run resources, so standby can also be used to simulate failover.

This is especially useful when performing system-administration tasks such as updating the packages used by cluster resources.
 
 

[root@centos7v6a ~]# pcs cluster standby centos7v6b

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 10:28:56 2019

Last change: Wed Jul 3 10:28:52 2019 by root via cibadmin on centos7v6a

 

2 nodes configured

5 resources configured

 

Node centos7v6b: standby

Online: [ centos7v6a ]

 

Full list of resources:

 

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6a

NfsShare (ocf::heartbeat:nfsserver): Starting centos7v6a

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6a ]

Stopped: [ centos7v6b ]

NfsFS (ocf::heartbeat:Filesystem): Started centos7v6a

 

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

[root@centos7v6a ~]# ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

inet6 ::1/128 scope host

valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

link/ether 52:54:00:04:36:c2 brd ff:ff:ff:ff:ff:ff

inet 192.9.200.25/24 brd 192.9.200.255 scope global noprefixroute eth0

valid_lft forever preferred_lft forever

inet 192.9.200.27/24 brd 192.9.200.255 scope global secondary eth0

valid_lft forever preferred_lft forever

inet6 fe80::5054:ff:fe04:36c2/64 scope link

valid_lft forever preferred_lft forever

[root@centos7v6a ~]# df -Th

Filesystem Type Size Used Avail Use% Mounted on

/dev/mapper/centos-root xfs 47G 1.8G 46G 4% /

devtmpfs devtmpfs 908M 0 908M 0% /dev

tmpfs tmpfs 920M 54M 866M 6% /dev/shm

tmpfs tmpfs 920M 8.6M 911M 1% /run

tmpfs tmpfs 920M 0 920M 0% /sys/fs/cgroup

/dev/vda1 xfs 1014M 187M 828M 19% /boot

tmpfs tmpfs 184M 0 184M 0% /run/user/0

/dev/drbd1 xfs 2.0G 33M 2.0G 2% /share

Failover complete. Use pcs cluster unstandby centos7v6b to bring the node back.
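As a final end-to-end check from a separate client machine (hypothetical client with nfs-utils installed), the share should stay reachable through the VIP across both failover methods:

# showmount -e 192.9.200.27
# mount -t nfs 192.9.200.27:/share /mnt
# df -h /mnt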
 
 
