NFS HA架构部署(NFS + Pacemaker + Corosync + DRBD)

环境：kvm虚拟机2台

OS：CentOS7.6

Kernel: Linux 3.10.0-957.21.3.el7.x86_64

IP地址主机名

192.9.200.25 centos7v6a 节点一

192.9.200.26 centos7v6b 节点二

两台服务器上都有一个大小相同的硬盘 /dev/vdb

1、安装DRBD（Distributed Replicated Block Device，分布式复制块设备）

从官方下载源码：

# curl -O https://www.linbit.com/downloads/drbd/9.0/drbd-9.0.18-1.tar.gz\\\

# curl -O https://www.linbit.com/downloads/drbd/utils/drbd-utils-9.10.0.tar.gz

解压，编译，安装

# tar drbd-utils-9.10.0.tar.gz

# cd drbd-utils-9.10.0/

# yum -y install flex po4a

# ./configure --prefix=/opt/drbd-utils --without-83support --without-84support --with-pacemaker --with-rgmanager --with-bashcompletion --with-initdir

# make && make install

# tar xf drbd-9.0.18-1.tar.gz

# cd drbd-9.0.18-1/

# yum -y install kernel-devel

# make KDIR=/usr/src/kernels/3.10.0-957.21.3.el7.x86_64

# make install

# modprobe drbd 加载drbd模块

# lsmod |grep drbd 加快模块是否加载

# echo drbd >/etc/modules-load.d/drbd.conf 开机加载此模块

2、配置

配置hosts 文件：

# cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.9.200.25 centos7v6a

192.9.200.26 centos7v6b

配置drbd.conf 文件：

# cd /opt/drbd-utils/etc

# ls

bash_completion.d drbd.conf drbd.d xen

# cat drbd.conf

# You can find an example in /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";

include "drbd.d/*.res";

# cat global_common.conf

global {

usage-count no;

}

common {

startup {

wfc-timeout 120;

degr-wfc-timeout 120;

}

disk {

resync-rate 10M;

}

net {

protocol C;

timeout 60;

connect-int 10;

ping-int 10;

}

# cat r0.res

resource r0 {

on centos7v6a {

device /dev/drbd1;

disk /dev/vdb1;

address 192.9.200.25:7789;

meta-disk internal;

}

on centos7v6b {

device /dev/drbd1;

disk /dev/vdb1;

address 192.9.200.26:7789;

meta-disk internal;

}

防火墙设置：

firewall-cmd --permanent --add-port=7789/tcp

firewall-cmd --reload

节点一，节点二都做同样配置。

接下来同步块数据

#drbdadm create-md r0 //建立 drbd resource

#drbdadm up r0 //启动 resource r0

#drbdadm primary --force r0 //设置节点一为主节点

两个节点都启动drbd.service

systemctl start drbd.service

systemctl enable drbd.service

这里会等待其他服务器也start，操作第一台后，马上启动第二台。

r0状态查看

# drbdadm dstate r0

UpToDate/UpToDate

创建DRBD文件系统，以下仅在节点一上操作

mkfs.xfs /dev/drbd1

注： drbd 设备只能在 Primary 端使用，为避免误操作，当机器重启后，默认都处于 Secondary 状态，如要使用drbd设备，需手动把其设置为Primary。primary正常时，在secondary上挂载drbd设备会出现下面的错误：

# mount /dev/drbd1 /share

mount: /dev/drbd1 写保护，将以只读方式挂载

mount: 将 /dev/drbd1 挂载到 /share 失败: 错误的介质类型

在从节点上进行挂载

1、将主节点drbd的状态变为从

umount /share

drbdadm secondary all

2、在从节点上进行挂载

drbdadm primary all

mount /dev/drbd1 /share

测试

节点一：

cp anaconda-ks.cfg /share/ks.cfg

umount /share

drbdadm secondary all

节点二：

drbdadm primary all

mount /dev/drbd1 /share

观察是否同步

#ls /share/

ks.cfg

注意：必须先umount，否则出错。

# drbdadm secondary all

r0: State change failed: (-12) Device is held open by someone

additional info from kernel:

/dev/drbd1 opened by mount (pid 13506) at 2019-07-01 08:48:03.408

Command 'drbdsetup secondary r0' terminated with exit code 11

3、Pacemaker 安装

节点一：192.9.200.25 centos7v6a

节点二：192.9.200.26 centos7v6b

安装 pacemaker 之前，需要ssh免密登录配置。

安装

[ALL] # yum install pacemaker pcs resource-agents

# firewall-cmd --permanent --add-service=high-availability success # firewall-cmd --reload success

pacemaker 安装的时候会自动corosync。

创建集群

启动 pcsd服务

[ALL] # systemctl start pcsd.service

[ALL] # systemctl enable pcsd.service

创建pcs 所需身份认证，安装pacemaker时，系统自动创建用于hacluster，需要给该用户设置密码

[ALL] # echo 123456789 | passwd --stdin hacluster

[ONE] # pcs cluster auth node1 node2 -u hacluster -p 123456789 --force

配置 corosync

在集群的一个节点上执行

[root@centos7v6a ~]# pcs cluster auth centos7v6a centos7v6b

Username: hacluster

Password:

centos7v6b: Authorized

centos7v6a: Authorized

在同一节点上使用pcs cluster setup来生成和同步corosync配置。现在创建一个集群并指定其节点，集群名的字符数不能超过15

[ONE] pcs cluster setup --force --name nfscluster centos7v6a centos7v6b

pcs工具将自动生成 corosync配置文件，如下:

# cat /etc/corosync/corosync.conf

totem {

version: 2

cluster_name: nfscluster

secauth: off

transport: udpu

}

nodelist {

node {

ring0_addr: centos7v6a

nodeid: 1

}

node {

ring0_addr: centos7v6b

nodeid: 2

}

quorum {

provider: corosync_votequorum

two_node: 1

}

logging {

to_logfile: yes

logfile: /var/log/cluster/corosync.log

to_syslog: yes

}

如果不使用pcs 生成corosync.conf 配置文件，则需要在每个节点上手动进行相同的配置。

启动集群

[ONE] # pcs cluster start --all

此命令行相当于：

# systemctl start corosync.service

# systemctl start pacemaker.service

注：pcs cluster start nodename (or --all) #启动某个节点（或者所有节点）

验证 corosync 安装

首先，使用corosync-cfgtool检查集群通信是否正常

# corosync-cfgtool -s

Printing ring status.

Local node ID 2

RING ID 0

id = 192.9.200.26

status = ring 0 active with no faults

接下来，检查成员资格（membership）和总裁（quorum）APIs

# corosync-cmapctl |grep member

runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.9.200.25)

runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.1.status (str) = joined

runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.9.200.26)

runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.2.status (str) = joined

# pcs status corosync

Membership information

----------------------

Nodeid Votes Name

1 1 centos7v6a

2 1 centos7v6b (local)

验证Pacemaker安装

1、必须的进程是否在运行

# ps axf| grep -v grep| grep pacemaker

16006 ? Ss 0:00 /usr/sbin/pacemakerd -f

16007 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib

16008 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd

16009 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd

16010 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd

16011 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine

16012 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd

2、检查pcs status

# pcs status

Cluster name: nfscluster

WARNINGS:

No stonith devices and stonith-enabled is not false

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Tue Jul 2 13:17:54 2019

Last change: Tue Jul 2 13:03:27 2019 by hacluster via crmd on centos7v6a

2 nodes configured

0 resources configured

Online: [ centos7v6a centos7v6b ]

No resources

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

3、最后，确保corosync或pacemaker没有启动错误（除了与未配置STONITH相关的消息，此时此操作正常）

# journalctl -b | grep -i error

创建主动/被动（Active/Passive）集群

检查集群状态

1、查看当前集群状态

以简洁的方式显示当前集群状态

# pcs status

以XML格式显示当前集群状态

# pcs cluster cib

2、改变集群前，建议查看当前配置的合法性

# crm_verify -L -V

error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

出现上面的错误，是因为默认情况下开启了STONITH。关闭它

pcs property set stonith-enabled=false

crm_verify -L

现在没有发现错误，表示配置合法了。

添加资源

设置浮动IP 为192.9.200.27，每隔30秒检查浮动IP的可用性

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.9.200.27 cidr_netmask=24 op monitor interval=30s

ocf:heartbeat:IPaddr2. This tells Pacemaker three things about the resource you want to add:

The first field (ocf in this case) is the standard to which the resource script conforms and where to find it.
The second field (heartbeat in this case) is standard-specific; for OCF resources, it tells the cluster which OCF namespace the resource script is in.
The third field (IPaddr2 in this case) is the name of the resource script.

确定可用资源列表（ocf）：

pcs resource standards

lsb

ocf

service

systemd

确认资源的提供者（相当于ocf:heartbeat:IPaddr2 的heartbeat）：

# pcs resource providers

heartbeat

linbit

openstack

pacemaker

最后，确定 IPaddr2 部分内容

[root@centos7v6b ~]# pcs resource agents ocf:heartbeat

aliyun-vpc-move-ip

apache

aws-vpc-move-ip

awseip

awsvip

azure-lb

clvm

conntrackd

……

验证资源添加成功

[root@centos7v6b ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Tue Jul 2 13:52:32 2019

Last change: Tue Jul 2 13:44:59 2019 by root via cibadmin on centos7v6a

2 nodes configured

1 resource configured

Online: [ centos7v6a centos7v6b ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6a

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

创建nfs集群

创建NfsServer资源

# pcs resource create NfsShare ocf:heartbeat:nfsserver nfs_ip=192.9.200.27

将VIP与nfs-server.service 绑定在同一服务器上

[root@centos7v6a ~]# pcs constraint colocation add NfsShare with ClusterIP INFINITY

[root@centos7v6a ~]# pcs constraint

Location Constraints:

Ordering Constraints:

Colocation Constraints:

NfsShare with ClusterIP (score:INFINITY)

Ticket Constraints:

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6b (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Tue Jul 2 16:45:03 2019

Last change: Tue Jul 2 16:44:48 2019 by root via cibadmin on centos7v6a

2 nodes configured

2 resources configured

Online: [ centos7v6a centos7v6b ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6a

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6a

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

设置资源的启动顺序

[root@centos7v6a ~]# pcs constraint order ClusterIP then NfsShare

Adding ClusterIP NfsShare (kind: Mandatory) (Options: first-action=start then-action=start)

[root@centos7v6a ~]# pcs constraint

Location Constraints:

Ordering Constraints:

start ClusterIP then start NfsShare (kind:Mandatory)

Colocation Constraints:

NfsShare with ClusterIP (score:INFINITY)

Ticket Constraints:

设置服务优先运行在哪台服务器上（值越大优先级越高）

# pcs constraint location NfsShare prefers centos7v6a=50

查看优先级

crm_simulate -sL

创建drbd设备集群

[root@centos7v6a ~]# pcs cluster cib drbd_cfg //生成 drbd_cfg 文件，这步是必须的

[root@centos7v6a ~]# pcs -f drbd_cfg resource create NfsData ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s //使用 -f 选项，将修改保存在drbd_cfg

[root@centos7v6a ~]# pcs -f drbd_cfg resource master NfsDataClone NfsData master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

[root@centos7v6a ~]# pcs -f drbd_cfg resource show

ClusterIP(ocf::heartbeat:IPaddr2):Started centos7v6b

NfsShare(ocf::heartbeat:nfsserver):Started centos7v6b

Master/Slave Set: NfsDataClone [NfsData]

Stopped: [ centos7v6a centos7v6b ]

对所有更改感到满意后，可以通过将drbd_cfg文件推送到实时CIB中来一次提交所有更改。

[root@centos7v6a ~]# pcs cluster cib-push drbd_cfg --config

CIB updated[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6b (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 09:30:02 2019

Last change: Wed Jul 3 09:29:57 2019 by root via cibadmin on centos7v6a

2 nodes configured

4 resources configured

Online: [ centos7v6a centos7v6b ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6b

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6b

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6a ]

Slaves: [ centos7v6b ]

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

注意：drbd设备的活动节点需要在VIP所在服务器，否则出现问题。

创建文件系统集群（自动的在主节点上挂载共享设备）

确保 drbd1 只在主节点上可用

[root@centos7v6a ~]# pcs cluster cib fs_cfg

[root@centos7v6a ~]# pcs -f fs_cfg resource create NfsFS Filesystem device="/dev/drbd1" directory="/share" fstype="xfs"

Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')

[root@centos7v6a ~]# pcs -f fs_cfg constraint colocation add NfsFS with NfsDataClone INFINITY with-rsc-role=Master

[root@centos7v6a ~]# pcs -f fs_cfg constraint order promote NfsDataClone then start NfsFS

Adding NfsDataClone NfsFS (kind: Mandatory) (Options: first-action=promote then-action=start)

让NFS服务与共享目录在同一服务器上

[root@centos7v6a ~]# pcs -f fs_cfg constraint colocation add NfsShare with NfsFS INFINITY

[root@centos7v6a ~]# pcs -f fs_cfg constraint order NfsFS then NfsShare

Adding NfsFS NfsShare (kind: Mandatory) (Options: first-action=start then-action=start)

使配置生效

[root@centos7v6a ~]# pcs cluster cib-push fs_cfg --config

CIB updated

测试集群故障转移

先查看pacemaker状态

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6b (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 10:03:43 2019

Last change: Wed Jul 3 10:03:29 2019 by root via cibadmin on centos7v6a

2 nodes configured

5 resources configured

Online: [ centos7v6a centos7v6b ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6b

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6b

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6b ]

Slaves: [ centos7v6a ]

NfsFS (ocf::heartbeat:Filesystem): Started centos7v6b

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

方法一：

使用 pcs cluster stop nodename 停止节点nodename 上的所有集群服务，以此来模拟故障转移 。

[root@centos7v6a ~]# pcs cluster stop centos7v6b //现在停止Master centos7v6b

centos7v6b: Stopping Cluster (pacemaker)...

centos7v6b: Stopping Cluster (corosync)...

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 10:19:51 2019

Last change: Wed Jul 3 10:03:29 2019 by root via cibadmin on centos7v6a

2 nodes configured

5 resources configured

Online: [ centos7v6a ]

OFFLINE: [ centos7v6b ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started centos7v6a

NfsShare (ocf::heartbeat:nfsserver): Started centos7v6a

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6a ]

Stopped: [ centos7v6b ]

NfsFS (ocf::heartbeat:Filesystem): Started centos7v6a

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

[root@centos7v6a ~]# df -Th

文件系统类型容量已用可用已用% 挂载点

/dev/mapper/centos-root xfs 47G 1.8G 46G 4% /

devtmpfs devtmpfs 908M 0 908M 0% /dev

tmpfs tmpfs 920M 54M 866M 6% /dev/shm

tmpfs tmpfs 920M 8.6M 911M 1% /run

tmpfs tmpfs 920M 0 920M 0% /sys/fs/cgroup

/dev/vda1 xfs 1014M 187M 828M 19% /boot

tmpfs tmpfs 184M 0 184M 0% /run/user/0

/dev/drbd1 xfs 2.0G 33M 2.0G 2% /share

故障转移完成。使用 pcs cluster start centos7v6b 恢复集群

方法二：

将节点置于待机（standby）模式。处于此状态的节点继续运行corosync和pacemaker，但不允许运行资源。这样也可以进行故障转移模拟

这个特性，在执行系统管理任务（例如更新群集资源使用的包）时，此功能特别有用。

[root@centos7v6a ~]# pcs cluster standby centos7v6b

[root@centos7v6a ~]# pcs status

Cluster name: nfscluster

Stack: corosync

Current DC: centos7v6a (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum

Last updated: Wed Jul 3 10:28:56 2019

Last change: Wed Jul 3 10:28:52 2019 by root via cibadmin on centos7v6a

2 nodes configured

5 resources configured

Node centos7v6b: standby

Online: [ centos7v6a ]

Full list of resources:

ClusterIP(ocf::heartbeat:IPaddr2):Started centos7v6a

NfsShare(ocf::heartbeat:nfsserver):Starting centos7v6a

Master/Slave Set: NfsDataClone [NfsData]

Masters: [ centos7v6a ]

Stopped: [ centos7v6b ]

NfsFS(ocf::heartbeat:Filesystem):Started centos7v6a

Daemon Status:

corosync: active/enabled

pacemaker: active/disabled

pcsd: active/enabled

[root@centos7v6a ~]# ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

inet6 ::1/128 scope host

valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

link/ether 52:54:00:04:36:c2 brd ff:ff:ff:ff:ff:ff

inet 192.9.200.25/24 brd 192.9.200.255 scope global noprefixroute eth0

valid_lft forever preferred_lft forever

inet 192.9.200.27/24 brd 192.9.200.255 scope global secondary eth0

valid_lft forever preferred_lft forever

inet6 fe80::5054:ff:fe04:36c2/64 scope link

valid_lft forever preferred_lft forever

[root@centos7v6a ~]# df -Th

文件系统类型容量已用可用已用% 挂载点

/dev/mapper/centos-root xfs 47G 1.8G 46G 4% /

devtmpfs devtmpfs 908M 0 908M 0% /dev

tmpfs tmpfs 920M 54M 866M 6% /dev/shm

tmpfs tmpfs 920M 8.6M 911M 1% /run

tmpfs tmpfs 920M 0 920M 0% /sys/fs/cgroup

/dev/vda1 xfs 1014M 187M 828M 19% /boot

tmpfs tmpfs 184M 0 184M 0% /run/user/0

/dev/drbd1 xfs 2.0G 33M 2.0G 2% /share

故障转移完成。使用 pcs cluster unstandby centos7v6b 恢复集群

参考：

https://blog.csdn.net/levy_cui/article/details/53644810

https://docs.linbit.com/docs/users-guide-9.0/#p-work

https://www.centos.bz/2012/02/drbd-compile-install-deploy/

https://blog.51cto.com/koumm/1738795

https://clusterlabs.org/quickstart-redhat.html

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/index.html

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch07.html

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_configure_the_cluster_for_the_drbd_device.html

来源：oschina

链接：https://my.oschina.net/u/3090649/blog/3068920

标签

pacemaker

Corosync

drbd