Unmounting a /dev/rbd device from inside a container

Posted by 旧城冷巷雨未停 on 2020-01-15 23:23:49

Background:

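(1) On our platform, docker uses MountFlags=slave by default. One property of this mount mode is that once a container has been started with it, later mount changes on the host are no longer propagated into the container.

(2) The node monitoring collector, node-exporter, has to mount the host's root directory. With MountFlags=slave in effect, mount changes on the node are not propagated into node-exporter.

(3) One consequence: when a StatefulSet moves from node-1 to node-2, its rbd device is first unmounted on node-1 and then mounted on node-2, but the pod fails with: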

  Warning  FailedMount            25m (x34 over 1h)  kubelet, node-2  Unable to mount volumes for pod "fluentd-0_openstack(b4179bc4-3776-11ea-862b-246e965469a8)": timeout expired waiting for volumes to attach/mount for pod "openstack"/"fluentd-0". list of unattached/unmounted volumes=[storage]

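Our initial diagnosis was that the MountFlags=slave mount mode is to blame: the host has unmounted the rbd device, but the mount entry still exists inside the node-exporter container. So it should be enough to unmount the rbd device inside that container; if the StatefulSet then comes up, that confirms docker's default mount mode needs to be changed.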
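To check this diagnosis on a live system, it can help to see how a given mount point actually propagates. The following is a minimal sketch, not part of the original procedure: `propagation_of` is a hypothetical helper that reads lines in the `/proc/<pid>/mountinfo` format and reports whether a mount point is shared, slave, or private, based on the optional `shared:N` / `master:N` fields.

```shell
#!/bin/sh
# propagation_of MOUNTPOINT
# Hypothetical helper: reads mountinfo-format lines on stdin and prints the
# propagation mode of MOUNTPOINT (one line per matching entry).
propagation_of() {
    awk -v target="$1" '
        $5 == target {
            shared = 0; slave = 0; unbindable = 0
            # Optional fields sit between field 6 and the "-" separator.
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/)   shared = 1
                if ($i ~ /^master:/)   slave = 1
                if ($i == "unbindable") unbindable = 1
            }
            if (shared && slave)  print "shared,slave"
            else if (shared)      print "shared"
            else if (slave)       print "slave"
            else if (unbindable)  print "unbindable"
            else                  print "private"
        }'
}

# Usage (on the host, or inside the container after entering it):
#   propagation_of /host < /proc/self/mountinfo
```

A result of "private" for the bind-mounted host root inside the container would be consistent with the behaviour described above: unmounts performed on the host do not reach the container.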

Method:

In the example below, the host root directory is mounted into prometheus-polling-exporter, the StatefulSet pod is fluentd, and it is being moved from node-1 to node-2.

(1) On node-1, check the rbd device: rbd14 has no mount point on the host, i.e. it is already unmounted there

[root@node-1 ~]# lsblk | grep rbd14
rbd14                                                                                        237:0    0   500G  0 disk 

(2) Enter the prometheus-polling-exporter pod and check rbd14: it is still mounted there, i.e. the host and the container are out of sync

()[root@node-1 /]# mount | grep rbd14
/dev/rbd14 on /host/var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/rbd-image-kubernetes-dynamic-pvc-144d567d-1aaa-11ea-a40d-0a580ae80250 type ext4 (rw,relatime,stripe=1024,data=ordered)
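The comparison in steps (1) and (2) can also be done with an exact device match, since `grep rbd14` would match a device such as rbd140 as well. A minimal sketch, assuming the standard `/proc/mounts` layout (device in the first field, mount point in the second); `mountpoints_of` is a hypothetical helper name:

```shell
#!/bin/sh
# mountpoints_of DEVICE
# Hypothetical helper: reads /proc/mounts-format lines on stdin and prints
# every mount point whose source is exactly DEVICE.
mountpoints_of() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Usage: run once on the host and once inside the container, then compare.
#   mountpoints_of /dev/rbd14 < /proc/mounts
```

In the situation described here, the host run would print nothing while the container run would print the path under /host/var/lib/kubelet.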

(3) The system log (messages) on node-1 reports the error

Jan 15 17:13:05 node-1 kubelet: E0115 17:13:05.330855   18003 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/rbd/rbd:kubernetes-dynamic-pvc-144d567d-1aaa-11ea-a40d-0a580ae80250\"" failed. No retries permitted until 2020-01-15 17:15:07.330793288 +0800 CST m=+196440.172890846 (durationBeforeRetry 2m2s). Error: "UnmountDevice failed for volume \"pvc-b54ea292-1aa5-11ea-ae82-246e96549cb8\" (UniqueName: \"kubernetes.io/rbd/rbd:kubernetes-dynamic-pvc-144d567d-1aaa-11ea-a40d-0a580ae80250\") on node \"node-1\" : rbd: failed to unmap device /dev/rbd14, error exit status 16, rbd output: rbd: sysfs write failed\nrbd: unmap failed: (16) Device or resource busy\n"

(4) Find the PID of prometheus-polling-exporter on node-1; here it is 40567

[root@node-1 ~]# ps -ef | grep prometheus-polling-exporter
root     40567 40485  0 17:04 ?        00:00:01 prometheus-polling-exporter
root     46498  8212  0 18:28 pts/44   00:00:00 grep --color=auto prometheus-polling-exporter
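When it is not known in advance which container still holds the mount, the holder can be found by scanning every process's mount table instead of searching by process name. The following is a rough sketch under that assumption; `holders_of` is a hypothetical helper, and its second argument exists only so it can be pointed at a directory other than /proc for testing.

```shell
#!/bin/sh
# holders_of DEVICE [PROC_ROOT]
# Hypothetical helper: prints the PID of every process whose mount namespace
# still lists DEVICE as a mount source. PROC_ROOT defaults to /proc.
holders_of() {
    dev="$1"
    proc_root="${2:-/proc}"
    for f in "$proc_root"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        # In mountinfo, the mount source is the 2nd field after the "-" separator.
        if awk -v dev="$dev" '
            {
                for (i = 7; i <= NF; i++) {
                    if ($i == "-") {
                        if ($(i + 2) == dev) found = 1
                        break
                    }
                }
            }
            END { exit !found }' "$f" 2>/dev/null; then
            pid="${f%/mountinfo}"
            echo "${pid##*/}"
        fi
    done
}

# Usage (as root on the node):
#   holders_of /dev/rbd14
```

All processes in the same mount namespace are listed, so several PIDs from one container may appear; any one of them is a valid target for nsenter.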

(5) Use nsenter to enter that process's mount and PID namespaces

[root@node-1 ~]# nsenter -t 40567 -m -p
()[root@node-1 /]# 

(6) Check the mount entry for rbd14, unmount it, then exit

()[root@node-1 /]# mount | grep rbd14
/dev/rbd14 on /host/var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/rbd-image-kubernetes-dynamic-pvc-144d567d-1aaa-11ea-a40d-0a580ae80250 type ext4 (rw,relatime,stripe=1024,data=ordered)
()[root@node-1 /]# umount /host/var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/rbd-image-kubernetes-dynamic-pvc-144d567d-1aaa-11ea-a40d-0a580ae80250
()[root@node-1 /]# exit
logout
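Steps (5) and (6) can probably be collapsed into a single non-interactive command, since nsenter accepts a program to run in the target namespaces. A minimal sketch, assuming only the mount namespace (-m) is needed for the unmount and that a umount binary is available inside the container; `umount_in_ns` and the `DRY_RUN` variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# umount_in_ns PID MOUNTPOINT
# Hypothetical helper: unmounts MOUNTPOINT inside the mount namespace of PID.
# With DRY_RUN=1 the command is only printed, not executed.
umount_in_ns() {
    pid="$1"
    mnt="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "nsenter -t $pid -m umount $mnt"
    else
        nsenter -t "$pid" -m umount "$mnt"
    fi
}

# Usage (as root on node-1; PID and path taken from the steps above):
#   DRY_RUN=1 umount_in_ns 40567 /host/var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/rbd-image-kubernetes-dynamic-pvc-144d567d-1aaa-11ea-a40d-0a580ae80250
```

Printing the command first with DRY_RUN=1 is a cheap safeguard, because an unmount in the wrong namespace or on the wrong path is hard to undo on a production node.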

(7) Run rbd unmap on node-1

[root@node-1 ~]# rbd unmap /dev/rbd14

fluentd can then unmount/mount its PVC normally and start up.
