Question
I accidentally tried to delete all PVs in the cluster, but thankfully they still have PVCs bound to them, so all PVs are stuck in Status: Terminating.
How can I get the PVs out of the "Terminating" status and back to a healthy state where they are "Bound" to their PVCs and fully working?
The key here is that I don't want to lose any data, and I want to make sure the volumes are functional and not at risk of being terminated if the claim goes away.
Here are some details from a kubectl describe on the PV.
$ kubectl describe pv persistent-vol-1
Finalizers: [kubernetes.io/pv-protection foregroundDeletion]
Status: Terminating (lasts 1h)
Claim: ns/application
Reclaim Policy: Delete
Here is the describe on the claim.
$ kubectl describe pvc application
Name: application
Namespace: ns
StorageClass: standard
Status: Bound
Volume: persistent-vol-1
Answer 1:
It is, in fact, possible to save the data from a PersistentVolume with Status: Terminating and a reclaim policy set to the default (Delete). We have done so on GKE; I'm not sure about AWS or Azure, but I guess they are similar.
We had the same problem, and I will post our solution here in case somebody else runs into this.
Your PersistentVolumes will not actually be terminated as long as a pod, deployment or, to be more specific, a PersistentVolumeClaim is using them.
The steps we took to remedy our broken state:
Once you are in the situation like the OP, the first thing you want to do is create a snapshot of your PersistentVolumes.
In the GKE console, go to Compute Engine -> Disks, find your volume there (use kubectl get pv | grep pvc-name to map the PV to its disk) and create a snapshot of the volume.
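The same snapshot can also be taken from the command line; a minimal sketch, assuming gcloud is configured and using the same illustrative names as the command below:
# snapshot the underlying GCE disk before touching anything else
gcloud compute disks snapshot name-of-disk --snapshot-names=name-of-snapshot --zone=your-zone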
Use the snapshot to create a disk: gcloud compute disks create name-of-disk --size=10 --source-snapshot=name-of-snapshot --type=pd-standard --zone=your-zone
At this point, stop the services using the volume and delete the volume and volume claim.
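A sketch of what that could look like for the objects in the question (illustrative only, adapt the names and namespace):
kubectl delete pvc application -n ns     # the claim from the question
kubectl delete pv persistent-vol-1       # the volume from the question (already stuck in Terminating)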
Recreate the volume manually with the data from the disk:
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: name-of-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  gcePersistentDisk:
    fsType: ext4
    pdName: name-of-disk
  persistentVolumeReclaimPolicy: Retain
Now just update your volume claim to target this specific volume; see the last line of the YAML file:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: my-namespace
  labels:
    app: my-app
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: name-of-pv
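Applying both manifests should bring the claim back to Bound; a sketch, assuming they were saved as pv.yaml and pvc.yaml (hypothetical file names):
kubectl apply -f pv.yaml             # recreate the PV backed by the restored disk
kubectl apply -f pvc.yaml            # recreate the claim pinned to that PV via volumeName
kubectl get pv,pvc -n my-namespace   # both should report Bound once binding completes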
Answer 2:
I found myself in this same situation due to a careless mistake. It was with a StatefulSet on Google Cloud/GKE. My PVC said Terminating because the pod referencing it was still running, and the PV was configured with a reclaim policy of Delete. I ended up finding a simpler method to get everything straightened out that also preserved all of the extra Google/Kubernetes metadata and names.
First, I would make a snapshot of your disk as suggested by another answer. You won't need it, but if something goes wrong, the other answer here can then be used to re-create a disk from it.
The short version is that you just need to reconfigure the PV's reclaim policy to "Retain", allow the PVC to get deleted, then remove the previous claim reference from the PV. A new PVC can then be bound to it and all is well.
Details:
- Find the full name of the PV:
kubectl get pv
- Reconfigure your PV to set the reclaim policy to "Retain": (I'm doing this on Windows so you may need to handle the quotes differently depending on OS)
kubectl patch pv <your-pv-name-goes-here> -p "{\"spec\":{\"persistentVolumeReclaimPolicy\":\"Retain\"}}"
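On Linux or macOS shells the same patch is usually written with single quotes (same command, only the quoting differs):
kubectl patch pv <your-pv-name-goes-here> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'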
- Verify that the reclaim policy of the PV is now Retain (kubectl get pv shows it in the RECLAIM POLICY column).
- Shut down your pod/statefulset (and don't allow it to restart). Once that's finished, your PVC will get removed and the PV (and the disk it references!) will be left intact.
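For a StatefulSet, scaling it down to zero is one way to shut it down without deleting it; a sketch with illustrative names:
kubectl scale statefulset my-app --replicas=0 -n my-namespace   # stops the pods; the PV and its disk stay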
- Edit the PV:
kubectl edit pv <your-pv-name-goes-here>
- In the editor, remove the entire "claimRef" section. Remove all of the lines from (and including) "claimRef:" until the next tag with the same indentation level. The lines to remove should look more or less like this:
claimRef:
  apiVersion: v1
  kind: PersistentVolumeClaim
  name: my-app-pvc-my-app-0
  namespace: default
  resourceVersion: "1234567"
  uid: 12345678-1234-1234-1234-1234567890ab
- Save the changes and close the editor. Check the status of the PV and it should now show "Available".
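If you prefer to avoid the interactive editor, the claimRef can also be dropped with a JSON patch; a sketch, not from the original answer:
kubectl patch pv <your-pv-name-goes-here> --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'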
- Now you can re-create your PVC exactly as you originally did. That should then find the now "Available" PV and bind itself to it. In my case, I have the PVC defined with my statefulset as a volumeClaimTemplate so all I had to do was "kubectl apply" my statefulset.
Answer 3:
Do not attempt this if you don't know what you're doing
There is another, fairly hacky way of undeleting PVs: directly editing the objects in etcd. Note that the following steps work only if you have control over etcd - this may not be true on certain cloud providers or managed offerings. Also note that you can easily make things much worse, since objects in etcd were never meant to be edited directly - so please approach this with caution.
We had a situation wherein our PVs had a reclaim policy of Delete, and I accidentally ran a command deleting a majority of them, on k8s 1.11. Thanks to storage-object-in-use protection, they did not immediately disappear, but they hung around in a dangerous state. Any deletion or restart of the pods that were bound to the PVCs would have caused the kubernetes.io/pvc-protection finalizer to get removed and thereby deletion of the underlying volume (in our case, EBS). New finalizers also cannot be added while the resource is in a terminating state - from a k8s design standpoint, this is necessary in order to prevent race conditions.
Below are the steps I followed:
- Back up the storage volumes you care about. This is just to cover yourself against possible deletion - AWS, GCP and Azure all provide mechanisms to snapshot a volume.
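For EBS that could be as simple as the following; a sketch, assuming the AWS CLI is configured and using an illustrative volume ID:
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "backup before editing PVs in etcd"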
- Access etcd directly - if it's running as a static pod, you can ssh into it and check the http serving port. By default, this is 4001. If you're running multiple etcd nodes, use any one.
- Port-forward 4001 to your machine from the pod.
kubectl -n=kube-system port-forward etcd-server-ip-x.y.z.w-compute.internal 4001:4001
- Use the REST API, or a tool like etcdkeeper, to connect to the cluster.
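As a quick sanity check that you are connected to the right store, listing the PV keys over the v2 REST API might look like this; a sketch only, assuming the port forwarded above and that the cluster still serves the etcd v2 API:
curl http://127.0.0.1:4001/v2/keys/registry/persistentvolumes/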
- Navigate to /registry/persistentvolumes/ and find the corresponding PVs. The deletion of resources by controllers in k8s is done by setting the .metadata.deletionTimestamp field on the object. Delete this field in order to have the controllers stop trying to delete the PV. This will revert them to the Bound state, which is probably where they were before you ran the delete. You can also carefully edit the reclaimPolicy to Retain and then save the objects back to etcd. The controllers will re-read the state soon, and you should see it reflected in kubectl get pv output shortly as well.
Your PVs should go back to the old undeleted state:
$ kubectl get pv
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE
pvc-b5adexxx   5Gi        RWO            Retain           Bound    zookeeper/datadir-zoo-0      gp2                     287d
pvc-b5ae9xxx   5Gi        RWO            Retain           Bound    zookeeper/datalogdir-zoo-0   gp2                     287d
As a general best practice, use RBAC and the right persistent volume reclaim policy to protect against accidental deletion of PVs or the underlying storage.
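As one illustration of the RBAC part, a ClusterRole that grants read access to PVs but deliberately omits the delete verb might look like this; a sketch only, the name and any binding are up to you:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pv-reader   # illustrative name
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch"]   # no "delete": subjects bound to this role cannot remove PVs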
Answer 4:
Unfortunately, you can't save your PVs and data in this case.
All you can do is recreate the PVs with Reclaim Policy: Retain - this will prevent data loss in the future.
You can read more about reclaim policies here and here.
What happens if I delete a PersistentVolumeClaim (PVC)? If the volume was dynamically provisioned, then the default reclaim policy is set to “delete”. This means that, by default, when the PVC is deleted, the underlying PV and storage asset will also be deleted. If you want to retain the data stored on the volume, then you must change the reclaim policy from “delete” to “retain” after the PV is provisioned.
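If the volumes are dynamically provisioned, the reclaim policy they inherit can also be set on the StorageClass itself so that future PVs default to Retain; a sketch, where the provisioner and parameters are assumptions for a GCE PD setup:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-retain             # illustrative name
provisioner: kubernetes.io/gce-pd   # assumption: GCE persistent disks; use your cluster's provisioner
parameters:
  type: pd-standard
reclaimPolicy: Retain               # PVs provisioned from this class get Retain instead of Delete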
Source: https://stackoverflow.com/questions/51585649/cancel-or-undo-deletion-of-persistent-volumes-in-kubernetes-cluster