Cancel or undo deletion of Persistent Volumes in a Kubernetes cluster

-上瘾入骨i 2021-02-14 01:00

I accidentally tried to delete all PVs in the cluster, but thankfully they still have PVCs bound to them, so all PVs are stuck in Status: Terminating.

How can I g

5 Answers
  • 2021-02-14 01:17

    You can check out this tool; it updates the status of Terminating PVs in etcd back to Bound.

    How it works is described by Anirudh Ramanathan in his answer.

    Be sure to back up your PV first.

  • 2021-02-14 01:18

    It is, in fact, possible to save data from a PersistentVolume with Status: Terminating and the reclaim policy set to the default (Delete). We have done so on GKE; we are not sure about AWS or Azure, but I guess they are similar.

    We had the same problem and I will post our solution here in case somebody else has an issue like this.

    Your PersistentVolumes will not be terminated as long as a pod, a deployment or, more specifically, a PersistentVolumeClaim is still using them.

    The steps we took to remedy our broken state:

    Once you are in a situation like the OP's, the first thing you want to do is create a snapshot of your PersistentVolumes.

    In the GKE console, go to Compute Engine -> Disks, find your volume there (use kubectl get pv | grep pvc-name), and create a snapshot of it.

    Use the snapshot to create a disk: gcloud compute disks create name-of-disk --size=10 --source-snapshot=name-of-snapshot --type=pd-standard --zone=your-zone
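
The snapshot and disk-creation steps can also be scripted. The following is only a minimal sketch, assuming the PV is an in-tree gcePersistentDisk volume and that kubectl and gcloud are already authenticated against the right cluster and project. The function name `snapshot_pv_disk` and its arguments are placeholders introduced here, not part of the original answer.

```shell
# Sketch: snapshot the GCE disk behind a PV, then create a new disk from it.
# Arguments (hypothetical placeholders):
#   $1 PV name, $2 zone, $3 snapshot name, $4 new disk name, $5 size in GB
snapshot_pv_disk() {
  pv="$1"; zone="$2"; snapshot="$3"; new_disk="$4"; size="$5"

  # Look up the GCE disk name recorded in the PV spec.
  disk="$(kubectl get pv "$pv" -o jsonpath='{.spec.gcePersistentDisk.pdName}')" || return 1
  [ -n "$disk" ] || { echo "PV $pv has no gcePersistentDisk.pdName" >&2; return 1; }

  # Snapshot the existing disk first, so the data survives even if the PV goes away.
  gcloud compute disks snapshot "$disk" --snapshot-names="$snapshot" --zone="$zone" || return 1

  # Restore the snapshot into a new disk, as in the command above.
  gcloud compute disks create "$new_disk" --size="$size" --source-snapshot="$snapshot" --type=pd-standard --zone="$zone"
}
```

It would be called as, for example, `snapshot_pv_disk name-of-pv your-zone name-of-snapshot name-of-disk 10`.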

    At this point, stop the services using the volume and delete the volume and volume claim.

    Recreate the volume manually with the data from the disk:

    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: name-of-pv
    spec:
      accessModes:
        - ReadWriteOnce
      capacity:
        storage: 10Gi
      gcePersistentDisk:
        fsType: ext4
        pdName: name-of-disk
      persistentVolumeReclaimPolicy: Retain
    

    Now update your volume claim to target that specific volume through volumeName, the last line of the YAML file:

    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-pvc
      namespace: my-namespace
      labels:
        app: my-app
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeName: name-of-pv
    
  • 2021-02-14 01:21

    Unfortunately, you can't save your PVs and data in this case. All you can do is recreate the PV with Reclaim Policy: Retain, which will prevent data loss in the future. You can read more about reclaim policies here and here.

    What happens if I delete a PersistentVolumeClaim (PVC)? If the volume was dynamically provisioned, then the default reclaim policy is set to “delete”. This means that, by default, when the PVC is deleted, the underlying PV and storage asset will also be deleted. If you want to retain the data stored on the volume, then you must change the reclaim policy from “delete” to “retain” after the PV is provisioned.
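
Changing the policy on volumes that already exist can be done with kubectl patch. The following is a minimal sketch, assuming you want every PV currently set to Delete switched to Retain; the helper name `retain_all_pvs` is made up for illustration.

```shell
# Sketch: switch every PV whose reclaim policy is Delete to Retain.
retain_all_pvs() {
  # Print "<name> <policy>" for each PV and keep only the ones set to Delete.
  kubectl get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy |
  awk '$2 == "Delete" { print $1 }' |
  while read -r pv; do
    # Patch the reclaim policy in place; the PV itself is not recreated.
    kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
  done
}
```

Running `kubectl get pv` afterwards should show Retain in the RECLAIM POLICY column for the patched volumes.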

  • 2021-02-14 01:24

    Do not attempt this if you don't know what you're doing

    There is another, fairly hacky way of undeleting PVs: directly editing the objects in etcd. Note that the following steps work only if you have control over etcd, which may not be true on certain cloud providers or managed offerings. Also note that you can easily make things much worse, since objects in etcd were never meant to be edited directly, so please approach this with caution.

    We had a situation wherein our PVs had a reclaim policy of Delete and I accidentally ran a command deleting a majority of them, on k8s 1.11. Thanks to storage-object-in-use protection, they did not immediately disappear, but they hung around in a dangerous state. Any deletion or restart of the pods that were using the PVCs would have caused the kubernetes.io/pvc-protection finalizer to be removed, and thereby the deletion of the underlying volume (in our case, EBS). New finalizers also cannot be added when the resource is in a terminating state; from a k8s design standpoint, this is necessary in order to prevent race conditions.

    Below are the steps I followed:

    • Back up the storage volumes you care about. This is just to cover yourself against possible deletion - AWS, GCP, Azure all provide mechanisms to do this and create a new snapshot.
    • Access etcd directly - if it's running as a static pod, you can ssh into it and check the http serving port. By default, this is 4001. If you're running multiple etcd nodes, use any one.
    • Port-forward 4001 to your machine from the pod.
    kubectl -n=kube-system port-forward etcd-server-ip-x.y.z.w-compute.internal 4001:4001 
    
    • Use the REST API, or a tool like etcdkeeper to connect to the cluster.

    • Navigate to /registry/persistentvolumes/ and find the corresponding PVs. Resources in k8s are marked for deletion by setting the .metadata.deletionTimestamp field on the object. Delete this field in order to have the controllers stop trying to delete the PV. This will revert them to the Bound state, which is probably where they were before you ran a delete.

    • You can also carefully change persistentVolumeReclaimPolicy to Retain and then save the objects back to etcd. The controllers will re-read the state soon, and you should shortly see it reflected in the kubectl get pv output as well.

    Your PVs should go back to the old undeleted state:

    $ kubectl get pv
    
    NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE
    pvc-b5adexxx   5Gi        RWO            Retain           Bound    zookeeper/datadir-zoo-0      gp2                     287d
    pvc-b5ae9xxx   5Gi        RWO            Retain           Bound    zookeeper/datalogdir-zoo-0   gp2                     287d
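
To confirm that no PV still carries a deletion marker, the deletion timestamp can be printed directly. This is a small sketch rather than part of the original procedure; the helper name `terminating_pvs` is hypothetical, and it relies on kubectl printing `<none>` for a field that is not set.

```shell
# Sketch: print the names of PVs that are still marked for deletion.
terminating_pvs() {
  # One line per PV: name, then deletion timestamp ("<none>" when unset).
  kubectl get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,DELETION:.metadata.deletionTimestamp |
  awk '$2 != "<none>" { print $1 }'
}
```

If the edit worked, `terminating_pvs` should print nothing for the volumes you restored.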
    

    As a general best practice, use RBAC and the right persistent volume reclaim policy to prevent accidental deletion of PVs or the underlying storage.

  • 2021-02-14 01:30

    Edit: This only applies if you deleted the PVC and not the PV. Do not follow these instructions if you deleted the PV itself or the disk may be deleted!

    I found myself in this same situation due to a careless mistake. It was with a statefulset on Google Cloud/GKE. My PVC said Terminating because the pod referencing it was still running, and the PV was configured with a reclaim policy of Delete. I ended up finding a simpler method to get everything straightened out that also preserved all of the extra Google/Kubernetes metadata and names.

    First, I would make a snapshot of your disk as suggested by another answer. You won't need it, but if something goes wrong, the other answer here can then be used to re-create a disk from it.

    The short version is that you just need to reconfigure the PV to "Retain", allow the PVC to get deleted, then remove the previous claim from the PV. A new PVC can then be bound to it and all is well.

    Details:

    1. Find the full name of the PV:
        kubectl get pv
    
    2. Reconfigure your PV to set the reclaim policy to "Retain" (I'm doing this on Windows, so you may need to handle the quotes differently depending on your OS):
        kubectl patch pv <your-pv-name-goes-here> -p "{\"spec\":{\"persistentVolumeReclaimPolicy\":\"Retain\"}}"
    
    3. Verify that the reclaim policy of the PV is now Retain.
    4. Shut down your pod/statefulset (and don't allow it to restart). Once that's finished, your PVC will get removed and the PV (and the disk it references!) will be left intact.
    5. Edit the PV:
        kubectl edit pv <your-pv-name-goes-here>
    
    6. In the editor, remove the entire "claimRef" section. Remove all of the lines from (and including) "claimRef:" until the next key with the same indentation level. The lines to remove should look more or less like this:
          claimRef:
            apiVersion: v1
            kind: PersistentVolumeClaim
            name: my-app-pvc-my-app-0
            namespace: default
            resourceVersion: "1234567"
            uid: 12345678-1234-1234-1234-1234567890ab
    
    7. Save the changes and close the editor. Check the status of the PV and it should now show "Available".
    8. Now you can re-create your PVC exactly as you originally did. That should then find the now "Available" PV and bind itself to it. In my case, I have the PVC defined with my statefulset as a volumeClaimTemplate, so all I had to do was "kubectl apply" my statefulset.
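
The interactive `kubectl edit` step can probably be replaced by a JSON patch that removes the claimRef key. This is a sketch under the same assumptions as the steps above (the PV is already set to Retain and the old PVC is gone); the function name `clear_claim_ref` is made up for illustration.

```shell
# Sketch: drop the stale claimRef from a PV without opening an editor.
# Run this only after the reclaim policy is Retain and the old PVC has been removed.
clear_claim_ref() {
  pv="$1"

  # JSON patch that removes the whole spec.claimRef section.
  kubectl patch pv "$pv" --type json -p '[{"op":"remove","path":"/spec/claimRef"}]' || return 1

  # Print the current phase; it should read "Available" once the PV is released.
  kubectl get pv "$pv" -o jsonpath='{.status.phase}'
}
```

Usage would be `clear_claim_ref <your-pv-name-goes-here>`, followed by re-creating the PVC as described in the last step.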