Delete all versions of an object in S3 using Python?

夙愿已清 submitted on 2020-12-01 09:41:05

Question


I have a versioned bucket and would like to delete the object (and all of its versions) from the bucket. However, when I try to delete the object from the console, S3 simply adds a delete marker but does not perform a hard delete.

Is it possible to hard delete all versions of the object with a particular key? For example:

s3resource = boto3.resource('s3')
bucket = s3resource.Bucket('my_bucket')
obj = bucket.Object('my_object_key')

# I would like to delete all versions for the object like so:
obj.delete_all_versions()

# or delete all versions for all objects like so:
bucket.objects.delete_all_versions()

Answer 1:


As a supplement to @jarmod's answer, here is a workaround I developed for "hard deleting" an object (delete markers included):

import boto3

s3 = boto3.client('s3')

def get_all_versions(bucket, filename):
    # Collect the version IDs of every version and delete marker of `filename`.
    paginator = s3.get_paginator('list_object_versions')
    results = []
    for page in paginator.paginate(Bucket=bucket, Prefix=filename):
        for k in ("Versions", "DeleteMarkers"):
            # Either list may be absent from a page, so default to empty.
            results.extend(r["VersionId"] for r in page.get(k, []) if r["Key"] == filename)
    return results

bucket = "YOUR BUCKET NAME"
file = "YOUR FILE"

for version in get_all_versions(bucket, file):
    s3.delete_object(Bucket=bucket, Key=file, VersionId=version)



Answer 2:


The other answers delete objects individually. It is more efficient to batch the deletes with the boto3 delete_objects call. The code below collects all versions and delete markers in the bucket and deletes them in batches of 1000:

import boto3

bucket = 'bucket-name'
s3_client = boto3.client('s3')
object_response_paginator = s3_client.get_paginator('list_object_versions')

delete_marker_list = []
version_list = []

for object_response_itr in object_response_paginator.paginate(Bucket=bucket):
    if 'DeleteMarkers' in object_response_itr:
        for delete_marker in object_response_itr['DeleteMarkers']:
            delete_marker_list.append({'Key': delete_marker['Key'], 'VersionId': delete_marker['VersionId']})

    if 'Versions' in object_response_itr:
        for version in object_response_itr['Versions']:
            version_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

for i in range(0, len(delete_marker_list), 1000):
    response = s3_client.delete_objects(
        Bucket=bucket,
        Delete={
            'Objects': delete_marker_list[i:i+1000],
            'Quiet': True
        }
    )
    print(response)

for i in range(0, len(version_list), 1000):
    response = s3_client.delete_objects(
        Bucket=bucket,
        Delete={
            'Objects': version_list[i:i+1000],
            'Quiet': True
        }
    )
    print(response)
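
With 'Quiet': True, the delete_objects response reports only the entries that failed, so printing it is easy to overlook. Here is a minimal sketch of the same batching idea that gathers those failures instead. The helper name delete_in_batches and its parameters are my own, not part of boto3, and `client` is assumed to be a boto3.client('s3').

```python
def delete_in_batches(client, bucket, identifiers, batch_size=1000):
    """Delete {'Key', 'VersionId'} identifiers in batches; return reported errors.

    `client` is assumed to be a boto3 S3 client (hypothetical helper, not a boto3 API).
    """
    errors = []
    for start in range(0, len(identifiers), batch_size):
        batch = identifiers[start:start + batch_size]
        response = client.delete_objects(
            Bucket=bucket,
            Delete={'Objects': batch, 'Quiet': True},
        )
        # In quiet mode the response lists only the entries that failed.
        errors.extend(response.get('Errors', []))
    return errors
```

It could be called as delete_in_batches(s3_client, bucket, delete_marker_list + version_list); an empty return value means no failures were reported.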



Answer 3:


I had trouble using the other solutions to this question so here's mine.

import boto3
bucket = "bucket name goes here"
filename = "filename goes here"

client = boto3.client('s3')
paginator = client.get_paginator('list_object_versions')
response_iterator = paginator.paginate(Bucket=bucket)
for response in response_iterator:
    versions = response.get('Versions', [])
    versions.extend(response.get('DeleteMarkers', []))
    for version_id in [x['VersionId'] for x in versions
                       if x['Key'] == filename and x['VersionId'] != 'null']:
        print('Deleting {} version {}'.format(filename, version_id))
        client.delete_object(Bucket=bucket, Key=filename, VersionId=version_id)

This code deals with the cases where:

  • object versioning isn't actually turned on
  • there are DeleteMarkers
  • there are no DeleteMarkers
  • there are more versions of a given file than fit in a single API response

Mahesh Mogal's answer doesn't delete DeleteMarkers. Mangohero1's answer fails if the object is missing a DeleteMarker. Hari's answer repeats 10 times (to work around missing pagination logic).
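
One of the cases listed above is a bucket where versioning isn't actually turned on. If you want to know that up front, a rough sketch using get_bucket_versioning could look like this. The function name and the 'Disabled' label are my own; the API itself simply omits 'Status' when versioning has never been enabled. `client` is assumed to be a boto3.client('s3').

```python
def versioning_status(client, bucket):
    """Return 'Enabled', 'Suspended', or 'Disabled' for the bucket.

    'Disabled' is a label chosen here for illustration; S3 itself returns
    no 'Status' key when versioning has never been enabled.
    """
    response = client.get_bucket_versioning(Bucket=bucket)
    return response.get('Status', 'Disabled')
```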




Answer 4:


The documentation is helpful here:

  1. When versioning is enabled in an S3 bucket, a simple DeleteObject request cannot permanently delete an object from that bucket. Instead, Amazon S3 inserts a delete marker (which is effectively a new version of the object with its own version ID).
  2. When you try to GET an object whose current version is a delete marker, S3 behaves as if the object has been deleted (even though it has not) and returns a 404 error.
  3. To permanently delete an object from a versioned bucket, use DeleteObject, with the relevant version ID, for each and every version of the object (and that includes the delete markers).
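
To make points 1 and 2 concrete, here is a small sketch that tells whether a key is live, hidden behind a delete marker, or gone entirely. It relies on the IsLatest flag that list_object_versions returns for both versions and delete markers. The function name key_state and its return labels are hypothetical, and `client` is assumed to be a boto3.client('s3').

```python
def key_state(client, bucket, key):
    """Return 'live', 'soft-deleted', or 'absent' for exactly one key."""
    paginator = client.get_paginator('list_object_versions')
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        # A delete marker that is the latest entry makes GET return 404.
        for marker in page.get('DeleteMarkers', []):
            if marker['Key'] == key and marker['IsLatest']:
                return 'soft-deleted'
        for version in page.get('Versions', []):
            if version['Key'] == key and version['IsLatest']:
                return 'live'
    # No version and no delete marker left: the key was hard deleted
    # or never existed.
    return 'absent'
```

A key only reaches the 'absent' state after every version and every delete marker has been removed with its version ID, as point 3 describes.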



Answer 5:


A solution with fewer lines:

import boto3

def delete_versions(bucket, objects=None): # `objects` is either list of str or None
  bucket = boto3.resource('s3').Bucket(bucket)
  if objects: # delete specified objects
    [version.delete() for version in bucket.object_versions.all() if version.object_key in objects]
  else: # or delete all objects in `bucket`
    [version.delete() for version in bucket.object_versions.all()]



Answer 6:


This post was super helpful. Without it we would have spent a tremendous amount of time cleaning up our S3 folders.

We needed to clean up specific folders only, so I tried the following code and it worked like a charm. Note that the loop runs 10 times because each list_object_versions call returns at most 1000 entries. Feel free to change the iteration count as you wish.

import boto3
session = boto3.Session(aws_access_key_id='<YOUR ACCESS KEY>',aws_secret_access_key='<YOUR SECRET KEY>')

bucket_name = '<BUCKET NAME>'
object_name = '<KEY NAME>'

s3 = session.client('s3')

for i in range(10):
    versions = s3.list_object_versions(Bucket=bucket_name, Prefix=object_name)
    # Either list may be missing from the response, so default to empty.
    version_list = versions.get('Versions', [])
    for version in version_list:
        keyName = version.get('Key')
        versionId = version.get('VersionId')
        print(keyName + ':' + versionId)
        s3.delete_object(Bucket=bucket_name, Key=keyName, VersionId=versionId)
    marker_list = versions.get('DeleteMarkers', [])
    for marker in marker_list:
        keyName1 = marker.get('Key')
        versionId1 = marker.get('VersionId')
        print(keyName1 + ':' + versionId1)
        s3.delete_object(Bucket=bucket_name, Key=keyName1, VersionId=versionId1)
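
If you would rather not guess how many iterations are needed, one alternative is to follow the continuation markers that list_object_versions returns (IsTruncated, NextKeyMarker, NextVersionIdMarker) until the listing is exhausted. This is only a sketch: the function name list_all_versions is made up, and `client` is assumed to be a boto3 S3 client such as the `s3` above.

```python
def list_all_versions(client, bucket, prefix):
    """Collect {'Key', 'VersionId'} for every version and delete marker under prefix."""
    identifiers = []
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        page = client.list_object_versions(**kwargs)
        for field in ('Versions', 'DeleteMarkers'):
            for entry in page.get(field, []):
                identifiers.append({'Key': entry['Key'], 'VersionId': entry['VersionId']})
        if not page.get('IsTruncated'):
            return identifiers
        # Resume the listing where the previous page stopped.
        kwargs['KeyMarker'] = page['NextKeyMarker']
        if page.get('NextVersionIdMarker'):
            kwargs['VersionIdMarker'] = page['NextVersionIdMarker']
        else:
            kwargs.pop('VersionIdMarker', None)
```

Each returned identifier can then be passed to delete_object, as in the loop above. The built-in paginator (get_paginator('list_object_versions')) handles the same markers for you.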



Answer 7:


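You can delete an object with all of its versions using the following code:

import boto3

# aws_access_key_id and aws_secret_access_key must be defined beforehand.
session = boto3.Session(aws_access_key_id, aws_secret_access_key)

bucket_name = 'bucket_name'
object_name = 'object_name'

s3 = session.client('s3')

# A single call returns at most 1000 entries; use a paginator for more.
versions = s3.list_object_versions(Bucket=bucket_name, Prefix=object_name)
# Include delete markers, and tolerate either list being absent.
version_list = versions.get('Versions', []) + versions.get('DeleteMarkers', [])
for version in version_list:
    # Prefix can match other keys too, so only delete exact matches.
    if version.get('Key') == object_name:
        versionId = version.get('VersionId')
        s3.delete_object(Bucket=bucket_name, Key=object_name, VersionId=versionId)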



Answer 8:


This script deletes all versions of all objects matching a prefix:

import boto3

bucket_name = "bucket_name"  # placeholder: set to your bucket

s3 = boto3.resource("s3")
client = boto3.client("s3")
s3_bucket = s3.Bucket(bucket_name)
for obj in s3_bucket.objects.filter(Prefix=""):

    response = client.list_object_versions(Bucket=bucket_name, Prefix=obj.key)

    while "Versions" in response:
        to_delete = [
            {"Key": ver["Key"], "VersionId": ver["VersionId"]}
            for ver in response["Versions"]
        ]

        delete = {"Objects": to_delete}

        client.delete_objects(Bucket=bucket_name, Delete=delete)
        response = client.list_object_versions(Bucket=bucket_name, Prefix=obj.key)

    # Without a VersionId, this call adds a delete marker in a versioning-enabled bucket.
    client.delete_object(Bucket=bucket_name, Key=obj.key)



Answer 9:


You can use object_versions.

from typing import Optional

import boto3

def delete_all_versions(bucket_name: str, prefix: Optional[str]):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    if prefix is None:
        bucket.object_versions.delete()
    else:
        bucket.object_versions.filter(Prefix=prefix).delete()

delete_all_versions("my_bucket", None) # empties the entire bucket
delete_all_versions("my_bucket", "my_prefix/") # deletes all objects matching the prefix (can be only one if only one matches)
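
Because a prefix filter also matches any key that merely starts with the same characters, here is a hedged sketch for removing exactly one key with the same object_versions collection. The name delete_exact_key is made up, `bucket` is assumed to be a boto3 Bucket resource, and as far as I know the collection yields delete markers as well as versions.

```python
def delete_exact_key(bucket, key):
    """Delete every version and delete marker of exactly `key`; return the count.

    `bucket` is assumed to be a boto3 Bucket resource (hypothetical helper).
    """
    deleted = 0
    for version in bucket.object_versions.filter(Prefix=key):
        # Prefix matching also returns keys that merely start with `key`.
        if version.object_key == key:
            version.delete()
            deleted += 1
    return deleted
```

This trades the single batched delete() call for one request per version, which is slower but avoids touching neighbouring keys such as my_object_key.bak.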


Source: https://stackoverflow.com/questions/46819590/delete-all-versions-of-an-object-in-s3-using-python
