Sometimes multipart uploads hang or don't complete for some reason. In that case you are stuck with orphaned parts that are tricky to remove. You can list them with aws s3api list-multipart-uploads --bucket $BUCKETNAME.
You can alternatively use the Minio Client, aka mc. It is open source and compatible with AWS S3.
To list all the incomplete uploads in a bucket:
$ mc ls -I s3/mybucketname
To remove all incomplete uploads from a bucket:
$ mc rm -I -r --force s3/mybucketname
-I = incomplete, -r = recursive, --force = with the force option
Hope it helps.
Disclaimer : I work for Minio.
Assuming you have awscli all set up and it outputs JSON, you can use jq to project the needed keys with:
BUCKETNAME=<xxx>
aws s3api list-multipart-uploads --bucket $BUCKETNAME \
| jq -r '.Uploads[] | "--key \"\(.Key)\" --upload-id \(.UploadId)"' \
| while read -r line; do
eval "aws s3api abort-multipart-upload --bucket $BUCKETNAME $line";
done
Here is my one-liner that will abort ALL multipart uploads regardless of status, assuming that you don't have any spaces in your keys / filenames. BUCKETNAME is exported so that the bash at the end of the pipe can see it.
export BUCKETNAME=<xxx>; aws s3api list-multipart-uploads --bucket "$BUCKETNAME" --query 'Uploads[].[Key, UploadId]' --output text | awk '{print "aws s3api abort-multipart-upload --upload-id " $2 " --bucket \"$BUCKETNAME\" --key " $1}' | bash
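If your keys can contain spaces or other shell-special characters, the same list-then-abort loop could be done from Python instead of piping generated commands through a shell. This is only a rough sketch: abort_multipart_uploads and its older_than_days parameter are names I made up for illustration, and client is assumed to behave like boto3.client("s3").

```python
from datetime import datetime, timedelta, timezone


def abort_multipart_uploads(client, bucket, older_than_days=None):
    """Abort in-progress multipart uploads and return the (key, upload_id) pairs aborted.

    `client` is assumed to behave like boto3.client("s3").
    If older_than_days is given, uploads initiated more recently are left alone.
    """
    cutoff = None
    if older_than_days is not None:
        cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)

    aborted = []
    paginator = client.get_paginator("list_multipart_uploads")
    for page in paginator.paginate(Bucket=bucket):
        # "Uploads" is absent from the response when nothing is in progress.
        for upload in page.get("Uploads", []):
            if cutoff is not None and upload["Initiated"] > cutoff:
                continue  # too recent, may still be running
            client.abort_multipart_upload(
                Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
            )
            aborted.append((upload["Key"], upload["UploadId"]))
    return aborted


# Usage (needs boto3 and credentials, so it is left commented out):
# import boto3
# abort_multipart_uploads(boto3.client("s3"), "mybucketname", older_than_days=7)
```

Keys are passed as plain arguments here, so nothing has to be quoted or eval'd. Leaving older_than_days unset aborts everything, like the one-liner above.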
If you are doing multipart uploads, you can do the cleanup from the S3 Management Console too.
a) Open your S3 bucket
b) Switch to the Management tab
c) Click Add Lifecycle Rule
d) Type a rule name in the first step and check the Clean up incomplete multipart uploads checkbox. You can also enter the number of days to keep incomplete parts.
That's it. You can see these steps in the attached screenshot too.
You can set up lifecycle rules to automatically purge those after some amount of time. Here's a blog post demonstrating how to do it in the console:
https://aws.amazon.com/blogs/aws/s3-lifecycle-management-update-support-for-multipart-uploads-and-delete-markers/
To do this in boto3:
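import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
try:
    # Keep only the rules: the response also carries ResponseMetadata,
    # which put_bucket_lifecycle rejects as an unknown parameter.
    lifecycle = {'Rules': s3.get_bucket_lifecycle(Bucket='bucket')['Rules']}
except ClientError:
    # No lifecycle configuration on the bucket yet, so start from an empty one
    lifecycle = {'Rules': []}
lifecycle['Rules'].append({
    'ID': 'PruneAbandonedMultipartUploads',
    'Status': 'Enabled',
    'Prefix': '',
    'AbortIncompleteMultipartUpload': {
        'DaysAfterInitiation': 7
    }
})
s3.put_bucket_lifecycle(Bucket='bucket', LifecycleConfiguration=lifecycle)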
Adding that configuration in the cli would be much the same:
$ aws s3api get-bucket-lifecycle --bucket bucket > lifecycle.json
# Edit the lifecycle, adding the same configuration as in the boto3 sample
$ aws s3api put-bucket-lifecycle --bucket bucket --lifecycle-configuration file://lifecycle.json
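If you have no lifecycle policy on your bucket, get-bucket-lifecycle will raise a ClientError. A robust implementation would check that the error really is the missing-configuration one (error code NoSuchLifecycleConfiguration) rather than, say, a permissions failure.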
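A more careful version of that error handling might look something like the sketch below. load_lifecycle_rules is a hypothetical helper name, client is assumed to behave like boto3.client("s3"), and the check relies on S3 reporting the error code NoSuchLifecycleConfiguration when a bucket has no policy.

```python
def load_lifecycle_rules(client, bucket):
    """Return the bucket's existing lifecycle rules, or [] if none are configured.

    `client` is assumed to behave like boto3.client("s3").
    """
    try:
        response = client.get_bucket_lifecycle(Bucket=bucket)
    except client.exceptions.ClientError as err:
        code = err.response.get("Error", {}).get("Code")
        if code == "NoSuchLifecycleConfiguration":
            # The bucket simply has no policy yet.
            return []
        # Anything else (permissions, wrong bucket, ...) should not be hidden.
        raise
    return response.get("Rules", [])
```

Only the missing-policy case is treated as an empty rule list. Every other ClientError is re-raised, so you don't end up overwriting a policy you merely failed to read.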
A policy with only that configuration would look like this (Prefix is included to match the boto3 sample):
{
    "Rules": [
        {
            "ID": "PruneAbandonedMultipartUpload",
            "Status": "Enabled",
            "Prefix": "",
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 7
            }
        }
    ]
}
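The manual "edit the lifecycle" step in the CLI flow could also be scripted. Here is a minimal sketch, assuming lifecycle.json holds the output of get-bucket-lifecycle, or is empty or missing when the bucket has no policy. with_abort_rule and add_rule_to_file are hypothetical helper names, and the rule mirrors the boto3 sample.

```python
import json


def with_abort_rule(config, days=7, rule_id="PruneAbandonedMultipartUploads"):
    """Return a new config containing the abort rule, replacing any rule with the same ID."""
    rules = [rule for rule in config.get("Rules", []) if rule.get("ID") != rule_id]
    rules.append({
        "ID": rule_id,
        "Status": "Enabled",
        "Prefix": "",
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    })
    return {"Rules": rules}


def add_rule_to_file(path="lifecycle.json", days=7):
    """Rewrite the JSON file at `path` so that it includes the abort rule."""
    try:
        with open(path) as handle:
            config = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or empty file: treat it as "no policy yet".
        config = {}
    updated = with_abort_rule(config, days=days)
    with open(path, "w") as handle:
        json.dump(updated, handle, indent=2)
    return updated
```

Running add_rule_to_file() between the get and put commands would replace the hand edit. Matching on the rule ID means running it twice doesn't leave a duplicate rule behind.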