Is it possible to copy all files from one S3 bucket to another with s3cmd?


I'm pretty happy with s3cmd, but there is one issue: how do I copy all files from one S3 bucket to another? Is it even possible?

EDIT: I've found a way to copy files

11 Answers

眼角桃花

2020-12-12 13:01
The answer with the most upvotes as I write this is this one:
```
s3cmd sync s3://from/this/bucket s3://to/this/bucket
```
It's a useful answer. But sometimes sync is not what you need (it deletes files, etc.). It took me a long time to figure out this non-scripting alternative to simply copy multiple files between buckets. (OK, in the case shown below it's not between buckets. It's between not-really-folders, but it works between buckets equally well.)
```
# Slightly verbose, slightly unintuitive, very useful:
s3cmd cp --recursive --exclude='*' --include='file_prefix*' s3://semarchy-inc/source1/ s3://semarchy-inc/target/
```
Explanation of the above command:
- --recursive
  In my mind, my requirement is not recursive. I simply want multiple files. But recursive in this context just tells s3cmd cp to handle multiple files. Great.
- --exclude
  It’s an odd way to think of the problem. Begin by recursively selecting all files. Next, exclude all files. Wait, what?
- --include
  Now we’re talking. Indicate the file prefix (or suffix or whatever pattern) that you want to include.
- s3://sourceBucket/ s3://targetBucket/
  This part is intuitive enough. Though technically it seems to violate the documented example from s3cmd help, which indicates that a source object must be specified:
  s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
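
The exclude-then-include logic described above can be modelled in a few lines of Python. This is a simplified sketch of the selection rule as the s3cmd help describes it (an include pattern rescues a name that an exclude pattern rejected), not s3cmd's actual implementation. The function name `select_keys` and the sample object names are made up for illustration.

```python
from fnmatch import fnmatchcase


def select_keys(keys, excludes, includes):
    """Return the keys that survive exclude/include filtering.

    Simplified model: a key is dropped when it matches any exclude
    pattern, unless it also matches an include pattern.
    """
    selected = []
    for key in keys:
        excluded = any(fnmatchcase(key, pat) for pat in excludes)
        rescued = any(fnmatchcase(key, pat) for pat in includes)
        if not excluded or rescued:
            selected.append(key)
    return selected


# Hypothetical object names under s3://semarchy-inc/source1/
keys = ["file_prefix_a.csv", "file_prefix_b.csv", "notes.txt"]
print(select_keys(keys, excludes=["*"], includes=["file_prefix*"]))
```

With `--exclude='*'` everything is rejected first, so only names matching the include pattern are left, which is why the two flags are used together.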

醉话见心

2020-12-12 13:02
AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.
```
aws s3 sync s3://mybucket s3://backup-mybucket
```
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
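
If the copy has to run from Python rather than from the shell, a one-way copy might look like the sketch below. It assumes a boto3 S3 client is passed in, and `copy_all_objects` is a hypothetical helper name. Unlike `aws s3 sync`, it copies every object unconditionally and performs no change detection.

```python
def copy_all_objects(s3_client, src_bucket, dst_bucket, prefix=""):
    """Copy every object under `prefix` from src_bucket to dst_bucket.

    `s3_client` is assumed to be a boto3 S3 client, for example
    boto3.client("s3"). Returns the number of objects copied.
    """
    copied = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects
        for obj in page.get("Contents", []):
            s3_client.copy(
                CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
                Bucket=dst_bucket,
                Key=obj["Key"],
            )
            copied += 1
    return copied


# Usage (needs the third-party boto3 package and AWS credentials):
#   import boto3
#   copy_all_objects(boto3.client("s3"), "mybucket", "backup-mybucket")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at real buckets.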

暗喜

2020-12-12 13:02

I needed to copy a very large bucket, so I adapted the code in the question into a multi-threaded version and put it up on GitHub.

https://github.com/paultuckey/s3-bucket-to-bucket-copy-py
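
The linked repository is not reproduced here. As a general illustration of the idea (fanning per-object copies out to a pool of worker threads), a minimal sketch could look like this. `copy_one` is a hypothetical callable standing in for whatever performs a single server-side copy, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_keys_in_parallel(keys, copy_one, max_workers=8):
    """Call copy_one(key) for every key using a thread pool.

    Returns (succeeded, failed), where failed maps key -> exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                succeeded.append(key)
            except Exception as exc:  # keep going; report failures at the end
                failed[key] = exc
    return sorted(succeeded), failed


# Demo with a stand-in copy function that fails for one key
def fake_copy(key):
    if key == "broken":
        raise IOError("simulated failure")


done, errors = copy_keys_in_parallel(["a", "b", "broken"], fake_copy)
print(done, list(errors))
```

Threads suit this workload because each copy spends nearly all of its time waiting on the network, and collecting failures instead of aborting lets a long run be retried for just the keys that failed.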


醉酒成梦

2020-12-12 13:02

I wrote a script that backs up an S3 bucket: https://github.com/roseperrone/aws-backup-rake-task

```
#!/usr/bin/env python
"""Back up an S3 bucket into a new, dated bucket (uses the legacy boto 2 library)."""
from __future__ import print_function  # lets the print() calls run on Python 2 as well

import datetime
import re
import sys
import time

from boto.s3.connection import S3Connection


def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = int(sys.argv[4])
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)


def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets()  # returns a list of bucket objects

    backup_bucket_names = []
    for bucket in buckets:
        # match() anchors at the start, which the date slicing below relies on
        if re.match(r'backup-\d{4}-\d{2}-\d{2}', bucket.name):
            backup_bucket_names.append(bucket.name)

    # Sort earliest to latest by the date embedded in the bucket name
    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # The oldest buckets come first, so delete from the front of the list
    # and keep the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (num_backup_buckets - 1)
    if delete <= 0:
        return

    for name in backup_bucket_names[:delete]:
        print('Deleting the backup bucket, ' + name)
        # Note: S3 refuses to delete a bucket that still contains objects
        connection.delete_bucket(name)


def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # strftime zero-fills the month and day
    new_backup_bucket_name = 'backup-' + now.strftime('%Y-%m-%d')
    print('Creating new bucket ' + new_backup_bucket_name)
    connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)


def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys=100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)

    result_marker = ''
    while True:
        # Fetch the next page of keys, starting after the last key already seen
        keys = src_bucket.get_all_keys(max_keys=maximum_keys, marker=result_marker)

        for k in keys:
            print('Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name)

            t0 = time.time()  # wall-clock time; time.clock() was removed in Python 3.8
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print('%.2f seconds' % (time.time() - t0))

        # A short page means there is nothing left to list
        if len(keys) < maximum_keys:
            print('Done backing up.')
            break

        result_marker = keys[maximum_keys - 1].key


if __name__ == '__main__':
    main()
```

I use this in a rake task (for a Rails app):

```
desc "Back up a file onto S3"
task :backup do
  S3ID = "*****"
  S3KEY = "*****"
  SRCBUCKET = "primary-mzgd"
  NUM_BACKUP_BUCKETS = 2

  Dir.chdir("#{Rails.root}/lib/tasks")
  system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}"
end
```
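
The retention step in the Python script above (keep the newest backups, delete the rest) is easy to get subtly wrong, so here is a small standalone sketch of just that selection, with no S3 calls. `buckets_to_delete` and the sample names are illustrative only, and the sketch assumes the `backup-YYYY-MM-DD` naming used by the script.

```python
import datetime
import re

BACKUP_NAME = re.compile(r"^backup-(\d{4}-\d{2}-\d{2})$")


def buckets_to_delete(bucket_names, keep):
    """Return backup bucket names to delete, oldest first, keeping the newest `keep`."""
    dated = []
    for name in bucket_names:
        match = BACKUP_NAME.match(name)
        if match:
            day = datetime.datetime.strptime(match.group(1), "%Y-%m-%d").date()
            dated.append((day, name))
    dated.sort()  # earliest first
    surplus = max(len(dated) - keep, 0)
    return [name for _, name in dated[:surplus]]


names = ["primary-mzgd", "backup-2020-12-10", "backup-2020-12-12", "backup-2020-12-11"]
print(buckets_to_delete(names, keep=1))
```

The script effectively uses `keep = NUM_BACKUP_BUCKETS - 1`, which leaves room for the bucket it is about to create.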


不要未来只要你来

2020-12-12 13:03
mdahlman's code didn't work for me, but this command copies all the files in bucket1 to a new folder in bucket2 (the command also creates the new folder).
```
s3cmd cp --recursive --include='file_prefix*' s3://bucket1/ s3://bucket2/new_folder_name/
```

后悔当初

2020-12-12 13:05

```
s3cmd sync s3://from/this/bucket/ s3://to/this/bucket/
```

For available options, run: s3cmd --help
