Faster s3 bucket duplication


I have been trying to find a better command line tool for duplicating buckets than s3cmd. s3cmd can duplicate buckets without having to download and upload eac


7 answers

南笙

2020-12-22 18:52

For an ad hoc solution, use the AWS CLI to sync between buckets (aws s3 sync).

aws s3 sync speed depends on:
- the latency of each API call to the S3 endpoint
- the number of API calls made concurrently

To increase sync speed:
- run aws s3 sync from an AWS instance (c3.large on FreeBSD is OK ;-) )
- update ~/.aws/config with:
-- max_concurrent_requests = 128
-- max_queue_size = 8096

With this configuration and instance type, I was able to sync a bucket (309 GB, 72K files, us-east-1) in 474 seconds.
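
As a rough sketch of how the tuning above might be scripted, the snippet below builds the AWS CLI commands that write the two settings into the default profile's s3 section of ~/.aws/config and then start the sync. The helper names and the bucket names in the usage comment are hypothetical, and it assumes the aws executable is installed and already has credentials configured.

```python
import subprocess


def tuning_commands(max_concurrent_requests=128, max_queue_size=8096):
    """Return the `aws configure set` commands for the two S3 settings.

    The values default to the ones quoted in the answer; the settings are
    written to the default profile.
    """
    return [
        ["aws", "configure", "set",
         "default.s3.max_concurrent_requests", str(max_concurrent_requests)],
        ["aws", "configure", "set",
         "default.s3.max_queue_size", str(max_queue_size)],
    ]


def sync_command(src_bucket, dst_bucket):
    """Return the bucket-to-bucket `aws s3 sync` command."""
    return ["aws", "s3", "sync", f"s3://{src_bucket}", f"s3://{dst_bucket}"]


def run_sync(src_bucket, dst_bucket):
    """Apply the tuning, then run the sync; raises if any command fails."""
    for cmd in tuning_commands() + [sync_command(src_bucket, dst_bucket)]:
        subprocess.run(cmd, check=True)


# Hypothetical usage (bucket names are placeholders):
# run_sync("original-bucket", "backup-bucket")
```

Keeping command construction separate from execution makes it easy to print the commands and check them before anything touches a real bucket.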

For a more generic solution, consider AWS Data Pipeline or S3 cross-region replication.

栀梦

2020-12-22 18:59

I don't know of any other S3 command line tools but if nothing comes up here, it might be easiest to write your own.

Pick whatever language and Amazon SDK/toolkit you prefer. Then you just need to list the source bucket's contents and copy each file (in parallel, obviously).
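
A minimal sketch of that idea in Python, assuming boto3 as the SDK: list the source keys, then issue server-side copies from a thread pool. The function names, the worker count and the bucket names in the usage comment are illustrative choices, not anything from s3cmd. Note that a single copy_object call is limited to objects of up to 5 GB, so larger objects would need a multipart copy instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_all(keys, copy_one, max_workers=32):
    """Run copy_one(key) for every key in parallel.

    Returns (copied, failed): copied is a list of keys, failed is a list
    of (key, exception) pairs, both in completion order.
    """
    copied, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                copied.append(key)
            except Exception as exc:  # keep going; report failures at the end
                failed.append((key, exc))
    return copied, failed


def s3_bucket_copy(src_bucket, dst_bucket, max_workers=32):
    """Copy every object from src_bucket to dst_bucket without downloading."""
    import boto3  # imported here so copy_all stays usable without boto3

    s3 = boto3.client("s3")

    def list_keys():
        # list_objects_v2 returns at most 1000 keys per page, so paginate.
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=src_bucket):
            for obj in page.get("Contents", []):
                yield obj["Key"]

    def copy_one(key):
        # Server-side copy: the data never passes through this machine.
        s3.copy_object(
            Bucket=dst_bucket,
            Key=key,
            CopySource={"Bucket": src_bucket, "Key": key},
        )

    return copy_all(list_keys(), copy_one, max_workers)


# Hypothetical usage (bucket names are placeholders):
# copied, failed = s3_bucket_copy("original-bucket", "backup-bucket")
```

Threads are enough here because each copy is a short API call that spends its time waiting on the network rather than on the CPU.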

Looking at the source for s3cmd-modification (and I admit I know nothing about Python), it looks like they have not parallelised the bucket-to-bucket code, but perhaps you could use the standard parallel upload/download code as a starting point.

春和景丽

2020-12-22 19:06
If you don't mind using the AWS console, you can:
1. Select all of the files/folders in the first bucket
2. Click Actions > Copy
3. Create a new bucket and select it
4. Click Actions > Paste
It's still fairly slow, but you can leave it alone and let it do its thing.
独厮守ぢ

2020-12-22 19:09

I have tried cloning two buckets using the AWS web console, s3cmd and the AWS CLI. Although these methods work most of the time, they are painfully slow.

Then I found s3s3mirror: a specialized tool for syncing two S3 buckets. It's multi-threaded and a lot faster than the other approaches I have tried. I quickly moved gigabytes of data from one AWS region to another.

Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/

野的像风

2020-12-22 19:09

A simple aws s3 cp s3://[original-bucket] s3://[backup-bucket] --recursive works well (assuming you have the AWS CLI set up).

时光说笑

2020-12-22 19:11

Since this is about the first Google hit on this subject, I'm adding some extra information.

'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.

The pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, and his version is at https://github.com/pearltrees/s3cmd-modification
