I have been trying to find a better command line tool for duplicating buckets than s3cmd. s3cmd
can duplicate buckets without having to download and upload eac
For an ad hoc solution, use the AWS CLI to sync between buckets.
The speed of aws s3 sync depends on:
- the latency of an API call to the S3 endpoint
- the number of API calls made concurrently
To increase sync speed:
- run aws s3 sync from an AWS instance (c3.large on FreeBSD is OK ;-) )
- update ~/.aws/config with the following (these settings belong under the profile's "s3 =" section):
-- max_concurrent_requests = 128
-- max_queue_size = 8096
With this config and instance type I was able to sync a bucket (309 GB, 72K files, us-east-1) in 474 seconds.
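To put those figures in perspective, the average rates can be worked out with a few lines of Python. This is only arithmetic on the numbers quoted above; "72K files" is taken as 72,000, which is an assumption about the rounding.

```python
# Figures quoted in the answer above.
size_gb = 309        # bucket size in GB
file_count = 72_000  # "72K files", taken here as 72,000 (assumption)
elapsed_s = 474      # wall-clock seconds for the sync

gb_per_s = size_gb / elapsed_s        # average data rate
files_per_s = file_count / elapsed_s  # average object rate

print(f"{gb_per_s:.2f} GB/s, {files_per_s:.0f} files/s")
```

That works out to roughly 0.65 GB/s and about 150 files per second on average.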
For a more generic solution, consider AWS Data Pipeline or S3 cross-region replication.
I don't know of any other S3 command line tools but if nothing comes up here, it might be easiest to write your own.
Pick whatever language and Amazon SDK/Toolkit you prefer. Then you just need to list/retrieve the source bucket contents and copy each file (in parallel, obviously).
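As a rough illustration of that idea, here is a minimal sketch in Python, assuming boto3 as the SDK. The helper names list_keys and copy_bucket, the worker count of 32, and the bucket names in the usage comment are all made up for the example. It lists the source bucket page by page and hands each key to a thread pool; client.copy asks S3 to copy the object server-side, so nothing is downloaded to the machine running the script.

```python
from concurrent.futures import ThreadPoolExecutor


def list_keys(client, bucket):
    """Yield every object key in the bucket, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket (or empty page) has no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def copy_bucket(client, src_bucket, dst_bucket, workers=32):
    """Copy all objects from src_bucket to dst_bucket in parallel.

    Returns the number of objects copied. Keys are kept unchanged.
    """
    def copy_one(key):
        # Managed server-side copy; boto3 switches to multipart for large objects.
        client.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key)
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the results re-raises the first copy error, if any.
        return sum(1 for _ in pool.map(copy_one, list_keys(client, src_bucket)))


# Hypothetical usage (bucket names are placeholders, boto3 must be installed):
#   import boto3
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(max_pool_connections=32))
#   print(copy_bucket(s3, "original-bucket", "backup-bucket"))
```

The client is passed in rather than created inside the functions, which makes it easy to try the logic against a stub before pointing it at real buckets. There is no retry handling or skipping of unchanged objects here, so treat it as a starting point rather than a replacement for a sync tool.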
Looking at the source for s3cmd-modification (and I admit I know nothing about Python), it looks like they have not parallelised the bucket-to-bucket code, but perhaps you could use the standard upload/download parallel code as a starting point to do this.
If you don't mind using the AWS console, you can:
It's still fairly slow, but you can leave it alone and let it do its thing.
I have tried cloning two buckets using the AWS web console, s3cmd, and the AWS CLI. Although these methods work most of the time, they are painfully slow.
Then I found s3s3mirror: a specialized tool for syncing two S3 buckets. It's multi-threaded and a lot faster than the other approaches I have tried. I quickly moved gigabytes of data from one AWS region to another.
Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/
A simple aws s3 cp s3://[original-bucket] s3://[backup-bucket] --recursive works well (assuming you have the AWS CLI set up).
As this is about Google's first hit on this subject, I'm adding some extra information.
'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.
Pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, his version at https://github.com/pearltrees/s3cmd-modification