Faster s3 bucket duplication

傲寒 · 2020-12-22 18:14

I have been trying to find a better command-line tool for duplicating buckets than s3cmd. s3cmd can duplicate buckets without having to download and upload each file.

7 Answers
  • 2020-12-22 18:52

    For an ad hoc solution, use the AWS CLI to sync between buckets:

    aws s3 sync speed depends on:
    - latency of each API call to the S3 endpoint
    - number of API calls made concurrently

    To increase sync speed:
    - run aws s3 sync from an AWS EC2 instance (a c3.large on FreeBSD is OK ;-) )
    - update ~/.aws/config with:
    -- max_concurrent_requests = 128
    -- max_queue_size = 8096

    With this config and instance type, I was able to sync a bucket (309 GB, 72K files, us-east-1) in 474 seconds.

    For a more generic solution, consider AWS Data Pipeline or S3 cross-region replication.
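    Those two CLI settings live under the `s3` key of a profile in `~/.aws/config`. A sketch of the resulting file and the sync invocation (bucket names are placeholders):

    ```ini
    [default]
    s3 =
      max_concurrent_requests = 128
      max_queue_size = 8096
    ```

    ```
    aws s3 sync s3://source-bucket s3://target-bucket
    ```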

  • 2020-12-22 18:59

    I don't know of any other S3 command line tools but if nothing comes up here, it might be easiest to write your own.

    Pick whatever language and Amazon SDK/toolkit you prefer. Then you just need to list the source bucket's contents and copy each file (in parallel, obviously).

    Looking at the source for s3cmd-modification (and I admit I know nothing about python), it looks like they have not parallelised the bucket-to-bucket code but perhaps you could use the standard upload/download parallel code as a starting point to do this.
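    A minimal sketch of that idea in Python, assuming a `copy_one(key)` callable that issues the server-side copy; with boto3 that callable would wrap `client.copy_object(...)` as shown in the commented wiring below. Bucket names and the worker count are placeholders:

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def copy_bucket(keys, copy_one, workers=32):
        """Copy every key by fanning copy_one calls out over a thread pool.

        Server-side copies are network-bound, so threads overlap the
        per-request latency instead of paying it serially.
        """
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map runs copy_one(key) concurrently, up to `workers` at a time
            list(pool.map(copy_one, keys))

    # Hypothetical wiring with boto3 (not run here; bucket names are examples):
    # import boto3
    # s3 = boto3.client("s3")
    # pages = s3.get_paginator("list_objects_v2").paginate(Bucket="source-bucket")
    # keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]
    # copy_bucket(keys, lambda k: s3.copy_object(
    #     Bucket="dest-bucket", Key=k,
    #     CopySource={"Bucket": "source-bucket", "Key": k}))
    ```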

  • 2020-12-22 19:06

    If you don't mind using the AWS console, you can:

    1. Select all of the files/folders in the first bucket
    2. Click Actions > Copy
    3. Create a new bucket and select it
    4. Click Actions > Paste

    It's still fairly slow, but you can leave it alone and let it do its thing.

  • 2020-12-22 19:09

    I have tried cloning two buckets using the AWS web console, s3cmd, and the AWS CLI. Although these methods work most of the time, they are painfully slow.

    Then I found s3s3mirror: a specialized tool for syncing two S3 buckets. It's multi-threaded and a lot faster than the other approaches I have tried. I quickly moved gigabytes of data from one AWS region to another.

    Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/

  • 2020-12-22 19:09

    A simple `aws s3 cp s3://[original-bucket] s3://[backup-bucket] --recursive` works well (assuming you have the AWS CLI set up).
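    For repeat runs, `aws s3 sync` is worth knowing alongside `cp`: it only copies objects that are new or changed, so re-duplicating a bucket is cheaper. Both commands copy server-side, with no local download (bucket names are placeholders):

    ```
    # One-shot full copy:
    aws s3 cp s3://original-bucket s3://backup-bucket --recursive

    # Incremental re-run: copies only new/changed objects:
    aws s3 sync s3://original-bucket s3://backup-bucket
    ```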

  • 2020-12-22 19:11

    As this is Google's first hit on the subject, adding some extra information.

    'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.

    Pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, his version at https://github.com/pearltrees/s3cmd-modification
