How to upload small files to Amazon S3 efficiently in Python

星月不相逢 2021-01-31 11:54

Recently, I needed to implement a program in Python that uploads files residing on Amazon EC2 to S3 as quickly as possible. The files are about 30 KB each.

I have tried some

3 Answers
  • 2021-01-31 12:20

    I recently needed to upload about 5 TB of small files to AWS and reached full network bandwidth, roughly 750 Mbit/s on a 1 Gbit connection per server, without problems by setting a higher "max_concurrent_requests" value in the ~/.aws/config file.

    I further sped up the process by starting multiple upload jobs via a bash for-loop and sending these jobs to different servers.

    I also tried Python tools, e.g. s3-parallel-put, but I think this approach is considerably faster. Of course, if the files are very small, you should consider compressing them first, uploading the archive to EBS/S3, and decompressing it there (see the Python sketch after the commands below).

    Here is some code that might help.

    $ cat ~/.aws/config
    [default]
    region = eu-west-1
    output = text
    s3 =
        max_concurrent_requests = 100
    

    Then start multiple aws s3 cp jobs, e.g.:

    for folder in *; do aws s3 cp "$folder" "s3://<bucket>/$folder/whatever/" --recursive; done
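
    If the per-request overhead still dominates, here is a minimal sketch of the "compress first" idea in Python, assuming boto3 is available; the source directory, bucket name, and key are placeholders:

    import tarfile
    import boto3

    SOURCE_DIR = "data/"              # placeholder: directory holding the ~30 KB files
    BUCKET = "my-bucket"              # placeholder: your bucket name
    ARCHIVE = "/tmp/batch.tar.gz"

    # Bundle the many small files into a single archive so S3 sees one large
    # PUT instead of thousands of tiny requests.
    with tarfile.open(ARCHIVE, "w:gz") as tar:
        tar.add(SOURCE_DIR, arcname="batch")

    # Upload the single archive; decompress it again on the consuming side.
    s3 = boto3.client("s3")
    s3.upload_file(ARCHIVE, BUCKET, "batches/batch.tar.gz")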
    
  • 2021-01-31 12:21

    Sample parallel upload times to Amazon S3 using the Python boto SDK are available here:

    • Parallel S3 Uploads Using Boto and Threads in Python
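
    The linked post uses the older boto library; as a rough illustration of the same thread-based idea with the current boto3 SDK (bucket name, key prefix, and source directory are placeholders):

    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path
    import boto3

    BUCKET = "my-bucket"              # placeholder: your bucket name
    PREFIX = "uploads/"               # placeholder: destination key prefix
    files = [p for p in Path("data/").iterdir() if p.is_file()]  # the ~30 KB files

    # boto3 clients are thread-safe, so one client can be shared by all workers.
    s3 = boto3.client("s3")

    def upload(path):
        # One PUT per small file; the threads hide the per-request latency.
        s3.upload_file(str(path), BUCKET, PREFIX + path.name)

    with ThreadPoolExecutor(max_workers=50) as pool:
        list(pool.map(upload, files))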

    Rather than writing the code yourself, you might also consider calling out to the AWS Command Line Interface (CLI), which can do uploads in parallel. It is also written in Python and uses boto.
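
    For example, a minimal way to shell out to the CLI from Python (assuming the aws CLI is installed and configured; the local directory and bucket are placeholders):

    import subprocess

    # "aws s3 sync" parallelizes transfers internally, up to the
    # max_concurrent_requests setting in ~/.aws/config.
    subprocess.run(
        ["aws", "s3", "sync", "data/", "s3://my-bucket/uploads/"],
        check=True,
    )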

  • I have the same problem as you. My solution was to send the data to AWS SQS and then save it to S3 using AWS Lambda.

    So the data flow looks like: app -> SQS -> Lambda -> S3 (the sending side is sketched below).

    The entire process is asynchronous, but near real-time :)
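
    A minimal sketch of the app -> SQS step with boto3 (the queue URL and file name are placeholders, and the SQS-triggered Lambda that writes each message to S3 is not shown):

    import base64
    import boto3

    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/upload-queue"  # placeholder

    sqs = boto3.client("sqs")

    with open("some-file.bin", "rb") as f:
        # SQS message bodies are text-only and capped at 256 KB, so
        # base64-encode the ~30 KB file (roughly 40 KB after encoding).
        body = base64.b64encode(f.read()).decode("ascii")

    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)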
