Tried this:
import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer
path = "/temp/"
fileName = "bigFile.gz"  # this happens to be a 5.9 Gig file
Your code was already correct. Indeed, a minimal example of a multipart upload just looks like this:
import boto3
s3 = boto3.client('s3')
s3.upload_file('my_big_local_file.txt', 'some_bucket', 'some_key')
You don't need to explicitly ask for a multipart upload, or use any of the lower-level functions in boto3 that relate to multipart uploads. Just call upload_file, and boto3 will automatically use a multipart upload if your file size is above a certain threshold (which defaults to 8MB).
You seem to have been confused by the fact that the end result in S3 wasn't visibly made up of multiple parts:
Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.
... but this is the expected outcome. The whole point of the multipart upload API is to let you upload a single file over multiple HTTP requests and end up with a single object in S3.
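If you want to convince yourself that a multipart upload really happened, one commonly used (though unofficial) hint is the object's ETag: for multipart uploads S3 computes it from the part hashes and appends a dash plus the part count. A small sketch, reusing the bucket/key names from the example above:
import boto3
# Heuristic only: a multipart-uploaded object's ETag looks like
# "<md5-of-part-md5s>-<number of parts>", so it contains a dash.
s3 = boto3.client('s3')
head = s3.head_object(Bucket='some_bucket', Key='some_key')
etag = head['ETag'].strip('"')
if '-' in etag:
    print("Uploaded in {} parts".format(etag.split('-')[-1]))
else:
    print("Uploaded as a single part")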
Why not use just the copy option in boto3?
s3.copy(
    CopySource={'Bucket': sourceBucket, 'Key': sourceKey},
    Bucket=targetBucket,
    Key=targetKey,
    ExtraArgs={'ACL': 'bucket-owner-full-control'})
Details on how to initialise the s3 object, and further options for the call, are available in the boto3 docs.
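For completeness, a minimal end-to-end sketch (the bucket and key names here are just placeholders):
import boto3
# Placeholder bucket/key names; like upload_file, the managed s3.copy
# call handles multipart copies of large objects automatically.
s3 = boto3.client('s3')
s3.copy(
    CopySource={'Bucket': 'source-bucket', 'Key': 'bigFile.gz'},
    Bucket='target-bucket',
    Key='bigFile.gz',
    ExtraArgs={'ACL': 'bucket-owner-full-control'})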
In your code snippet, part should clearly be part1 in the dictionary. Typically, you would have several parts (otherwise, why use multipart upload at all?), and the 'Parts' list would contain an element for each part.
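As a rough sketch of what the several-parts case would look like (the part size, bucket and key are illustrative; every part except the last must be at least 5 MB):
import boto3

s3 = boto3.client('s3')
bucket, key = 'bucket', 'key'
part_size = 100 * 1024 * 1024  # 100 MB per part (illustrative)

# Start the multipart upload, send each chunk as a part, then complete it
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open('/temp/bigFile.gz', 'rb') as data:
    part_number = 1
    while True:
        chunk = data.read(part_size)
        if not chunk:
            break
        part = s3.upload_part(Bucket=bucket, Key=key,
                              PartNumber=part_number,
                              UploadId=mpu['UploadId'],
                              Body=chunk)
        parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
        part_number += 1
s3.complete_multipart_upload(Bucket=bucket, Key=key,
                             UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})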
You may also be interested in the new pythonic interface for dealing with S3: http://s3fs.readthedocs.org/en/latest/
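A minimal s3fs sketch, assuming s3fs is installed and your AWS credentials are already configured:
import s3fs
# put() copies a local file to S3 and takes care of large files for you
fs = s3fs.S3FileSystem()
fs.put('/temp/bigFile.gz', 'some_bucket/bigFile.gz')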
Change part to part1 so the variable names match:
import boto3
bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'
s3 = boto3.client('s3')
# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path + fileName, 'rb') as data:
    part1 = s3.upload_part(Bucket=bucket,
                           Key=key,
                           PartNumber=1,
                           UploadId=mpu['UploadId'],
                           Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part1['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket,
                             Key=key,
                             UploadId=mpu['UploadId'],
                             MultipartUpload=part_info)
I would advise you to use boto3.s3.transfer for this purpose. Here is an example:
import boto3
from boto3.s3.transfer import S3Transfer, TransferConfig

def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client("s3")
    try:
        print("Uploading file: {}".format(filename))
        tc = TransferConfig()
        t = S3Transfer(client=s3_client, config=tc)
        t.upload_file(filename, "my-bucket-name", "name-in-s3.dat")
    except Exception as e:
        print("Error uploading: {}".format(e))
As described in official boto3 documentation:
The AWS SDK for Python automatically manages retries and multipart and non-multipart transfers.
The management operations are performed by using reasonable default settings that are well-suited for most scenarios.
So all you need to do is set the desired multipart threshold value, which indicates the minimum file size for which a multipart upload will be handled automatically by the Python SDK:
import boto3
from boto3.s3.transfer import TransferConfig
# Set the desired multipart threshold value (5GB)
GB = 1024 ** 3
config = TransferConfig(multipart_threshold=5*GB)
# Perform the transfer
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)
Moreover, you can also use the multithreading mechanism for multipart transfers by setting max_concurrency:
# To consume less downstream bandwidth, decrease the maximum concurrency
config = TransferConfig(max_concurrency=5)
# Download an S3 object
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
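Since the question is about uploading, the same setting applies to uploads too; a short sketch using the same placeholder names:
# More threads upload more parts in parallel
config = TransferConfig(max_concurrency=10)
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)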
And finally, in case you want to perform a multipart transfer in a single thread, just set use_threads=False:
# Disable thread use/transfer concurrency
config = TransferConfig(use_threads=False)
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)