Move files between two AWS S3 buckets using boto3

难免孤独 2020-12-01 07:52

I have to move files from one bucket to another with the Python Boto API. (I need it to "cut" the file from the first bucket and "paste" it into the second one.) What is the best way to do that?

7 Answers
  • 2020-12-01 07:57

    This is the code I used to move files between sub-folders of an S3 bucket:

    # =============================================================================
    # CODE TO MOVE FILES within subfolders in an S3 BUCKET
    # =============================================================================

    from boto3.session import Session

    ACCESS_KEY = 'a_key'
    SECRET_KEY = 's_key'

    session = Session(aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)
    s3 = session.resource('s3')      # S3 resource, used for the copy/delete calls
    s3client = session.client('s3')  # S3 client, used for listing objects

    resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter="/")

    # list_objects returns at most 1000 keys per call; the first entry is the
    # folder placeholder key itself, so it is skipped
    forms2_dw = [x['Key'] for x in resp_dw.get('Contents', [])[1:]]
    reload_no = 0

    while len(forms2_dw) != 0:
        total_files = len(forms2_dw)

        for i in range(total_files):
            # put your own logic here for the destination folder name
            foldername = resp_dw['Contents'][1:][i]['LastModified'].strftime('%Y%m%d')
            my_bcket = 'main_bucket'

            my_file_old = resp_dw['Contents'][1:][i]['Key']  # source key to be copied
            zip_filename = my_file_old.split('/')[-1]
            my_file_new = 'new_sub_folder/' + foldername + "/" + zip_filename  # destination key

            print(str(reload_no) + ':::  copying from====:' + my_file_old + ' to :=====' + my_file_new)

            if zip_filename[-4:] == '.zip':
                # copy the object to its new key, then delete the original ("move")
                s3.Object(my_bcket, my_file_new).copy_from(CopySource=my_bcket + '/' + my_file_old)
                s3.Object(my_bcket, my_file_old).delete()

                print(str(i) + ' files moved of ' + str(total_files))

        # re-list the prefix to pick up the next batch of (up to 1000) keys;
        # note: any non-.zip keys left under the prefix will be listed again
        resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter="/")
        forms2_dw = [x['Key'] for x in resp_dw.get('Contents', [])[1:]]
        reload_no += 1
    
  • 2020-12-01 08:05

    If the two buckets require different access credentials, store the respective credentials as named profiles in the credentials and config files under the ~/.aws folder.
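
    For example, the ~/.aws/credentials file might define one named profile per bucket (a sketch; the profile names and keys below are placeholders that must match the profile names used in the code):

    [source_profile_name]
    aws_access_key_id = SOURCE_ACCESS_KEY
    aws_secret_access_key = SOURCE_SECRET_KEY

    [dest_profile_name]
    aws_access_key_id = DEST_ACCESS_KEY
    aws_secret_access_key = DEST_SECRET_KEY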

    You can then use the following to read an object from one bucket with one set of credentials and save it to the other bucket with the other set of credentials:

    import boto3
    
    
    session_src = boto3.session.Session(profile_name=<source_profile_name>)
    source_s3_r = session_src.resource('s3')
    
    session_dest = boto3.session.Session(profile_name=<dest_profile_name>)
    dest_s3_r = session_dest.resource('s3')
    
    # create a reference to the source object
    old_obj = source_s3_r.Object(<source_s3_bucket_name>, <prefix_path> + <key_name>)
    
    # create a reference to the destination object
    new_obj = dest_s3_r.Object(<dest_s3_bucket_name>, old_obj.key)
    
    # read the source object and write its contents to the destination object
    new_obj.put(Body=old_obj.get()['Body'].read())
    

    The two buckets do not need to grant each other access through ACLs or bucket policies.
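
    Note that old_obj.get()['Body'].read() loads the whole object into memory before uploading it. For large objects, a streaming upload avoids that; a minimal sketch reusing the same old_obj and new_obj from above:

    # stream the source object's body into the destination object instead of
    # reading it fully into memory (sketch; uses the same sessions as above)
    new_obj.upload_fileobj(old_obj.get()['Body'])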

  • 2020-12-01 08:09

    If you want to

    Create a copy of an object that is already stored in Amazon S3.

    then copy_object is the way to go in boto3.

    How I do it:

    import boto3
    
    aws_access_key_id = ""
    aws_secret_access_key = ""
    bucket_from = ""
    bucket_to = ""
    s3 = boto3.resource(
        's3',
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key
    )
    src = s3.Bucket(bucket_from)
    
    def move_files():
        for archive in src.objects.all():
            # filters on archive.key might be applied here
    
            s3.meta.client.copy_object(
                ACL='public-read',
                Bucket=bucket_to,
                CopySource={'Bucket': bucket_from, 'Key': archive.key},
                Key=archive.key
            )
    
    move_files()
    
  • 2020-12-01 08:12

    If you are using boto3 (the newer boto version), this is quite simple:

    import boto3
    s3 = boto3.resource('s3')
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
    

    (Docs)
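
    Since the question asks to move (cut and paste) rather than copy, you would also delete the source object once the copy succeeds. A minimal sketch using the same resource and the same placeholder bucket/key names:

    # after a successful copy, remove the original so the operation becomes a "move"
    s3.Object('mybucket', 'mykey').delete()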

  • 2020-12-01 08:17

    The bucket name must be a string, not a bucket object. The change below worked for me (this is the older boto 2 API):

    for k in src.list():
        dst.copy_key(k.key, src.name, k.key)
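
    For context, a fuller sketch of how this fits together with the legacy boto 2 API (the bucket names are placeholders); deleting each source key after the copy turns it into a move:

    import boto

    conn = boto.connect_s3()                     # credentials from environment/boto config
    src = conn.get_bucket('source-bucket-name')  # placeholder bucket names
    dst = conn.get_bucket('dest-bucket-name')

    for k in src.list():
        # copy_key(new_key_name, src_bucket_name, src_key_name)
        dst.copy_key(k.key, src.name, k.key)
        src.delete_key(k.key)                    # delete the original to complete the move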
    
  • 2020-12-01 08:19

    awscli does the job 30 times faster for me than boto copying and deleting each key, probably because of awscli's multithreading. If you still want to run it from your Python script without calling shell commands, you can try something like this:

    Install the awscli Python package:

    sudo pip install awscli
    

    And then it is as simple as this:

    import os

    # work around an awscli locale issue when LC_CTYPE is set to just 'UTF-8' (common on macOS)
    if os.environ.get('LC_CTYPE', '') == 'UTF-8':
        os.environ['LC_CTYPE'] = 'en_US.UTF-8'

    from awscli.clidriver import create_clidriver

    driver = create_clidriver()
    # note the s3:// prefixes: "aws s3 mv" expects S3 URIs, not bare bucket names
    driver.main('s3 mv s3://source_bucket s3://target_bucket --recursive'.split())
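
    driver.main returns the command's exit status, so you can check from Python whether the move succeeded; the call above could instead be written as:

    # the CLI driver returns the command's exit code (0 on success)
    rc = driver.main('s3 mv s3://source_bucket s3://target_bucket --recursive'.split())
    if rc != 0:
        raise RuntimeError('aws s3 mv failed with exit code {}'.format(rc))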
    