How to download the latest file of an S3 bucket using Boto3?

Asked by 借酒劲吻你, 2021-01-11 23:22 · 7 answers · 968 views

The other questions I could find were referring to an older version of Boto. I would like to download the latest file of an S3 bucket. In the documentation I found that there

7 Answers
  • 2021-01-11 23:51

    This is basically the same answer as helloV's, for the case where you use a Session, as I do.

    from boto3.session import Session
    import settings
    
    session = Session(aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                      aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
    s3 = session.resource("s3")
    
    # Sort key: last-modified time as a POSIX timestamp.
    # (.timestamp() is portable; strftime('%s') only works on some platforms)
    get_last_modified = lambda obj: obj.last_modified.timestamp()
    
    bckt = s3.Bucket("my_bucket")
    objs = sorted(bckt.objects.all(), key=get_last_modified)
    last_added = objs[-1].key
    

    Having objs sorted also lets you quickly delete every file but the latest:

    for obj in objs[:-1]:
        s3.Object("my_bucket", obj.key).delete()
    
  • 2021-01-11 23:54

    This handles the case where there are more than 1,000 objects in the S3 bucket. It is basically @SaadK's answer without the for loop, using the newer list_objects_v2.

    EDIT: Fixes the issue @Timothée-Jeannin identified, ensuring that the latest object across all pages is found.

    https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Paginator.ListObjectsV2

    import boto3
    
    def get_most_recent_s3_object(bucket_name, prefix):
        s3 = boto3.client('s3')
        paginator = s3.get_paginator('list_objects_v2')
        page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
        latest = None
        for page in page_iterator:
            if 'Contents' in page:
                # Newest object on this page
                page_latest = max(page['Contents'], key=lambda x: x['LastModified'])
                # Keep the newest across all pages
                if latest is None or page_latest['LastModified'] > latest['LastModified']:
                    latest = page_latest
        return latest
    
    latest = get_most_recent_s3_object(bucket_name, prefix)
    
    latest['Key']  # -->   'prefix/objectname'
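    The cross-page comparison is the subtle part, so here is a small self-contained sketch of the same logic that runs without any AWS call (the page dicts below just mimic the shape of paginated list_objects_v2 responses; the keys and dates are made up):

```python
from datetime import datetime, timezone

def latest_object(pages):
    """Return the most recently modified object across paginated listings."""
    latest = None
    for page in pages:
        for obj in page.get('Contents', []):
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj
    return latest

# Fake pages shaped like list_objects_v2 responses
pages = [
    {'Contents': [{'Key': 'logs/a.gz', 'LastModified': datetime(2021, 1, 1, tzinfo=timezone.utc)}]},
    {'Contents': [{'Key': 'logs/b.gz', 'LastModified': datetime(2021, 1, 9, tzinfo=timezone.utc)}]},
    {},  # a page with no 'Contents' key is skipped safely
]
print(latest_object(pages)['Key'])  # logs/b.gz
```

    Comparing the datetime objects directly avoids the timestamp-conversion step entirely, since boto3 returns timezone-aware datetimes that compare correctly.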
    
  • 2021-01-11 23:57

    I also wanted to download the latest file from an S3 bucket, but one located in a specific folder. Use the following function to get the latest file name, given the bucket name and a prefix (the folder name).

    import boto3
    
    def get_latest_file_name(bucket_name, prefix):
        """
        Return the latest file name in an S3 bucket folder.
    
        :param bucket_name: Name of the S3 bucket.
        :param prefix: Only keep keys that start with this prefix (folder name).
        """
        s3_client = boto3.client('s3')
        # Note: list_objects_v2 returns at most 1,000 keys per call;
        # use a paginator (as in the answers above) for larger buckets.
        objs = s3_client.list_objects_v2(Bucket=bucket_name)['Contents']
        shortlisted_files = dict()
        for obj in objs:
            key = obj['Key']
            timestamp = obj['LastModified']
            # If the key starts with the folder name, keep it
            if key.startswith(prefix):
                shortlisted_files[key] = timestamp
        latest_filename = max(shortlisted_files, key=shortlisted_files.get)
        return latest_filename
    
    latest_filename = get_latest_file_name(bucket_name='use_your_bucket_name',prefix = 'folder_name/')
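    The prefix filtering and max-by-timestamp step is plain Python, so it can be sketched and checked locally (the entries below just mimic the 'Contents' list a listing call returns; keys and dates are made up):

```python
from datetime import datetime, timezone

def latest_key_under_prefix(objs, prefix):
    """Among listing entries, return the newest key that starts with prefix."""
    candidates = [o for o in objs if o['Key'].startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o['LastModified'])['Key']

objs = [
    {'Key': 'folder/a.txt', 'LastModified': datetime(2021, 1, 2, tzinfo=timezone.utc)},
    {'Key': 'folder/b.txt', 'LastModified': datetime(2021, 1, 9, tzinfo=timezone.utc)},
    {'Key': 'other/c.txt',  'LastModified': datetime(2021, 1, 10, tzinfo=timezone.utc)},
]
print(latest_key_under_prefix(objs, 'folder/'))  # folder/b.txt
```

    Note that the newest object overall ('other/c.txt') is ignored because it falls outside the prefix, which is exactly the folder-scoped behaviour the answer describes.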
    
  • 2021-01-12 00:06

    If you have a lot of files, you'll need to use pagination, as helloV mentioned. This is how I did it.

    import boto3
    
    get_last_modified = lambda obj: obj['LastModified'].timestamp()
    
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects')
    page_iterator = paginator.paginate(Bucket='BucketName', Prefix='Prefix')
    latest = None
    for page in page_iterator:
        if 'Contents' in page:
            # Compare against the newest seen so far, not just within this page,
            # otherwise each page overwrites the previous result
            page_latest = max(page['Contents'], key=get_last_modified)
            if latest is None or get_last_modified(page_latest) > get_last_modified(latest):
                latest = page_latest
    last_added = latest['Key']
    
  • 2021-01-12 00:07

    You should be able to download the latest version of the file using the standard download_file call:

    import boto3
    import botocore
    
    BUCKET_NAME = 'mytestbucket'
    KEY = 'fileinbucket.txt'
    
    s3 = boto3.resource('s3')
    
    try:
        s3.Bucket(BUCKET_NAME).download_file(KEY, 'downloadname.txt')
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise
    

    Reference link

    To get the most recently modified or uploaded files, you can use the following:

    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('myBucket')
    
    # Sort key: resource objects expose .last_modified as a comparable datetime
    get_last_modified = lambda obj: obj.last_modified
    
    unsorted = []
    for file in my_bucket.objects.filter():
        unsorted.append(file)
    
    # Up to nine most recently modified keys, newest first
    files = [obj.key for obj in sorted(unsorted, key=get_last_modified,
        reverse=True)][0:9]
    

    As the answer in that reference link states, it's not optimal, but it works.

  • 2021-01-12 00:12

    A variation of the answer I provided for: Boto3 S3, sort bucket by last modified. You can modify the code to suit your needs.

    import boto3
    
    get_last_modified = lambda obj: obj['LastModified'].timestamp()
    
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']
    # sorted() is ascending, so the last element is the most recent
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified)][-1]
    

    If you want to reverse the sort (newest first):

    [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
    