How to list objects by extension from the S3 API?

醉话见心 2021-01-11 16:08

Can I somehow search objects in S3 by extension, not only by prefix?

Here is what I have now:

ListObjectsResponse r = s3Client.ListObjects(new Amazon         


        
6 answers
  • 2021-01-11 16:19

    With the boto3 resource API you can list the objects in a bucket and then filter the returned keys by file extension on the client side, like this:

    import boto3
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('my_bucket')
    files = my_bucket.objects.all()
    file_list = []
    for file in files:
        if file.key.endswith('.docx'):
            file_list.append(file.key)
    

    You can change the string passed to endswith to whatever extension you need.
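
As a sketch of how the `endswith` check above could be generalized, the helper below accepts several extensions at once and ignores case. The function name `filter_keys_by_extension` and the sample keys are made up for illustration; it works on plain key strings, so in practice you would feed it the `key` values of the objects returned by `my_bucket.objects.all()`.

```python
def filter_keys_by_extension(keys, extensions):
    """Return the keys whose name ends with one of the given extensions.

    `extensions` is an iterable such as ['.docx', '.xls']; matching is
    case-insensitive. (Hypothetical helper, not part of boto3.)
    """
    # str.endswith accepts a tuple of suffixes, so one call covers them all
    suffixes = tuple(ext.lower() for ext in extensions)
    return [key for key in keys if key.lower().endswith(suffixes)]


# Sample keys standing in for the `key` attribute of S3 objects
sample_keys = ['reports/a.docx', 'reports/b.XLS', 'images/c.png', 'notes']
matching = filter_keys_by_extension(sample_keys, ['.docx', '.xls'])
print(matching)
```

Including the leading dot in each extension avoids matching a key such as `mydocx` that merely ends with the same letters.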

  • 2021-01-11 16:21

    You don't actually need a separate database to do this for you.

    S3 gives you the ability to list objects in a bucket with a certain prefix. The difficulty is that the ".xls" extension is at the end of the file name, so a prefix search doesn't help you. However, when you put the file into the bucket, you can change the object name so that the prefix contains the file type (for example: XLS-myfile.xls). Then you can call the S3 API listObjects and pass a prefix of "XLS".
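
A minimal sketch of this naming scheme, assuming the type prefix is the upper-cased extension followed by a hyphen (as in `XLS-myfile.xls`). The helper names `prefixed_key` and `list_keys_with_prefix` are hypothetical; the second one expects a client created with `boto3.client('s3')` and uses its `list_objects_v2` paginator.

```python
import posixpath


def prefixed_key(filename):
    """Build an object name whose prefix encodes the file type.

    'myfile.xls' -> 'XLS-myfile.xls'. Names without an extension are
    returned unchanged. (Hypothetical helper for illustration.)
    """
    extension = posixpath.splitext(filename)[1].lstrip('.')
    if not extension:
        return filename
    return '{}-{}'.format(extension.upper(), filename)


def list_keys_with_prefix(s3_client, bucket, prefix):
    """Return every key in `bucket` that starts with `prefix`.

    `s3_client` is assumed to be a boto3 S3 client, for example
    boto3.client('s3'); the paginator follows continuation tokens.
    """
    paginator = s3_client.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page when nothing matches
        for item in page.get('Contents', []):
            keys.append(item['Key'])
    return keys


print(prefixed_key('myfile.xls'))
```

With a real client, `list_keys_with_prefix(boto3.client('s3'), 'my-bucket', 'XLS')` would return the spreadsheet objects, provided every upload was named through `prefixed_key`. The bucket name here is a placeholder.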

  • 2021-01-11 16:27

    I iterate over the objects after fetching their information. The end result is a list of dicts.

    import os
    
    import boto3
    
    s3 = boto3.resource('s3')
    
    bucket = s3.Bucket('bucket_name')
    
    # get information about all files in the bucket
    files = bucket.objects.all()
    
    # create an empty list for the final information
    files_information = []
    
    # your known extensions list; each file's extension is compared against it
    extensions = ['png', 'jpg', 'txt', 'docx']
    
    # Iterate through 'files', convert each one to a dict, and add an extension key.
    for file in files:
        # splitext handles extensions of any length (key[-3:] would miss 'docx')
        extension = os.path.splitext(file.key)[1].lstrip('.')
        if extension in extensions:
            files_information.append({'file_name': file.key, 'extension': extension})
        else:
            files_information.append({'file_name': file.key, 'extension': 'unknown'})
    
    
    print(files_information)
    
  • 2021-01-11 16:31

    While I do think the BEST answer is to use a database to keep track of your files for you, I also think it's an incredible pain in the ass. I was working in Python with boto3, and this is the solution I came up with.

    It's not elegant, but it will work. List all the files, and then filter them down in code to the ones with the "suffix"/"extension" that you want.

    import boto3
    
    s3_client = boto3.client('s3')
    bucket = 'my-bucket'
    prefix = 'my-prefix/foo/bar'
    paginator = s3_client.get_paginator('list_objects_v2')
    response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)
    
    file_names = []
    
    for response in response_iterator:
        # 'Contents' is absent from a page when nothing matches the prefix
        for object_data in response.get('Contents', []):
            key = object_data['Key']
            if key.endswith('.json'):
                file_names.append(key)
    
    print(file_names)
    
  • 2021-01-11 16:38

    You can list objects by extension by fetching all the objects (including folder placeholders) and then filtering with key.endswith('...').

    import boto3
    
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('your-route')
    test_dir = 'test_dir/'  # placeholder prefix; replace with your own
    
    # S3 applies the Prefix filter server-side; the endswith check runs client-side
    for obj in bucket.objects.filter(Prefix=test_dir):
        if obj.key.endswith('.zicu'):
            print('Value of object: ', obj.key)
    

    In this case I'm first filtering the objects by a Prefix (test_dir) and then showing only those with the .zicu extension.

  • 2021-01-11 16:39

    I don't believe this is possible with S3.

    The best solution is to 'index' S3 using a database (SQL Server, MySQL, SimpleDB, etc.) and run your queries against that.
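
To illustrate the indexing idea, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the databases named above. The table name `s3_index`, its columns, and the sample keys are assumptions made for the example; in a real setup the rows would be written whenever an object is uploaded or deleted.

```python
import posixpath
import sqlite3


def build_index(connection, keys):
    """Store each object key together with its lower-cased extension."""
    connection.execute(
        'CREATE TABLE IF NOT EXISTS s3_index ('
        'object_key TEXT PRIMARY KEY, extension TEXT)'
    )
    rows = [
        (key, posixpath.splitext(key)[1].lstrip('.').lower())
        for key in keys
    ]
    connection.executemany(
        'INSERT OR REPLACE INTO s3_index (object_key, extension) '
        'VALUES (?, ?)',
        rows,
    )
    # An index on the extension column keeps lookups fast as the table grows
    connection.execute(
        'CREATE INDEX IF NOT EXISTS idx_extension ON s3_index (extension)'
    )
    connection.commit()


def keys_by_extension(connection, extension):
    """Return the indexed keys with the given extension, sorted by key."""
    cursor = connection.execute(
        'SELECT object_key FROM s3_index WHERE extension = ? '
        'ORDER BY object_key',
        (extension.lstrip('.').lower(),),
    )
    return [row[0] for row in cursor]


# In-memory database and made-up keys, for demonstration only
connection = sqlite3.connect(':memory:')
build_index(connection, ['docs/a.xls', 'docs/b.XLS', 'img/c.png'])
print(keys_by_extension(connection, '.xls'))
```

The trade-off is that the index has to be kept in sync with the bucket, which is the extra maintenance work the other answers try to avoid.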
