Finding an S3 bucket's level 1 prefix sizes, including versions, using boto3 and Python

馋奶兔 submitted on 2020-12-15 05:10:53

Question


I'm an AWS Python newbie trying to reconcile the total bucket size shown on the Metrics tab in the UI with sizes calculated one folder at a time in a given bucket. I tried to fetch it by setting up an inventory configuration, but it doesn't show what I'm looking for.

I have an S3 bucket named my_bucket with versioning enabled.
It has 100 objects and 26 subfolders (with 100,000+ objects in each subfolder and at least two versions of each object).

WHAT I AM TRYING TO DO: Calculate and display the total size, including versions, for each of the 26 subfolders.

A  Size 1GB  
B  Size 10TB    
.  
.  
.  
Z Size 13TB

HOW I AM TRYING TO DO IT: Find a solution that combines
the profile-based authentication from Link1, using bucket.object_versions,
with the level-one folder size calculation from Link2,
while also taking versions into account (Link2 doesn't handle versions).

Link1 https://stackoverflow.com/a/58125684/4590025
Link2 https://stackoverflow.com/a/58125684/4590025

import boto3

PROFILE = "my_profile"
BUCKET = "my_bucket"

session = boto3.Session(profile_name = PROFILE)
s3 = session.resource('s3')
bucket = s3.Bucket(BUCKET)

#bucket.object_versions.do_something_with_it


conn = boto3.client('s3')

top_level_folders = dict()

for key in conn.list_objects(Bucket='my_bucket')['Contents']:

    folder = key['Key'].split('/')[0]
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']


for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))

I also referred to https://stackoverflow.com/a/48867829, but I'm not sure how to combine the two. Currently, when I run the code, I get the error below despite setting the session:

Traceback (most recent call last):
  File ".\folder_size.py", line 17, in <module>
    for key in conn.list_objects(Bucket='my_bucket')['Contents']:
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\signers.py", line 160, in sign
    auth.add_auth(request)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
PS C:\Users\ginger\test>

Answer 1:


The issue is that the program uses:

conn = boto3.client('s3')

This is ignoring the profile that was set earlier:

session = boto3.Session(profile_name = PROFILE)

Thus, if you want to create an S3 client with the profile, then it should use:

conn = session.client('s3')

To avoid the problem with pagination (list_objects returns at most 1,000 keys per call), you could use the resource collection, which paginates automatically, to retrieve all objects:

for obj in bucket.objects.all():
    folder = obj.key.split('/')[0]
    print("Key %s in folder %s. %d bytes" % (obj.key, folder, obj.size))
...
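
Note that bucket.objects.all() only lists the current version of each object, so it does not by itself cover the "including versions" part of the question. One possible way to include noncurrent versions is sketched below, using the client's list_object_versions paginator created from the profile-based session. This is a minimal sketch and not a tested solution for buckets of this size: the helper names (sum_by_top_level_prefix, iter_versions, format_size, report) and the "(root)" label for keys without a slash are made up for illustration, and the size units assume 1024-based multiples.

```python
from collections import defaultdict


def sum_by_top_level_prefix(versions):
    """Sum 'Size' per first path segment of 'Key' over an iterable of version records."""
    totals = defaultdict(int)
    for version in versions:
        key = version["Key"]
        # Keys without "/" live at the bucket root; "(root)" is an illustrative label.
        prefix = key.split("/", 1)[0] if "/" in key else "(root)"
        totals[prefix] += version["Size"]
    return dict(totals)


def iter_versions(client, bucket_name):
    """Yield every object version in the bucket, following pagination."""
    paginator = client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket_name):
        # Delete markers arrive under "DeleteMarkers" and have no size, so they are skipped.
        for version in page.get("Versions", []):
            yield version


def format_size(num_bytes):
    """Format a byte count using 1024-based units (an assumption about the desired display)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return "%.1f %s" % (size, unit)
        size /= 1024.0


def report(profile_name, bucket_name):
    """Print one line per top-level folder; needs boto3 and valid credentials."""
    import boto3  # imported here so the helpers above work without boto3 installed

    session = boto3.Session(profile_name=profile_name)
    client = session.client("s3")  # created from the session, so the profile is used
    totals = sum_by_top_level_prefix(iter_versions(client, bucket_name))
    for folder, size in sorted(totals.items()):
        print("Folder: %s, size: %s" % (folder, format_size(size)))
```

With the names from the question, this would be called as report("my_profile", "my_bucket"). With 100,000+ objects per subfolder and several versions each, listing every version can take a long time, so this is only meant to show how the pieces fit together.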


Source: https://stackoverflow.com/questions/65228942/finding-s3-buckets-level-1-prefix-sizes-while-including-versions-using-boto3-an
