Question
I'm new to AWS and Python, and I'm trying to reconcile the total bucket size shown on the Metrics tab in the console with sizes calculated one folder at a time in a given bucket. I tried to get this by setting up an inventory configuration, but it doesn't show what I'm looking for.
I have an S3 bucket named my_bucket with versioning enabled.
It has 100 objects and 26 subfolders (with 100000+ objects in each subfolder and at least two versions of each object).
WHAT I AM TRYING TO DO: Calculate and display the total size, including versions, for each of the 180 subfolders.
A Size 1GB
B Size 10TB
.
.
.
Z Size 13TB
HOW I AM TRYING TO DO IT
Find a solution that combines the profile-based authentication from Link1 (using bucket.object_versions) with the level-one folder size calculation from Link2, while also taking versions into account (Link2 doesn't handle versions).
Link1 https://stackoverflow.com/a/58125684/4590025
Link2 https://stackoverflow.com/a/58125684/4590025
import boto3
PROFILE = "my_profile"
BUCKET = "my_bucket"
session = boto3.Session(profile_name = PROFILE)
s3 = session.resource('s3')
bucket = s3.Bucket(BUCKET)
#bucket.object_versions.do_something_with_it
conn = boto3.client('s3')
top_level_folders = dict()
for key in conn.list_objects(Bucket='my_bucket')['Contents']:
    folder = key['Key'].split('/')[0]
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))
    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']
for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))
I also referred to https://stackoverflow.com/a/48867829, but I'm not sure how to combine the two. Currently, when I run the script, I get the error below despite setting the session:
Traceback (most recent call last):
  File ".\folder_size.py", line 17, in <module>
    for key in conn.list_objects(Bucket='my_bucket')['Contents']:
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\signers.py", line 160, in sign
    auth.add_auth(request)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Answer 1:
The issue is that the program uses:
conn = boto3.client('s3')
This ignores the profile that was set earlier, so boto3 falls back to its default credential lookup and finds no credentials:
session = boto3.Session(profile_name = PROFILE)
Thus, if you want to create an S3 client with the profile, then it should use:
conn = session.client('s3')
A single list_objects call returns at most 1,000 keys, so the loop in the question would miss objects in a large bucket. To avoid handling pagination yourself, you could use the resource method to retrieve all objects:
for obj in bucket.objects.all():
    folder = obj.key.split('/')[0]
    print("Key %s in folder %s. %d bytes" % (obj.key, folder, obj.size))
    ...
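The loop above only sees the current version of each object. To include every version, as the question asks, one option is to iterate bucket.object_versions instead, or to page through list_object_versions on a client created from the session. The following is a minimal sketch and not part of the original answer: the helper names (top_level_folder, sizes_from_summaries, sizes_from_version_pages, folder_sizes_with_versions) are hypothetical, and it assumes that delete markers, where they are listed, report no size and should count as 0 bytes.

```python
from collections import defaultdict


def top_level_folder(key):
    """Return the first path segment of an S3 key (the level-1 prefix)."""
    return key.split('/')[0]


def sizes_from_summaries(summaries):
    """Sum sizes per level-1 prefix for items exposing .key and .size."""
    totals = defaultdict(int)
    for item in summaries:
        # Assumption: delete markers have size None, so treat them as 0 bytes.
        totals[top_level_folder(item.key)] += item.size or 0
    return dict(totals)


def sizes_from_version_pages(pages):
    """Same totals, built from list_object_versions response pages."""
    totals = defaultdict(int)
    for page in pages:
        # 'Versions' may be missing on a page, hence the default of [].
        for version in page.get('Versions', []):
            totals[top_level_folder(version['Key'])] += version['Size']
    return dict(totals)


def folder_sizes_with_versions(profile_name, bucket_name, use_client=False):
    """Hypothetical wrapper: total bytes per level-1 prefix, all versions."""
    import boto3  # imported here so the helpers above work without boto3

    session = boto3.Session(profile_name=profile_name)
    if use_client:
        # Client route: the paginator follows continuation markers for us.
        paginator = session.client('s3').get_paginator('list_object_versions')
        return sizes_from_version_pages(paginator.paginate(Bucket=bucket_name))
    # Resource route: the collection paginates behind the scenes.
    bucket = session.resource('s3').Bucket(bucket_name)
    return sizes_from_summaries(bucket.object_versions.all())
```

Calling folder_sizes_with_versions("my_profile", "my_bucket") would return a dict mapping each top-level folder to its total bytes, which can be printed with the same "Folder: %s, size: %d" loop as in the question. As in the question's code, a key with no "/" is counted under its own name. With millions of versions the listing needs many requests, so expect it to take a while.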
Source: https://stackoverflow.com/questions/65228942/finding-s3-buckets-level-1-prefix-sizes-while-including-versions-using-boto3-an