I want to get a list of all the blobs in a Google Cloud Storage bucket using the Client Library for Python.
According to the documentation I should use the list_bl
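For reference, listing every blob in a bucket (no prefix, no delimiter) is the simple case. Here is a minimal sketch. It assumes you already have a google.cloud.storage Bucket object, for example from storage.Client().bucket("my-bucket"), where "my-bucket" is a placeholder. The iterator returned by list_blobs() requests further pages from the API on its own as you loop, so you see every object. The function name is hypothetical.

```python
def list_all_blob_names(bucket):
    # list_blobs() returns a lazy iterator; it fetches additional pages from
    # the API as needed, so this comprehension sees every object in the bucket
    return [blob.name for blob in bucket.list_blobs()]
```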
I'm just going to leave this here. I'm not sure whether the library has changed in the two years since this answer was posted, but if you're using prefix (with delimiter='/') to list "directories", then for blob in bucket.list_blobs() doesn't do what you want: the loop only yields blob objects, never the prefixes. It seems like getting blobs and getting prefixes are fundamentally different operations, and using pages to collect the prefixes is confusing.
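To see why blobs and prefixes come back separately, here is an offline sketch in plain Python (no GCS calls) of the grouping the server performs when you pass a delimiter. The function name simulate_delimiter_listing is hypothetical and only mimics the documented behavior. A name with no further delimiter after the prefix is returned as a blob. Every other name is collapsed into a prefix that ends at the first delimiter.

```python
def simulate_delimiter_listing(names, prefix="", delimiter="/"):
    """Mimic how a delimiter-based listing splits object names (illustration only)."""
    blobs = []
    prefixes = set()
    for name in names:
        if not name.startswith(prefix):
            continue  # the server only considers names under the prefix
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # no delimiter after the prefix: this is an object "in" the directory
            blobs.append(name)
        else:
            # collapse everything up to and including the first delimiter
            prefixes.add(prefix + rest[:cut + len(delimiter)])
    return blobs, sorted(prefixes)


names = ["a.txt", "logs/2020/x.log", "logs/y.log", "data/z.csv"]
print(simulate_delimiter_listing(names))
print(simulate_delimiter_listing(names, prefix="logs/"))
```

With no prefix, only a.txt is a blob, and data/ and logs/ are prefixes. With prefix="logs/", logs/y.log is a blob and logs/2020/ is the only prefix.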
I found a solution in a GitHub issue (linked in the code comment below). This works for me:
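def list_gcs_directories(bucket, prefix):
    # from https://github.com/GoogleCloudPlatform/google-cloud-python/issues/920
    # delimiter='/' makes the API group object names into "directory" prefixes
    iterator = bucket.list_blobs(prefix=prefix, delimiter='/')
    prefixes = set()
    # prefixes are reported per page, so walk every page to collect them all
    for page in iterator.pages:
        print(page, page.prefixes)  # debug output
        prefixes.update(page.prefixes)
    return prefixes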
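In more recent versions of google-cloud-storage, the iterator returned by list_blobs() also accumulates the prefixes from every page it has fetched in its prefixes attribute. So you can exhaust the iterator and then read that attribute. This is a sketch that assumes that behavior, so check it against your installed version. The name list_gcs_directories_v2 is hypothetical.

```python
def list_gcs_directories_v2(bucket, prefix=None):
    iterator = bucket.list_blobs(prefix=prefix, delimiter="/")
    # iterator.prefixes is only filled in as pages are fetched,
    # so consume the whole iterator before reading it
    for _ in iterator:
        pass
    return set(iterator.prefixes)
```

This avoids touching pages directly, but it still has to fetch every page, just like the loop over iterator.pages.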
A different comment on the same issue suggested this:
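def get_prefixes(bucket):
    iterator = bucket.list_blobs(delimiter="/")
    # _get_next_page_response() is a private method that fetches only the first page
    response = iterator._get_next_page_response()
    # 'prefixes' is absent from the response when there are no "directories"
    return response.get('prefixes', [])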
That only gives you all the prefixes if your results fit on a single page, and it relies on the private _get_next_page_response() method, which could change without notice.