How does paging work in the list_blobs function in the Google Cloud Storage Python Client Library?

深忆病人 2021-02-20 02:42

I want to get a list of all the blobs in a Google Cloud Storage bucket using the Client Library for Python.

According to the documentation I should use the list_bl

3 Answers
  •  死守一世寂寞
    2021-02-20 03:02

    I'm just going to leave this here. I'm not sure whether the libraries have changed in the 2 years since this answer was posted, but if you're using prefix with a delimiter, then for blob in bucket.list_blobs() doesn't work right: the loop yields only blobs, never the "directory" prefixes. It seems like getting blobs and getting prefixes are fundamentally different, and using pages with prefixes is confusing.
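    To see why iterating over blobs does not give you the prefixes, here is a minimal, self-contained simulation. It is not the real library: the FAKE_PAGES data and the FakeBlobIterator class are hypothetical, and the page shape (items, prefixes, nextPageToken) is assumed to follow the Cloud Storage JSON API listing response. Like the real iterator, it yields only blobs and collects prefixes into a prefixes set as each page is fetched, so that set is complete only after iteration has finished.

```python
# Hypothetical simulation of a delimiter listing; no real GCS calls are made.
# Each dict mimics one JSON listing response page, keyed by its page token.
FAKE_PAGES = {
    None: {"items": [{"name": "a.txt"}], "prefixes": ["logs/"], "nextPageToken": "t1"},
    "t1": {"items": [{"name": "b.txt"}], "prefixes": ["data/"]},
}


class FakeBlobIterator:
    """Yields blob names only; prefixes are gathered as a side effect."""

    def __init__(self, pages):
        self._pages = pages
        self.prefixes = set()

    def __iter__(self):
        token = None
        while True:
            page = self._pages[token]
            # Prefixes are recorded when a page is fetched, never yielded.
            self.prefixes.update(page.get("prefixes", []))
            for item in page.get("items", []):
                yield item["name"]
            token = page.get("nextPageToken")
            if token is None:
                break


it = FakeBlobIterator(FAKE_PAGES)
print(sorted(it.prefixes))  # [] : nothing has been fetched yet
names = list(it)
print(names)                # ['a.txt', 'b.txt']
print(sorted(it.prefixes))  # ['data/', 'logs/']
```

    With the real library the analogous pattern is to exhaust the iterator first (for example with list(iterator)) and only then read iterator.prefixes.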

    I found a solution in a GitHub issue (linked in the code comment below). This works for me:

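    def list_gcs_directories(bucket, prefix):
        # from https://github.com/GoogleCloudPlatform/google-cloud-python/issues/920
        # With delimiter='/', "subdirectory" names come back as prefixes, not blobs.
        iterator = bucket.list_blobs(prefix=prefix, delimiter='/')
        prefixes = set()
        # Each page holds the prefixes returned by one API response,
        # so walk every page to collect all of them.
        for page in iterator.pages:
            print(page, page.prefixes)
            prefixes.update(page.prefixes)
        return prefixes
    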
    

    A different comment on the same issue suggested this:

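    def get_prefixes(bucket):
        iterator = bucket.list_blobs(delimiter="/")
        # Note: _get_next_page_response() is a private method, and it fetches
        # only the first page of results.
        response = iterator._get_next_page_response()
        # 'prefixes' is absent from the response when there are none.
        return response.get('prefixes', [])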
    

    This only gives you all the prefixes if your results fit on a single page, because it fetches just the first page of the listing.
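    The difference can be sketched with a hypothetical fetch_page(token) function standing in for one listing request (the RESPONSES data is made up). Reading only the first response misses prefixes that arrive on later pages; following nextPageToken until it is absent collects them all, which is what iterating over iterator.pages does for you.

```python
# Hypothetical stand-in for one listing request; no real GCS calls are made.
RESPONSES = {
    None: {"prefixes": ["a/", "b/"], "nextPageToken": "p2"},
    "p2": {"prefixes": ["c/"]},
}


def fetch_page(token=None):
    return RESPONSES[token]


def first_page_prefixes():
    # Like calling _get_next_page_response() once: only page 1 is seen.
    return fetch_page().get("prefixes", [])


def all_prefixes():
    # Follow nextPageToken until the last page, as iterating pages does.
    prefixes, token = set(), None
    while True:
        response = fetch_page(token)
        prefixes.update(response.get("prefixes", []))
        token = response.get("nextPageToken")
        if not token:
            return prefixes


print(first_page_prefixes())   # ['a/', 'b/']
print(sorted(all_prefixes()))  # ['a/', 'b/', 'c/']
```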
