How does paging work in the list_blobs function in the Google Cloud Storage Python Client Library?


I want to get a list of all the blobs in a Google Cloud Storage bucket using the Client Library for Python.

According to the documentation I should use the list_bl


                      
3 Answers

时光取名叫无心, 2021-02-20 02:46

I'm just going to leave this here. I'm not sure whether the libraries have changed in the last 2 years since this answer was posted, but if you're using a prefix, then for blob in bucket.list_blobs() doesn't work as expected. Getting blobs and getting prefixes seem to be fundamentally different operations, and using pages with prefixes is confusing.

I found a post in a GitHub issue (linked in the code comment below). This works for me.

def list_gcs_directories(bucket, prefix):
    # from https://github.com/GoogleCloudPlatform/google-cloud-python/issues/920
    # delimiter='/' makes the listing report "subdirectories" as prefixes
    iterator = bucket.list_blobs(prefix=prefix, delimiter='/')
    prefixes = set()
    for page in iterator.pages:
        print(page, page.prefixes)
        prefixes.update(page.prefixes)
    return prefixes


A different comment on the same issue suggested this:

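def get_prefixes(bucket):
    iterator = bucket.list_blobs(delimiter="/")
    # _get_next_page_response() is a private method; it fetches one page only
    response = iterator._get_next_page_response()
    # the 'prefixes' key may be absent when the page contains no prefixes
    return response.get('prefixes', [])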


Note that this only gives you the prefixes if all of your results fit on a single page.
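
To make the single-page limitation concrete, here is a minimal sketch that compares prefixes taken from the first page with prefixes gathered from every page. It assumes only what the first snippet already relies on: an iterator with a pages attribute whose pages expose prefixes. The fake_iterator object and its prefix values are hypothetical stand-ins so the sketch runs without credentials; with the real library you would pass the result of bucket.list_blobs(prefix=prefix, delimiter='/') instead.

```python
from types import SimpleNamespace


def collect_prefixes(iterator):
    """Union the prefixes reported by every page of a listing iterator."""
    prefixes = set()
    for page in iterator.pages:
        prefixes.update(page.prefixes)
    return prefixes


# Hypothetical stand-in for the object returned by
# bucket.list_blobs(prefix=..., delimiter='/'), split over two pages.
fake_iterator = SimpleNamespace(
    pages=[
        SimpleNamespace(prefixes=("logs/2020/", "logs/2021/")),
        SimpleNamespace(prefixes=("logs/2022/",)),
    ]
)

all_prefixes = collect_prefixes(fake_iterator)
first_page_only = set(fake_iterator.pages[0].prefixes)
print(sorted(all_prefixes))
print(sorted(first_page_only))
```

Reading only the first page misses any prefix that first appears on a later page, which is why the loop over pages is the safer choice.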
                                                                        
                                                        
            
            
              
                
抹茶落季, 2021-02-20 02:48

list_blobs() does use paging, but you do not use page_token to achieve it. 

How It Works:

list_blobs() returns an iterator that walks through all the results, doing the paging behind the scenes. So simply doing this will get you through all the results, fetching pages as needed:

for blob in bucket.list_blobs():
    print(blob.name)
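
As a rough model of what "paging behind the scenes" means, the sketch below uses a toy generator. It is not the library's implementation. fetch_page is a hypothetical stand-in for one list request to the server, returning the items on a page plus the token for the next page (None when there are no more), and PAGES is made-up data.

```python
# Toy data source: three "pages" keyed by page token. None means first page.
PAGES = {
    None: (["a.txt", "b.txt"], "t1"),
    "t1": (["c.txt", "d.txt"], "t2"),
    "t2": (["e.txt"], None),
}
fetch_log = []


def fetch_page(token):
    """Hypothetical stand-in for one list request to the server."""
    fetch_log.append(token)
    return PAGES[token]


def iterate_all(page_token=None):
    """Yield every item, requesting the next page only when it is needed."""
    token = page_token
    while True:
        items, token = fetch_page(token)
        for item in items:
            yield item
        if token is None:
            break


names = iterate_all()
first = next(names)                   # triggers exactly one fetch
fetches_after_first = len(fetch_log)
rest = list(names)                    # triggers the remaining fetches
print(first, rest, fetch_log)
```

The caller only ever sees a flat stream of items; the page boundaries and tokens stay inside the generator.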


The Documentation is Wrong/Misleading:

As of 04/26/2017 this is what the docs say:


  page_token (str) – (Optional) Opaque marker for the next “page” of
  blobs. If not passed, will return the first page of blobs.


This implies that the result will be a single page of results, with page_token determining which page. This is not correct. The result iterator iterates through multiple pages. What page_token actually represents is the page the iterator should START at. If no page_token is provided, it will start at the first page.
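
The sketch below illustrates that reading of page_token. FakeBucket, FAKE_PAGES, and the token strings are hypothetical: they mimic the behavior described in this answer so the example runs offline, and they are not the library's classes. The names_from helper only uses the list_blobs(page_token=...) call discussed here.

```python
from types import SimpleNamespace

# Hypothetical page layout: token -> (blob names on that page, next token).
FAKE_PAGES = {
    None: (["a.txt", "b.txt"], "t1"),
    "t1": (["c.txt", "d.txt"], "t2"),
    "t2": (["e.txt"], None),
}


class FakeBucket:
    """Hypothetical stand-in that mimics the behavior described above."""

    def list_blobs(self, page_token=None):
        token = page_token
        while True:
            names, token = FAKE_PAGES[token]
            for name in names:
                yield SimpleNamespace(name=name)
            if token is None:
                return


def names_from(bucket, page_token=None):
    """List blob names, starting at the page that page_token points to."""
    return [blob.name for blob in bucket.list_blobs(page_token=page_token)]


bucket = FakeBucket()
from_start = names_from(bucket)
from_second_page = names_from(bucket, page_token="t1")
print(from_start)
print(from_second_page)
```

Passing a token skips the earlier pages but still runs through every page after it, rather than returning that one page alone.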

Helpful To Know:

max_results limits the total number of results returned by the iterator.

The iterator does expose pages if you need them:

for page in bucket.list_blobs().pages:
    for blob in page:
        print(blob.name)
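
One case where pages are handy is batch processing. The sketch below groups blob names by page. It assumes only an iterator with a pages attribute whose pages yield objects with a name, as in the loop above; fake_iterator and make_blob are hypothetical stand-ins so it runs without a real bucket.

```python
from types import SimpleNamespace


def names_by_page(iterator):
    """Return one list of blob names per page of the listing."""
    return [[blob.name for blob in page] for page in iterator.pages]


def make_blob(name):
    """Hypothetical stand-in for a blob: only the name attribute is used."""
    return SimpleNamespace(name=name)


# Hypothetical stand-in for bucket.list_blobs(): two pages of fake blobs.
fake_iterator = SimpleNamespace(
    pages=[
        [make_blob("a.txt"), make_blob("b.txt")],
        [make_blob("c.txt")],
    ]
)

batches = names_by_page(fake_iterator)
print(batches)
```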

                                                                        
                                                        
            
            
              
                
遥遥无期, 2021-02-20 02:57

It was a bit confusing, but I found the answer here:

https://googlecloudplatform.github.io/google-cloud-python/latest/iterators.html

You can iterate through the pages and process the items in each one:

iterator = self.bucket.list_blobs()

self.get_files = []
for page in iterator.pages:
    print('    Page number: %d' % (iterator.page_number,))
    print('  Items in page: %d' % (page.num_items,))
    print('Next page token: %s' % (iterator.next_page_token,))
    # Do not call next(page) here to peek at the first item: it would
    # consume that blob, and the loop below would then skip it.
    for f in page:
        self.get_files.append("gs://" + f.bucket.name + "/" + f.name)

print("Found %d results" % (len(self.get_files),))
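
If you only need the URIs and not the per-page details, the same result can be sketched as a standalone helper that iterates blobs directly. FakeBucket and the bucket and blob names are hypothetical stand-ins so the example runs offline; blob_uris itself only reads blob.bucket.name and blob.name, the same attributes used in the snippet above.

```python
from types import SimpleNamespace


class FakeBucket:
    """Hypothetical stand-in for a bucket, so the sketch runs offline."""

    def __init__(self, name, blob_names):
        self.name = name
        self._blob_names = blob_names

    def list_blobs(self):
        # Each fake blob carries the two attributes the helper reads.
        return iter(
            SimpleNamespace(bucket=self, name=n) for n in self._blob_names
        )


def blob_uris(bucket):
    """Return a gs:// URI for every blob yielded by bucket.list_blobs()."""
    return [
        "gs://{}/{}".format(blob.bucket.name, blob.name)
        for blob in bucket.list_blobs()
    ]


uris = blob_uris(FakeBucket("my-bucket", ["a.txt", "dir/b.txt"]))
print("Found %d results" % len(uris))
```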

                                                                        
                                                        
            
            
              
                