Length of a finite generator

野性不改 2021-01-18 03:32

I have these two implementations to compute the length of a finite generator, while keeping the data for further processing:

import itertools

def count_generator1(generator):
    # build a list from the generator data, then take its length
    data = list(generator)
    return len(data), data

def count_generator2(generator):
    # tee the generator: count one copy, return the data of the other
    gen1, gen2 = itertools.tee(generator)
    return sum(1 for _ in gen1), gen2

3 Answers
  • 2021-01-18 03:48

    If you have to do this, the first method is much better: since you consume all the values, itertools.tee() would have to store them all anyway, so a list will be more efficient.

    To quote from the docs:

    This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
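
    To see that buffering concretely, here is a small illustrative sketch (mine, not from the docs): once one tee branch has been fully consumed, tee has necessarily buffered every item so that the other branch can replay them:

    import itertools

    gen = (x * x for x in range(5))   # a one-shot generator
    left, right = itertools.tee(gen)

    count = sum(1 for _ in left)      # exhausting `left` forces tee to
                                      # buffer every item for `right`
    print(count)                      # 5
    print(list(right))                # [0, 1, 4, 9, 16]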

  • 2021-01-18 04:00

    I ran timeit under 64-bit Python 3.4.3 on Windows against a few approaches I could think of (the setup uses a re-iterable range as the test input, since a real generator would be exhausted after timeit's first pass):

    >>> from timeit import timeit
    >>> from textwrap import dedent as d
    >>> timeit(
    ...     d("""
    ...     count = -1
    ...     for _ in s:
    ...         count += 1
    ...     count += 1
    ...     """),
    ...     "s = range(1000)",
    ... )
    50.70772041983173
    >>> timeit(
    ...     d("""
    ...     count = -1
    ...     for count, _ in enumerate(s):
    ...         pass
    ...     count += 1
    ...     """),
    ...     "s = range(1000)",
    ... )
    42.636973504498656
    >>> timeit(
    ...     d("""
    ...     count, _ = reduce(f, enumerate(range(1000)), (-1, -1))
    ...     count += 1
    ...     """),
    ...     d("""
    ...     from functools import reduce
    ...     def f(_, count):
    ...         return count
    ...     s = range(1000)
    ...     """),
    ... )
    121.15513102540672
    >>> timeit("count = sum(1 for _ in s)", "s = range(1000)")
    58.179126025925825
    >>> timeit("count = len(tuple(s))", "s = range(1000)")
    19.777029680237774
    >>> timeit("count = len(list(s))", "s = range(1000)")
    18.145157531932
    >>> timeit("count = len(list(1 for _ in s))", "s = range(1000)")
    57.41422175998332
    

    Shockingly, the fastest approach was to use a list (not even a tuple) to exhaust the iterator and get the length from there:

    >>> timeit("count = len(list(s))", "s = range(1000)")
    18.145157531932
    

    Of course, this risks memory issues. The best low-memory alternative was to use enumerate on a NOOP for-loop:

    >>> timeit(
    ...     d("""
    ...     count = -1
    ...     for count, _ in enumerate(s):
    ...         pass
    ...     count += 1
    ...     """),
    ...     "s = range(1000)",
    ... )
    42.636973504498656
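
    If you want that as a reusable function, one possible packaging of the trick (the name iter_length is mine, not from the benchmarks above) is:

    def iter_length(iterable):
        # Count items in O(1) extra memory by exhausting the iterable.
        count = -1
        for count, _ in enumerate(iterable):
            pass
        return count + 1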
    

    Cheers!

  • 2021-01-18 04:03

    If you don't need the length of the iterator before processing the data, you can use a helper that folds the counting into the processing of your iterator/stream and reports it through a future:

    import asyncio

    def ilen(iterable):
        """
        Wrap an iterator so that its length is counted while it is consumed.
        The future will hold the length once the iterator is exhausted.
        @returns: <iterator, count-future>
        """
        def ilen_inner(it, future):
            cnt = 0
            for row in it:
                cnt += 1
                yield row
            future.set_result(cnt)  # fires only after full exhaustion
        cnt_future = asyncio.Future()
        return ilen_inner(iterable, cnt_future), cnt_future
    

    Usage would be:

    data = db_connection.execute(query)
    data, cnt = ilen(data)
    solve_world_hunger(data)
    print(f"Processed {cnt.result()} items")
    