I have made a generator to read a file word by word, and it works nicely. How can I get the next n values of the generator as a list?
def word_reader(file):
    for line in open(file):
        for p in line.split():
            yield p
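A minimal usage sketch (the file name here is just a hypothetical placeholder):

reader = word_reader('words.txt')  # 'words.txt' is an assumed example file
first_word = next(reader)          # words are produced lazily, one at a time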
EDIT: Use itertools.islice. The pattern below that I originally proposed is a bad idea: it crashes when it yields fewer than n values, and that behaviour depends on subtle issues, so people reading such code are unlikely to understand its precise semantics.
There is also
[next(it) for _ in range(n)]
which might(?) be clearer to people not familiar with itertools; but if you deal with iterators a lot, itertools is a worthy addition to your toolset.
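For example, as long as the iterator still holds at least n values, it does what you'd expect:

>>> it = iter([10, 20, 30, 40])
>>> [next(it) for _ in range(2)]
[10, 20]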
But what happens when next(it) raises StopIteration? (i.e. when it was exhausted and had fewer than n values to yield)
When I wrote the above line a couple of years ago, I probably thought a StopIteration would have the clever side effect of cleanly terminating the list comprehension. But no: the whole comprehension crashes, passing the StopIteration upwards. (It would exit cleanly only if the exception originated from the range(n) iterator.)
Which is probably not the behavior you want.
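A quick illustration of that crash:

>>> it = iter([10])
>>> [next(it) for _ in range(3)]
Traceback (most recent call last):
  ...
StopIteration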
But it gets worse. The following is supposed to be equivalent to the list comprehension (especially on Python 3):
list(next(it) for _ in range(n))
It isn't. The inner part is a generator expression; list() considers the generator done as soon as a StopIteration is raised anywhere inside it.
=> This version copes safely when there aren't n values and returns a shorter list. (Like itertools.islice().)
[Tested on: 2.7, 3.4]
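For illustration, on those older interpreters the generator-expression form simply trimmed the result (this is the behaviour described above, shown as it looked on Python <= 3.6):

>>> it = iter([10])
>>> list(next(it) for _ in range(3))
[10]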
But that, too, is going to change! The fact that a generator silently exits when any code inside it raises StopIteration is a known wart, addressed by PEP 479. From Python 3.7 (or 3.5 with a future import) this raises a RuntimeError instead of cleanly finishing the generator, i.e. it becomes similar to the list comprehension's behaviour.
(Tested on a recent HEAD build)
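On 3.7 and later the same expression now fails loudly, roughly like this:

>>> it = iter([10])
>>> list(next(it) for _ in range(3))
Traceback (most recent call last):
  ...
RuntimeError: generator raised StopIteration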
Use itertools.islice:
list(itertools.islice(it, n))
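islice simply stops early when the iterator runs dry, so you get however many values were actually available:

>>> import itertools
>>> it = iter([10, 20])
>>> list(itertools.islice(it, 5))
[10, 20]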
To get the first n values of a generator, you can use more_itertools.take.
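take(n, iterable) returns the first n items as a list (it is essentially a convenience wrapper around islice):

>>> import more_itertools
>>> more_itertools.take(3, range(10))
[0, 1, 2]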
If you plan to iterate over the words in chunks (e.g. 100 at a time), you can use more_itertools.chunked (https://more-itertools.readthedocs.io/en/latest/api.html):
import more_itertools

for words in more_itertools.chunked(word_reader(file), n=100):
    # process the next list of up to 100 words
    ...
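Note that chunked is lazy: it yields each list of up to 100 words on demand, so only one chunk needs to be held in memory at a time.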
Another option is to zip the generator with a bounded range; zip stops as soon as its shortest argument is exhausted, so this reads at most n words (on Python 2, xrange avoids building the index list up front):

for word, i in zip(word_reader(file), xrange(n)):
    ...
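For example, the cutoff behaviour looks like this:

>>> list(zip('abcdef', range(3)))
[('a', 0), ('b', 1), ('c', 2)]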
Use cytoolz.take.
>>> from cytoolz import take
>>> list(take(2, [10, 20, 30, 40, 50]))
[10, 20]
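(take itself is lazy and returns an iterator, which is why the example wraps it in list().)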