How can I parallelize a pipeline of generators/iterators in Python?

囚心锁ツ 2021-02-20 02:17

Suppose I have some Python code like the following:

input = open("input.txt")
x = (process_line(line) for line in input)
y = (process_item(item) for item in x)
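Chained like this, the generators run lazily and serially: each line is pulled through every stage before the next line is read. A small self-contained sketch (with stand-in `process_line`/`process_item` that just record when they run) shows the interleaving:

```python
# Stand-in stages that record the order in which they run
events = []

def process_line(line):
    events.append(("line", line))
    return line.upper()

def process_item(item):
    events.append(("item", item))
    return item + "!"

lines = iter(["a", "b"])
x = (process_line(line) for line in lines)
y = (process_item(item) for item in x)

# Consuming y drives the whole chain, one item at a time:
# "a" passes through both stages before "b" is even read.
result = list(y)
print(result)   # ['A!', 'B!']
print(events)   # [('line', 'a'), ('item', 'A'), ('line', 'b'), ('item', 'B')]
```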


        
2 Answers
  •  悲哀的现实
    2021-02-20 02:45

    is there any easy way to make it so that multiple lines can be in the pipeline at once

    I wrote a library to do just this: https://github.com/michalc/threaded-buffered-pipeline, which iterates over each iterable in a separate thread.

    So what was

    input = open("input.txt")
    
    x = (process_line(line) for line in input)
    y = (process_item(item) for item in x)
    z = (generate_output_line(item) + "\n" for item in y)
    
    output = open("output.txt", "w")
    output.writelines(z)
    

    becomes

    from threaded_buffered_pipeline import buffered_pipeline
    
    input = open("input.txt")
    
    buffer_iterable = buffered_pipeline()
    x = buffer_iterable((process_line(line) for line in input))
    y = buffer_iterable((process_item(item) for item in x))
    z = buffer_iterable((generate_output_line(item) + "\n" for item in y))
    
    output = open("output.txt", "w")
    output.writelines(z)
    

    How much actual parallelism this adds depends on what's actually happening in each iterable, and how many CPU cores you have/how busy they are.

    The classic caveat is the Python GIL: if each step is fairly CPU-heavy and uses only pure Python, little parallelism is added, and this might be no faster than the serial version. On the other hand, if each step is network I/O-heavy, it's likely to be faster.
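    The same buffering idea can be sketched with just the standard library: each wrapped iterable is consumed in a background thread, and items are handed to the next stage through a bounded queue. This is only an illustration of the technique, not the actual implementation of threaded-buffered-pipeline, and it omits exception propagation for brevity:

```python
import threading
import queue

_DONE = object()  # sentinel marking the end of the stream

def buffer_in_thread(iterable, maxsize=1):
    # Bounded queue: the producer thread blocks once the buffer is full,
    # so upstream stages run only slightly ahead of downstream ones.
    q = queue.Queue(maxsize=maxsize)

    def worker():
        for item in iterable:
            q.put(item)
        q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item

# Usage: wrap each stage so it is consumed in its own thread,
# letting earlier stages keep producing while later ones consume.
x = buffer_in_thread(str(n) for n in range(5))
y = buffer_in_thread(s + "!" for s in x)
result = list(y)
print(result)  # ['0!', '1!', '2!', '3!', '4!']
```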
