I'd like to know how multiprocessing is done right. Assume I have a list [1,2,3,4,5] generated by function f1 and written to a Queue. I use concurrent.futures and three pools, which are connected together via future.add_done_callback. Then I wait for the whole process to end by calling shutdown on each pool.
from concurrent.futures import ProcessPoolExecutor
import time
import random

def worker1(arg):
    time.sleep(random.random())
    return arg

def pipe12(future):
    pool2.submit(worker2, future.result()).add_done_callback(pipe23)

def worker2(arg):
    time.sleep(random.random())
    return arg

def pipe23(future):
    pool3.submit(worker3, future.result()).add_done_callback(spout)

def worker3(arg):
    time.sleep(random.random())
    return arg

def spout(future):
    print(future.result())

if __name__ == "__main__":
    __spec__ = None  # Fix multiprocessing in Spyder's IPython
    pool1 = ProcessPoolExecutor(2)
    pool2 = ProcessPoolExecutor(2)
    pool3 = ProcessPoolExecutor(2)
    for i in range(10):
        pool1.submit(worker1, i).add_done_callback(pipe12)
    pool1.shutdown()
    pool2.shutdown()
    pool3.shutdown()
What would be wrong with using idea 1, but with each worker process (f2) putting a custom object with its identifier on the queue when it is done? Then f3 would just terminate that worker, until there is no worker process left.
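A minimal sketch of that variation might look like this (the Done class, the per-worker ids, and the doubling work are illustrative assumptions; here the parent joins each worker as its marker arrives rather than forcibly terminating it):

import multiprocessing as mp

class Done:
    # Illustrative marker a worker puts on the output queue when it finishes.
    def __init__(self, worker_id):
        self.worker_id = worker_id

def f2(worker_id, inq, outq):
    while True:
        val = inq.get()
        if val is None:                      # stop signal from the feeder
            outq.put(Done(worker_id))        # announce this worker is finished
            break
        outq.put(val * 2)                    # illustrative work

def f3(outq, workers):
    # Runs in the parent: joins each worker as its Done marker arrives,
    # and stops once no worker is left.
    while workers:
        val = outq.get()
        if isinstance(val, Done):
            workers.pop(val.worker_id).join()
        else:
            print(val)

if __name__ == '__main__':
    inq, outq = mp.Queue(), mp.Queue()
    for i in [1, 2, 3, 4, 5]:
        inq.put(i)
    workers = {wid: mp.Process(target=f2, args=(wid, inq, outq)) for wid in range(2)}
    for w in workers.values():
        w.start()
        inq.put(None)                        # one stop signal per worker
    f3(outq, workers)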
Also, new in Python 3.2 is the concurrent.futures package in the standard library, which should do what you are trying to do the "right way" (tm) - http://docs.python.org/dev/library/concurrent.futures.html
Maybe it is possible to find a backport of concurrent.futures to the Python 2.x series.
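For reference, a minimal sketch of such a two-stage pipeline with concurrent.futures might look like this (the stage functions, pool sizes and input data are illustrative assumptions):

from concurrent.futures import ProcessPoolExecutor

def stage1(x):
    return x * 2

def stage2(x):
    return x + 1

if __name__ == '__main__':
    # The with-blocks call shutdown(wait=True) on both pools on exit.
    with ProcessPoolExecutor(2) as pool1, ProcessPoolExecutor(2) as pool2:
        intermediate = pool1.map(stage1, [1, 2, 3, 4, 5])
        for result in pool2.map(stage2, intermediate):
            print(result)

Executor.map feeds the second stage as the first stage's results become available, in order, and the with-blocks take care of the shutdown calls.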
For Idea 1, how about:
import multiprocessing as mp

sentinel = None

def f2(inq, outq):
    while True:
        val = inq.get()
        if val is sentinel:
            break
        outq.put(val * 2)

def f3(outq):
    while True:
        val = outq.get()
        if val is sentinel:
            break
        print(val)

def f1():
    num_workers = 2
    inq = mp.Queue()
    outq = mp.Queue()
    for i in range(5):
        inq.put(i)
    for i in range(num_workers):
        inq.put(sentinel)
    workers = [mp.Process(target=f2, args=(inq, outq)) for _ in range(num_workers)]
    printer = mp.Process(target=f3, args=(outq,))
    for w in workers:
        w.start()
    printer.start()
    for w in workers:
        w.join()
    outq.put(sentinel)
    printer.join()

if __name__ == '__main__':
    f1()
The only difference from the description of Idea 1 is that f2 breaks out of the while-loop when it receives the sentinel (thus terminating itself). f1 blocks until the workers are done (using w.join()) and then sends f3 the sentinel (signaling that it should break out of its while-loop).
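As a small refinement, the worker loop can also be written with the two-argument form of iter(), using the same sentinel as above (the doubling work is again just illustrative):

def f2(inq, outq):
    # iter(inq.get, sentinel) keeps calling inq.get() until it returns the sentinel
    for val in iter(inq.get, sentinel):
        outq.put(val * 2)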
The easiest way to do exactly that is to use stop markers in your queues.
F1
F1 populates your Queue with the data you want to process. At the end of this push, you put n 'STOP' keywords into the queue, where n is the number of workers involved (2 in your example). The code would look like:
for n in range(no_of_processes):
    tasks.put('STOP')
F2
F2 pulls elements from the provided queue with get; each call takes an element off the queue and removes it. You can then put the get into a loop that watches for the stop signal:
for elem in iter(tasks.get, 'STOP'):
    ...  # do something with elem
F3
This one is a bit tricky. You could generate a semaphore in F2 that acts as a signal to F3, but you do not know when this signal arrives and you may lose data. However, F3 pulls from its queue the same way F2 does, so you can wrap the pull in a try...except statement: queue.get raises queue.Empty when no element arrives within its timeout (or immediately with block=False). So your pull in F3 would look like:
import queue  # the Empty exception lives in the standard queue module

control = True
while control:
    try:
        results.get(timeout=1)  # use a timeout so queue.Empty can actually be raised
    except queue.Empty:
        control = False
Here tasks and results are the queues, so you do not need anything that is not already included in Python.
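Putting the pieces above together, a minimal runnable sketch might look like this (the doubling work in F2, the 1-second timeout, and the wiring in the main block are illustrative assumptions):

import multiprocessing as mp
import queue  # for queue.Empty

def f1(tasks, no_of_processes):
    # Fill the task queue, then append one 'STOP' marker per worker.
    for i in [1, 2, 3, 4, 5]:
        tasks.put(i)
    for n in range(no_of_processes):
        tasks.put('STOP')

def f2(tasks, results):
    # Pull until this worker's 'STOP' marker arrives, then exit.
    for elem in iter(tasks.get, 'STOP'):
        results.put(elem * 2)  # illustrative work

def f3(results):
    control = True
    while control:
        try:
            print(results.get(timeout=1))
        except queue.Empty:
            control = False

if __name__ == '__main__':
    no_of_processes = 2
    tasks, results = mp.Queue(), mp.Queue()
    f1(tasks, no_of_processes)
    workers = [mp.Process(target=f2, args=(tasks, results))
               for _ in range(no_of_processes)]
    for w in workers:
        w.start()
    printer = mp.Process(target=f3, args=(results,))
    printer.start()
    for w in workers:
        w.join()
    printer.join()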
With the MPipe module, simply do this:
from mpipe import OrderedStage, Pipeline

def f1(value):
    return value * 2

def f2(value):
    print(value)

s1 = OrderedStage(f1, size=2)
s2 = OrderedStage(f2)
p = Pipeline(s1.link(s2))

for task in 1, 2, 3, 4, 5, None:
    p.put(task)
The above runs 4 processes: two for the first stage (f1, since size=2), one for the second stage (f2), and one for the main program.
The MPipe cookbook offers some explanation of how processes are shut down internally using None
as the last task.
To run the code, install MPipe:
virtualenv venv
venv/bin/pip install mpipe
venv/bin/python prog.py
Output:
2
4
6
8
10
Pypeline does this for you. You can even choose between using Processes, Threads or async Tasks. For what you want, just use Processes, e.g.:
import pypeln as pl
data = some_iterable()
data = pl.process.map(f2, data, workers = 3)
data = list(data)
You can do more complex stuff:
import pypeln as pl
data = some_iterable()
data = pl.process.map(f2, data, workers = 3)
data = pl.process.filter(f3, data, workers = 1)
data = pl.process.flat_map(f4, data, workers = 5)
data = list(data)