Multiprocessing in a pipeline done right

长情又很酷 2021-02-04 05:06

I'd like to know how multiprocessing is done right. Assume I have a list [1,2,3,4,5] generated by function f1, which is written to a Queue

6 Answers
  • 2021-02-04 05:48

    I use concurrent.futures with three pools, connected together via future.add_done_callback. Then I wait for the whole pipeline to finish by calling shutdown on each pool.

    from concurrent.futures import ProcessPoolExecutor
    import time
    import random
    
    
    def worker1(arg):
        time.sleep(random.random())
        return arg
    
    
    def pipe12(future):
        pool2.submit(worker2, future.result()).add_done_callback(pipe23)
    
    
    def worker2(arg):
        time.sleep(random.random())
        return arg
    
    
    def pipe23(future):
        pool3.submit(worker3, future.result()).add_done_callback(spout)
    
    
    def worker3(arg):
        time.sleep(random.random())
        return arg
    
    
    def spout(future):
        print(future.result())
    
    
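    if __name__ == "__main__":
        __spec__ = None  # Fix multiprocessing in Spyder's IPython
        pool1 = ProcessPoolExecutor(2)
        pool2 = ProcessPoolExecutor(2)
        pool3 = ProcessPoolExecutor(2)
        for i in range(10):
            pool1.submit(worker1, i).add_done_callback(pipe12)
        # Shut the pools down in pipeline order: shutdown() waits for every
        # pending future (and its done-callback), so once pool1 is shut down,
        # every stage-2 task has been submitted to pool2, and so on.
        pool1.shutdown()
        pool2.shutdown()
        pool3.shutdown()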
    
  • 2021-02-04 05:51

    What would be wrong with using Idea 1, but having each worker process (f2) put a custom object with its identifier on the queue when it is done? Then f3 would just terminate that worker, until there were no worker processes left.
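
    A minimal sketch of that idea, assuming f2 doubles each value as in the other answers. The Done class, the worker ids and the run() helper are hypothetical names chosen for illustration. Since only the parent process can join its children, f3 runs in the parent here: each worker puts its Done marker after its last result, so when f3 sees a marker it can join that worker, and it stops once no workers remain.

```python
import multiprocessing as mp


class Done:
    """Hypothetical end-of-stream marker carrying the sending worker's id."""

    def __init__(self, worker_id):
        self.worker_id = worker_id


def f2(worker_id, inq, outq):
    # Worker: double each value (assumed work); on the None sentinel,
    # announce completion with a Done marker and exit.
    while True:
        val = inq.get()
        if val is None:
            outq.put(Done(worker_id))
            return
        outq.put(val * 2)


def f3(outq, workers):
    # Consumer (parent process): collect results and join each worker
    # as soon as its Done marker arrives, until none are left.
    remaining = dict(enumerate(workers))
    results = []
    while remaining:
        item = outq.get()
        if isinstance(item, Done):
            remaining.pop(item.worker_id).join()
        else:
            results.append(item)
    return results


def run(data, num_workers=2):
    # Hypothetical driver playing the role of f1.
    inq, outq = mp.Queue(), mp.Queue()
    for val in data:
        inq.put(val)
    for _ in range(num_workers):
        inq.put(None)  # one sentinel per worker
    workers = [mp.Process(target=f2, args=(i, inq, outq))
               for i in range(num_workers)]
    for w in workers:
        w.start()
    return f3(outq, workers)


if __name__ == '__main__':
    print(sorted(run([1, 2, 3, 4, 5])))
```

    Results from different workers can arrive interleaved, hence the sorted() call. Because items put by one process on a multiprocessing Queue arrive in the order they were put, a worker's Done marker only shows up after all of that worker's results.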

    Also, new in Python 3.2 is the concurrent.futures package in the standard library, which should do what you are trying to do in the "right way" (tm): http://docs.python.org/dev/library/concurrent.futures.html
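
    A minimal sketch of the f1, f2, f3 pipeline with concurrent.futures, assuming f2 doubles each value (as in the other answers) and f3 just prints in the main process; run() is a hypothetical helper. ProcessPoolExecutor.map takes the place of the explicit queues and sentinels: it hands the inputs to the worker processes and yields the results in input order, and leaving the with block shuts the pool down.

```python
from concurrent.futures import ProcessPoolExecutor


def f1():
    # Stage 1: produce the input data (the list from the question)
    return [1, 2, 3, 4, 5]


def f2(value):
    # Stage 2: runs in a worker process; doubling is an assumed workload
    return value * 2


def run(num_workers=2):
    # Hypothetical driver: map() sends f1's items to the worker processes and
    # yields f2's results in input order; leaving the with-block shuts the
    # pool down and waits for the workers.
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(f2, f1()))


if __name__ == '__main__':
    # Stage 3 (f3): consume the results in the main process
    for result in run():
        print(result)
```

    Run as a script, this prints 2, 4, 6, 8 and 10, one per line, because map() preserves input order even when workers finish out of order.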

    It may be possible to find a backport of concurrent.futures for the Python 2.x series.

  • 2021-02-04 05:52

    For Idea 1, how about:

    import multiprocessing as mp
    
    sentinel=None
    
    def f2(inq,outq):
        while True:
            val=inq.get()
            if val is sentinel:
                break
            outq.put(val*2)
    
    def f3(outq):
        while True:
            val=outq.get()
            if val is sentinel:
                break
            print(val)
    
    def f1():
        num_workers=2
        inq=mp.Queue()
        outq=mp.Queue()
        for i in range(5):
            inq.put(i)
        for i in range(num_workers):  # one sentinel per worker
            inq.put(sentinel)
        workers=[mp.Process(target=f2,args=(inq,outq)) for i in range(num_workers)]
        printer=mp.Process(target=f3,args=(outq,))
        for w in workers:
            w.start()
        printer.start()
        for w in workers:
            w.join()
        outq.put(sentinel)
        printer.join()
    
    if __name__=='__main__':
        f1()
    

    The only difference from the description of Idea 1 is that f2 breaks out of the while-loop when it receives the sentinel (thus terminating itself). f1 blocks until the workers are done (using w.join()) and then sends f3 the sentinel (signaling it to break out of its while-loop).

  • 2021-02-04 05:58

    The easiest way to do exactly that is using semaphores.

    F1

    F1 populates your Queue with the data you want to process. At the end of this push, you put n 'STOP' keywords into the queue, where n is the number of involved workers (n = 2 in your example). The code would look like:

    for _ in range(no_of_processes):
        tasks.put('STOP')
    

    F2

    F2 pulls from the provided queue with get(), which removes each element from the queue as it returns it. You can put this get into a loop that watches for the stop signal:

    for elem in iter(tasks.get, 'STOP'):
        results.put(elem * 2)  # do something, e.g. double the value
    

    F3

    This one is a bit tricky. You could generate a semaphore in F2 that acts as a signal to F3, but you do not know when this signal arrives, and you may lose data. Instead, F3 can pull the data the same way as F2 and wrap the pull in a try...except statement: get() raises queue.Empty when no element arrives within its timeout (or immediately, if called with block=False; a plain blocking get() just waits forever). So your pull in F3 would look like:

    import queue
    
    control = True
    while control:
        try:
            print(results.get(timeout=1))  # wait up to 1 s for the next result
        except queue.Empty:
            control = False
    

    Here, tasks and results are both queues, so you do not need anything beyond what is already included in Python.
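
    Put together, the three snippets might look like the following minimal sketch. The doubling in F2, the run() driver and the 3-second timeout in F3 are assumptions for illustration. Because F3 stops at the first timeout, a worker that needs longer than that between two results would make F3 quit early, so, like the semaphore idea, this can lose data unless the timeout is chosen generously.

```python
import multiprocessing as mp
import queue

STOP = 'STOP'  # stop keyword; F1 puts one per worker into the task queue


def f1(tasks, data, no_of_processes):
    # F1: push the data, then one STOP keyword per worker
    for item in data:
        tasks.put(item)
    for _ in range(no_of_processes):
        tasks.put(STOP)


def f2(tasks, results):
    # F2: iter() keeps calling tasks.get() until it returns STOP
    for elem in iter(tasks.get, STOP):
        results.put(elem * 2)  # assumed work: double the value


def f3(results, timeout=3.0):
    # F3: pull results until none arrives within `timeout` seconds
    collected = []
    control = True
    while control:
        try:
            collected.append(results.get(timeout=timeout))
        except queue.Empty:
            control = False
    return collected


def run(data, no_of_processes=2):
    # Hypothetical driver wiring F1, F2 and F3 together
    tasks, results = mp.Queue(), mp.Queue()
    f1(tasks, data, no_of_processes)
    workers = [mp.Process(target=f2, args=(tasks, results))
               for _ in range(no_of_processes)]
    for w in workers:
        w.start()
    # Drain results before joining, so no worker blocks on a full pipe
    collected = f3(results)
    for w in workers:
        w.join()
    return collected


if __name__ == '__main__':
    print(sorted(run([1, 2, 3, 4, 5])))
```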

  • 2021-02-04 06:06

    With the MPipe module, simply do this:

    from mpipe import OrderedStage, Pipeline
    
    def f1(value):
        return value * 2
    
    def f2(value):
        print(value)
    
    s1 = OrderedStage(f1, size=2)
    s2 = OrderedStage(f2)
    p = Pipeline(s1.link(s2))
    
    for task in 1, 2, 3, 4, 5, None:
        p.put(task)
    

    The above runs 4 processes:

    • two for the first stage (function f1)
    • one for the second stage (function f2)
    • and one more for the main program that feeds the pipeline.

    The MPipe cookbook offers some explanation of how processes are shut down internally using None as the last task.

    To run the code, install MPipe:

    virtualenv venv
    venv/bin/pip install mpipe
    venv/bin/python prog.py
    

    Output:

    2
    4
    6
    8
    10
    
  • 2021-02-04 06:08

    Pypeline does this for you. You can even choose between Processes, Threads, and async Tasks. For example, using Processes:

    import pypeln as pl
    
    data = some_iterable()
    data = pl.process.map(f2, data, workers=3)
    data = list(data)
    

    You can also chain more complex stages:

    import pypeln as pl
    
    data = some_iterable()
    data = pl.process.map(f2, data, workers=3)
    data = pl.process.filter(f3, data, workers=1)
    data = pl.process.flat_map(f4, data, workers=5)
    data = list(data)
    