How can I recover the return value of a function passed to multiprocessing.Process?

野的像风 2020-11-22 07:36

In the example code below, I'd like to recover the return value of the function worker. How can I go about doing this? Where is this value stored?

12 Answers
  • 2020-11-22 07:54

    For some reason, I couldn't find a general example of how to do this with Queue anywhere (even Python's doc examples don't spawn multiple processes), so here's what I got working after like 10 tries:

    from multiprocessing import Process, Queue

    def add_helper(queue, arg1, arg2): # the func called in child processes
        ret = arg1 + arg2
        queue.put(ret)
    
    def multi_add(): # spawns child processes
        q = Queue()
        processes = []
        rets = []
        for _ in range(0, 100):
            p = Process(target=add_helper, args=(q, 1, 2))
            processes.append(p)
            p.start()
        for p in processes:
            ret = q.get() # will block
            rets.append(ret)
        for p in processes:
            p.join()
        return rets
    

    multiprocessing.Queue is a blocking, process-safe queue that you can use to store the return values from the child processes, so you have to pass the queue to each process. Something less obvious here is that you have to get() from the queue before you join() the Processes: if the queue's underlying buffer fills up, the children block on their put() calls and join() deadlocks.

    Update for those who are object-oriented (tested in Python 3.4):

    from multiprocessing import Process, Queue
    
    class Multiprocessor():
    
        def __init__(self):
            self.processes = []
            self.queue = Queue()
    
        @staticmethod
        def _wrapper(func, queue, args, kwargs):
            ret = func(*args, **kwargs)
            queue.put(ret)
    
        def run(self, func, *args, **kwargs):
            args2 = [func, self.queue, args, kwargs]
            p = Process(target=self._wrapper, args=args2)
            self.processes.append(p)
            p.start()
    
        def wait(self):
            rets = []
            for p in self.processes:
                ret = self.queue.get()
                rets.append(ret)
            for p in self.processes:
                p.join()
            return rets
    
    # tester
    if __name__ == "__main__":
        mp = Multiprocessor()
        num_proc = 64
        for _ in range(num_proc): # queue up multiple tasks running `sum`
            mp.run(sum, [1, 2, 3, 4, 5])
        ret = mp.wait() # get all results
        print(ret)
        assert len(ret) == num_proc and all(r == 15 for r in ret)
    
  • 2020-11-22 07:56

    I think the approach suggested by @sega_sai is the better one. But it really needs a code example, so here goes:

    import multiprocessing
    from os import getpid
    
    def worker(procnum):
        print('I am number %d in process %d' % (procnum, getpid()))
        return getpid()
    
    if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
        print(pool.map(worker, range(5)))
    

    Which will print the return values:

    I am number 0 in process 19139
    I am number 1 in process 19138
    I am number 2 in process 19140
    I am number 3 in process 19139
    I am number 4 in process 19140
    [19139, 19138, 19140, 19139, 19140]
    

    If you are familiar with map() (the Python 2 built-in), this should not be too challenging. Otherwise have a look at sega_sai's link.

    Note how little code is needed. (Also note how processes are re-used).

  • 2020-11-22 07:57

    The pebble package has a nice abstraction leveraging multiprocessing.Pipe which makes this quite straightforward:

    from pebble import concurrent

    @concurrent.process
    def function(arg, kwarg=0):
        return arg + kwarg

    if __name__ == '__main__':  # guard needed on platforms that spawn child processes (e.g. Windows)
        future = function(1, kwarg=1)
        print(future.result())
    

    Example from: https://pythonhosted.org/Pebble/#concurrent-decorators

  • 2020-11-22 08:00

    For anyone else looking for how to get a value from a Process using Queue:

    import multiprocessing

    ret = {'foo': False}

    def worker(queue):
        ret = queue.get()       # take the dict out of the queue
        ret['foo'] = True       # modify it in the child process
        queue.put(ret)          # put it back for the parent to read

    if __name__ == '__main__':
        queue = multiprocessing.Queue()
        queue.put(ret)          # seed the queue with the initial dict
        p = multiprocessing.Process(target=worker, args=(queue,))
        p.start()
        p.join()
        print(queue.get())      # Prints {'foo': True}
    

    Note that on Windows or in a Jupyter Notebook, multiprocessing requires you to save this as a file and execute the file. If you run it interactively (for example in a notebook cell or the interactive interpreter) you will see an error like this:

     AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>
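
    If you do need to run this interactively, one common workaround is to move the worker into its own importable module so the child process can find it. A minimal sketch, assuming a hypothetical file named worker_module.py:

    # worker_module.py  (hypothetical helper file)
    def worker(queue):
        d = queue.get()
        d['foo'] = True
        queue.put(d)

    # in the notebook / interactive session
    import multiprocessing
    from worker_module import worker  # imported, so the child process can look it up

    if __name__ == '__main__':
        queue = multiprocessing.Queue()
        queue.put({'foo': False})
        p = multiprocessing.Process(target=worker, args=(queue,))
        p.start()
        p.join()
        print(queue.get())  # {'foo': True}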
    
  • 2020-11-22 08:05

    It seems that you should use the multiprocessing.Pool class instead, and use its methods .apply(), .apply_async(), or .map().

    http://docs.python.org/library/multiprocessing.html?highlight=pool#multiprocessing.pool.AsyncResult
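
    A minimal sketch of that approach (the square helper here is just a placeholder for real work):

    import multiprocessing

    def square(x):
        return x * x  # stand-in for the real worker

    if __name__ == '__main__':
        with multiprocessing.Pool(processes=3) as pool:
            # map() blocks and returns the results in input order
            print(pool.map(square, range(5)))   # [0, 1, 4, 9, 16]
            # apply_async() returns an AsyncResult; .get() retrieves the return value
            res = pool.apply_async(square, (10,))
            print(res.get())                    # 100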

  • 2020-11-22 08:07

    Thought I'd simplify the simplest examples copied from above, working for me on Py3.6. Simplest is multiprocessing.Pool:

    import multiprocessing
    import time
    
    def worker(x):
        time.sleep(1)
        return x
    
    pool = multiprocessing.Pool()
    print(pool.map(worker, range(10)))
    

    You can set the number of processes in the pool with, e.g., Pool(processes=5). However, it defaults to the CPU count, so leave it blank for CPU-bound tasks. (I/O-bound tasks often suit threads anyway, as the threads are mostly waiting so can share a CPU core.) Pool also applies a chunking optimization, batching tasks to reduce inter-process overhead.
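
    For example, a quick sketch passing both knobs explicitly (the values 5 and 2 are arbitrary):

    import multiprocessing
    import time

    def worker(x):
        time.sleep(1)
        return x

    if __name__ == '__main__':
        # cap the pool at 5 worker processes and hand out the iterable in chunks of 2
        with multiprocessing.Pool(processes=5) as pool:
            print(pool.map(worker, range(10), chunksize=2))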

    (Note that the worker function cannot be nested within another function. I initially defined my worker inside the function that makes the call to pool.map, to keep it all self-contained, but then the processes couldn't pickle it and threw "AttributeError: Can't pickle local object 'outer_method.<locals>.inner_method'". It can, however, be inside a class.)
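
    A small sketch of that restriction (the function names here are just placeholders):

    import multiprocessing

    # def outer():
    #     def inner(x):            # nested: can't be pickled for the child processes
    #         return x
    #     with multiprocessing.Pool() as pool:
    #         return pool.map(inner, range(5))   # AttributeError: Can't pickle local object ...

    def task(x):                    # module-level: pickles fine
        return x

    def outer():
        with multiprocessing.Pool() as pool:
            return pool.map(task, range(5))

    if __name__ == '__main__':
        print(outer())              # [0, 1, 2, 3, 4]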

    (I appreciate that the original question specified printing 'represent!' rather than time.sleep(), but without the sleep I thought some code was running concurrently when it wasn't.)


    Py3's ProcessPoolExecutor is also two lines (.map returns a generator so you need the list()):

    from concurrent.futures import ProcessPoolExecutor
    with ProcessPoolExecutor() as executor:
        print(list(executor.map(worker, range(10))))
    

    With plain Processes:

    import multiprocessing
    import time
    
    def worker(x, queue):
        time.sleep(1)
        queue.put(x)
    
    queue = multiprocessing.SimpleQueue()
    tasks = range(10)
    
    for task in tasks:
        multiprocessing.Process(target=worker, args=(task, queue,)).start()
    
    for _ in tasks:
        print(queue.get())
    

    Use SimpleQueue if all you need is put and get. The first loop starts all the processes before the second makes the blocking queue.get() calls. I don't think there's any reason to call p.join() as well.
