Python multiprocessing: is it possible to have a pool inside of a pool?

终归单人心 2021-01-31 05:44

I have a module A that does a basic map/reduce by taking data, sending it to modules B, C, D, etc. for analysis, and then joining their results together.

But it appears

1 answer
  •  心在旅途
    2021-01-31 05:52

    is it possible to have a pool inside of a pool?

    Yes, it is possible, though it might not be a good idea unless you want to raise an army of zombies. The following is adapted from the Stack Overflow question "Python Process Pool non-daemonic?":

    import multiprocessing.pool
    from contextlib import closing
    from functools import partial
    
    class NoDaemonProcess(multiprocessing.Process):
        # make 'daemon' attribute always return False
        def _get_daemon(self):
            return False
        def _set_daemon(self, value):
            pass
        daemon = property(_get_daemon, _set_daemon)
    
    # We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
    # because the latter is only a wrapper function, not a proper class.
    class Pool(multiprocessing.pool.Pool):
        Process = NoDaemonProcess
    
    def foo(x, depth=0):
        if depth == 0:
            return x
        else:
            with closing(Pool()) as p:
                return p.map(partial(foo, depth=depth-1), range(x + 1))
    
    if __name__ == "__main__":
        from pprint import pprint
        pprint(foo(10, depth=2))
    

    Output

    [[0],
     [0, 1],
     [0, 1, 2],
     [0, 1, 2, 3],
     [0, 1, 2, 3, 4],
     [0, 1, 2, 3, 4, 5],
     [0, 1, 2, 3, 4, 5, 6],
     [0, 1, 2, 3, 4, 5, 6, 7],
     [0, 1, 2, 3, 4, 5, 6, 7, 8],
     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
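
A note for newer interpreters: as far as I can tell, the `Process = NoDaemonProcess` override above stops working on Python 3.8 and later, because `Pool` there passes its context as the first argument when it creates workers. A commonly used workaround, sketched below and not part of the original answer, supplies the non-daemonic process class through a custom context instead. The names `NoDaemonContext` and `NestablePool` are illustrative, the pool size of 2 is arbitrary, and the same caveat about leftover child processes applies, which is why the sketch closes and joins each pool explicitly.

```python
import multiprocessing
import multiprocessing.pool
from functools import partial


class NoDaemonProcess(multiprocessing.Process):
    """Process whose 'daemon' flag always reads False and ignores writes."""

    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass  # Pool tries to mark its workers as daemonic; ignore that


class NoDaemonContext(type(multiprocessing.get_context())):
    """The default context, except that it creates NoDaemonProcess workers."""

    Process = NoDaemonProcess


class NestablePool(multiprocessing.pool.Pool):
    """Pool whose workers are allowed to start pools of their own."""

    def __init__(self, *args, **kwargs):
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)


def foo(x, depth=0):
    if depth == 0:
        return x
    pool = NestablePool(2)  # assumption: a small fixed size for the demo
    try:
        return pool.map(partial(foo, depth=depth - 1), range(x + 1))
    finally:
        # close() stops accepting work; join() waits for the workers to exit
        pool.close()
        pool.join()


if __name__ == "__main__":
    from pprint import pprint

    pprint(foo(3, depth=2))
```

Because the workers are no longer daemonic, they are not killed automatically when the parent exits, so the explicit `close()` and `join()` matter more here than with a regular pool.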
    

    concurrent.futures supports it by default:

    # $ pip install futures # on Python 2
    from concurrent.futures import ProcessPoolExecutor as Pool
    from functools import partial
    
    def foo(x, depth=0):
        if depth == 0:
            return x
        else:
            with Pool() as p:
                return list(p.map(partial(foo, depth=depth-1), range(x + 1)))
    
    if __name__ == "__main__":
        from pprint import pprint
        pprint(foo(10, depth=2))
    

    It produces the same output.

    Is it possible to parallelize these jobs some other way?

    Yes. For example, look at how Celery lets you build complex workflows.
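
Short of adopting a task queue, another option is to avoid nesting altogether by flattening the work into one list of tasks and feeding it to a single pool. The sketch below is not from the original answer and is a different technique from Celery's workflows. It reproduces the `foo(x, depth=2)` result from the examples above, and the names `leaf`, `foo_flat`, and `map_fn` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor


def leaf(x):
    """Stand-in for the real per-item work (the depth == 0 case of foo)."""
    return x


def foo_flat(x, map_fn=map):
    """Build the same nested lists as foo(x, depth=2) with one flat map.

    map_fn is any map-like callable: the builtin map, or the map method
    of a single ProcessPoolExecutor.
    """
    # One task per (outer index, inner value) pair, instead of one pool
    # per outer item.
    tasks = [(i, j) for i in range(x + 1) for j in range(i + 1)]
    results = map_fn(leaf, [j for _, j in tasks])

    # Regroup the flat results by outer index; map preserves input order.
    grouped = [[] for _ in range(x + 1)]
    for (i, _), value in zip(tasks, results):
        grouped[i].append(value)
    return grouped


if __name__ == "__main__":
    from pprint import pprint

    with ProcessPoolExecutor() as pool:  # a single pool, no nesting
        pprint(foo_flat(10, map_fn=pool.map))
```

Passing the mapper in as a parameter keeps the flattening and regrouping logic independent of the pool, so it can be checked serially with the builtin `map` before running it in parallel.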
