In Python, is there an async equivalent to multiprocessing or concurrent.futures?

花落未央 2021-01-31 08:53

Basically, I'm looking for something that offers a parallel map using Python 3 coroutines as the backend instead of threads or processes. I believe there should be less overhead when performing highly parallel IO work.

2 answers
  •  孤街浪徒
    2021-01-31 09:29

    Disclaimer: PEP 492 defines only the syntax and usage of coroutines. They need an event loop to run, which will most likely be asyncio's event loop.

    Asynchronous map

    I don't know of any implementation of map based on coroutines. However, it's trivial to implement basic map functionality using asyncio.gather():

    import asyncio


    def async_map(coroutine_func, iterable):
        loop = asyncio.get_event_loop()
        # gather() combines all coroutines into one future; results keep input order
        future = asyncio.gather(*(coroutine_func(param) for param in iterable))
        return loop.run_until_complete(future)
    

    This implementation is really simple. It creates a coroutine for each item in the iterable, joins them into a single future, and runs that future on the event loop until every coroutine has finished.
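
    As a quick usage sketch (not part of the benchmark below): this assumes Python 3.7 or newer, where asyncio.run() is available, and uses a hypothetical coroutine named square. It illustrates that gather() returns results in input order even when later items finish first.

```python
import asyncio


async def square(val):
    # Hypothetical workload: larger values sleep less, so they finish first.
    await asyncio.sleep(0.05 * (3 - val))
    return val * val


async def async_map(coroutine_func, iterable):
    # Same idea as the function above, written as a coroutine
    # so that asyncio.run() can drive it.
    return await asyncio.gather(*(coroutine_func(param) for param in iterable))


results = asyncio.run(async_map(square, [1, 2, 3]))
print(results)
```

    The results come back as a list ordered like the input, regardless of completion order.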

    This implementation covers only some cases. Its main problem is that, with a long iterable, you would probably want to limit the number of coroutines running concurrently. I can't come up with a simple implementation that is both efficient and order-preserving, so I will leave it as an exercise for the reader.
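
    One possible take on that exercise, as a sketch rather than a tuned solution: an asyncio.Semaphore caps how many calls are in flight, and gather() still returns results in input order. The name bounded_async_map and the limit parameter are illustrative, and asyncio.run() assumes Python 3.7+. Note that all coroutine objects are still created up front; only their execution is throttled, so this does not reduce memory use for very long iterables.

```python
import asyncio


async def bounded_async_map(coroutine_func, iterable, limit=10):
    # Hypothetical helper: at most `limit` calls run at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def run_one(param):
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await coroutine_func(param)

    # gather() keeps results in the order of the inputs.
    return await asyncio.gather(*(run_one(param) for param in iterable))


async def func(val):
    await asyncio.sleep(0.01)
    return val * val


results = asyncio.run(bounded_async_map(func, range(20), limit=5))
print(results)
```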

    Performance

    You claimed:

    I believe there should be less overhead when performing highly parallel IO work.

    That requires proof, so here is a comparison of a multiprocessing implementation, the gevent implementation by a p, and my implementation based on coroutines. All tests were performed on Python 3.5.

    Implementation using multiprocessing:

    from multiprocessing import Pool
    import time
    
    
    def async_map(f, iterable):
        with Pool(len(iterable)) as p:  # run one process per item to measure overhead only
            return p.map(f, iterable)
    
    def func(val):
        time.sleep(1)
        return val * val
    

    Implementation using gevent:

    import gevent
    from gevent.pool import Group
    
    
    def async_map(f, iterable):
        group = Group()
        return group.map(f, iterable)
    
    def func(val):
        gevent.sleep(1)
        return val * val
    

    Implementation using asyncio:

    import asyncio
    
    
    def async_map(f, iterable):
        loop = asyncio.get_event_loop()
        future = asyncio.gather(*(f(param) for param in iterable))
        return loop.run_until_complete(future)
    
    async def func(val):
        await asyncio.sleep(1)
        return val * val
    

    The test program is the usual timeit:

    $ python3 -m timeit -s 'from perf.map_mp import async_map, func' -n 1 'async_map(func, list(range(10)))'
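
    The same kind of measurement can also be scripted. The sketch below times the asyncio variant from inside Python with time.perf_counter(). It uses a 0.1-second sleep instead of 1 second to keep the run short, the helper name measure is hypothetical, and asyncio.run() assumes Python 3.7+, so its numbers are not comparable to the results listed here.

```python
import asyncio
import time


async def func(val):
    await asyncio.sleep(0.1)  # shortened from 1 s to keep the sketch quick
    return val * val


async def async_map(f, iterable):
    return await asyncio.gather(*(f(param) for param in iterable))


def measure(n):
    # Hypothetical helper: wall-clock time for mapping over n items.
    start = time.perf_counter()
    results = asyncio.run(async_map(func, range(n)))
    return time.perf_counter() - start, results


elapsed, results = measure(100)
print(f"{len(results)} items in {elapsed:.2f} s")
```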
    

    Results:

    1. Iterable of 10 items:

      • multiprocessing - 1.05 sec
      • gevent - 1 sec
      • asyncio - 1 sec
    2. Iterable of 100 items:

      • multiprocessing - 1.16 sec
      • gevent - 1.01 sec
      • asyncio - 1.01 sec
    3. Iterable of 500 items:

      • multiprocessing - 2.31 sec
      • gevent - 1.02 sec
      • asyncio - 1.03 sec
    4. Iterable of 5000 items:

      • multiprocessing - failed (spawning 5k processes is not such a good idea!)
      • gevent - 1.12 sec
      • asyncio - 1.22 sec
    5. Iterable of 50000 items:

      • gevent - 2.2 sec
      • asyncio - 3.25 sec

    Conclusions

    Concurrency based on an event loop is faster when the program does mostly I/O rather than computation. Keep in mind that the difference will be smaller when less I/O and more computation is involved.
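
    To illustrate why the benefit depends on the workload, here is a small sketch (assuming Python 3.7+ for asyncio.run(); the names io_like, blocking and timed_gather are made up for the example). A coroutine that awaits lets others run during the wait, while one that blocks, simulated here with time.sleep() as a stand-in for computation, holds the event loop, so the calls run one after another.

```python
import asyncio
import time


async def io_like(val):
    await asyncio.sleep(0.1)  # yields to the event loop while waiting
    return val


async def blocking(val):
    time.sleep(0.1)  # stand-in for CPU-bound work: never yields
    return val


async def timed_gather(coroutine_func, n):
    # Run n calls through gather() and return the elapsed wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(coroutine_func(i) for i in range(n)))
    return time.perf_counter() - start


concurrent_time = asyncio.run(timed_gather(io_like, 5))
serial_time = asyncio.run(timed_gather(blocking, 5))
print(f"awaiting: {concurrent_time:.2f} s, blocking: {serial_time:.2f} s")
```

    The blocking variant should take roughly five times as long, because the event loop gives no parallelism for work that never awaits.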

    The overhead introduced by spawning processes is significantly bigger than the overhead introduced by event-loop-based concurrency. This means that your assumption is correct.

    Comparing asyncio and gevent, we can say that in these tests asyncio has 33-45% bigger overhead. This suggests that creating greenlets is cheaper than creating coroutines.
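
    The exact percentage depends on how overhead is defined. As a rough cross-check, the sketch below recomputes relative figures from the timings listed in the results, under the assumption (mine, not stated in the answer) that the 1-second sleep is the baseline and anything above it counts as overhead. Its figures need not match the 33-45% quoted here, which may have been derived differently.

```python
# Timings in seconds, copied from the results: (gevent, asyncio).
timings = {
    5000: (1.12, 1.22),
    50000: (2.2, 3.25),
}
BASELINE = 1.0  # assumed: the 1-second sleep that every call performs

ratios = {}
for items, (gevent_time, asyncio_time) in timings.items():
    # Ratio of total wall-clock times.
    total_ratio = asyncio_time / gevent_time
    # Ratio of the time spent above the assumed baseline.
    overhead_ratio = (asyncio_time - BASELINE) / (gevent_time - BASELINE)
    ratios[items] = (total_ratio, overhead_ratio)
    print(f"{items} items: total time ratio {total_ratio:.3f}, "
          f"overhead ratio {overhead_ratio:.3f}")
```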

    As a final conclusion: gevent has better performance, but asyncio is part of the standard library. The difference in performance (in absolute numbers) isn't very significant. gevent is quite a mature library, while asyncio is relatively new but advancing quickly.
