Question
Being new to concurrency, I am confused about when to use the different Python concurrency libraries. To my understanding, multiprocessing, multithreading and asynchronous programming are all forms of concurrency, while multiprocessing also belongs to a subset of concurrency called parallelism.
I searched the web for different ways to approach concurrency in Python, and I came across the multiprocessing library, concurrent.futures' ProcessPoolExecutor() and ThreadPoolExecutor(), and asyncio. What confuses me is the difference between these libraries, especially what the multiprocessing library does. Since it has methods like pool.apply_async, does it also do the job of asyncio? If so, why is it called multiprocessing when it achieves concurrency in a different way from asyncio (multiple processes vs. cooperative multitasking)?
Answer 1:
There are several different libraries at play:
threading
: interface to OS-level threads. Note that CPU-bound work is mostly serialized by the GIL, so don't expect a speedup in your calculations. Use it when you need to invoke blocking APIs in parallel, in particular when you need control over thread creation. Avoid creating too many threads, as they are expensive.
multiprocessing
: interface to spawning multiple Python processes, with an API intentionally similar to threading. Multiple processes work in parallel, so you can actually speed up calculations using this method. The disadvantage is that you can't share in-memory data structures without using multiprocessing-specific tools.
concurrent.futures
: a more modern interface to threading and multiprocessing, which provides convenient thread/process pools it calls executors. The pool's main entry point is the submit method, which returns a handle that you can test for completion or wait on for its result. Getting the result gives you the return value of the submitted function and correctly propagates raised exceptions (if any), which would be tedious to do with threading. This should be the first tool of choice when considering thread- or process-based parallelism.
asyncio
: while the previous options are "async" in the sense that they provide non-blocking APIs (this is what apply_async and others refer to), they still rely on thread/process pools to do their magic, and cannot really do more things in parallel than they have workers in the pool. Asyncio uses a single thread of execution and async system calls across the board. It has no blocking calls at all, the only blocking part being the asyncio.run() entry point. Asyncio code is typically written using coroutines, which use await to suspend until something interesting happens. (Suspending is different from blocking in that it allows the event loop thread to continue to other things while you're waiting.) It has many advantages compared to thread-based solutions, such as being able to spawn thousands of cheap "tasks" without bogging down the system, and being able to cancel tasks or easily wait for multiple things at once. Asyncio should be the tool of choice for servers and for clients connecting to multiple servers.
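As a rough illustration of the submit/result workflow described for concurrent.futures, here is a minimal sketch. The helpers square, fail and run_demo are hypothetical names made up for this example, and square merely stands in for a blocking call. ThreadPoolExecutor is used here; ProcessPoolExecutor exposes the same submit interface, with the extra requirement that submitted functions and their arguments be picklable.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Hypothetical stand-in for a blocking call."""
    return n * n


def fail():
    """Hypothetical helper that raises, to show exception propagation."""
    raise ValueError("boom")


def run_demo():
    error = None
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns a Future immediately; the work runs in the pool.
        futures = [pool.submit(square, n) for n in range(5)]
        # result() waits for completion and hands back the return value.
        results = [f.result() for f in futures]

        # An exception raised in the worker is re-raised by result().
        bad = pool.submit(fail)
        try:
            bad.result()
        except ValueError as exc:
            error = str(exc)
    return results, error
```

Calling run_demo() collects the squares in submission order and captures the message of the exception raised inside the worker thread, which is the part that would take extra plumbing with bare threading.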
When choosing between asyncio and multithreading/multiprocessing, consider the adage that "threading is for working in parallel, and async is for waiting in parallel".
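To make "waiting in parallel" concrete, the following sketch starts a thousand coroutines that each suspend on asyncio.sleep, used here as a stand-in for a network wait. The names wait_and_return, main and run_demo, as well as the count and delay values, are assumptions chosen for illustration.

```python
import asyncio
import time


async def wait_and_return(i, delay):
    # await suspends this coroutine; the event loop runs others meanwhile.
    await asyncio.sleep(delay)
    return i


async def main(count=1000, delay=0.1):
    start = time.perf_counter()
    # gather() runs all coroutines concurrently on a single thread and
    # returns their results in the order the awaitables were passed in.
    results = await asyncio.gather(
        *(wait_and_return(i, delay) for i in range(count))
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


def run_demo():
    # asyncio.run() is the one blocking entry point.
    return asyncio.run(main())
```

Because all the waits overlap, the total elapsed time should be close to a single delay rather than count times the delay, and no thread pool is involved.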
Also note that asyncio can await functions executed in thread or process pools provided by concurrent.futures, so it can serve as glue between all those different models. This is part of the reason why asyncio is often used to build new library infrastructure.
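One way this glue can look is loop.run_in_executor, which hands a regular function to a concurrent.futures executor and returns an awaitable. This is a minimal sketch under stated assumptions: blocking_io is a hypothetical function that sleeps to imitate a blocking API, and main and run_demo are illustrative names.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(n):
    """Hypothetical blocking function; time.sleep imitates a blocking API."""
    time.sleep(0.05)
    return n * 2


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=3) as pool:
        # run_in_executor() schedules the call in the pool and returns an
        # awaitable, so the event loop stays free while the threads block.
        futures = [loop.run_in_executor(pool, blocking_io, n) for n in range(3)]
        return await asyncio.gather(*futures)


def run_demo():
    return asyncio.run(main())
```

Swapping in a ProcessPoolExecutor should work the same way for CPU-bound functions, provided the function and its arguments are picklable.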
Source: https://stackoverflow.com/questions/61351844/difference-between-multiprocessing-asyncio-and-concurrency-futures-in-python