I'm trying to decide if I should use multiprocessing or threading, and I've learned some interesting bits about the Global Interpreter Lock. From this blog post, it seems multithreading isn't suitable for CPU-bound (busy) tasks. However, I also learned that some functionality, such as I/O or numpy, is unaffected by the GIL.
Can anyone explain why, and how I can find out if my (probably quite numpy-heavy) code is going to be suitable for multithreading?
Many numpy calculations are unaffected by the GIL, but not all.
In code that does not require the Python interpreter (e.g. C libraries), it is possible to explicitly release the GIL, allowing other code that depends on the interpreter to continue running. In the NumPy C codebase, the macros NPY_BEGIN_THREADS and NPY_END_THREADS are used to delimit blocks of code that permit GIL release. You can see these in this search of the numpy source.
The NumPy C API documentation has more information on threading support. Note the additional macros NPY_BEGIN_THREADS_DESCR, NPY_END_THREADS_DESCR and NPY_BEGIN_THREADS_THRESHOLDED, which handle conditional GIL release, dependent on array dtypes and the size of loops.
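If you have a local copy of the NumPy source, a small script can list every place these macros appear, which saves searching by hand. This is a minimal sketch: `find_gil_macros` is a hypothetical helper, and the set of file suffixes it scans is an assumption about how the C sources are named (NumPy also keeps templated sources such as `*.c.src`).

```python
import re
from pathlib import Path

# The GIL-release macros discussed above.
GIL_MACROS = (
    "NPY_BEGIN_THREADS",
    "NPY_END_THREADS",
    "NPY_BEGIN_THREADS_DESCR",
    "NPY_END_THREADS_DESCR",
    "NPY_BEGIN_THREADS_THRESHOLDED",
)
# Longest names first so NPY_BEGIN_THREADS_DESCR is not reported as
# NPY_BEGIN_THREADS; the \b boundaries skip names that merely start with
# one of these (e.g. NPY_BEGIN_THREADS_DEF).
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(GIL_MACROS, key=len, reverse=True)) + r")\b"
)


def find_gil_macros(root, suffixes=(".c", ".h", ".cpp", ".src")):
    """Yield (path, line_number, macro_name) for each macro use under root.

    ".src" covers templated sources such as loops.c.src.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                for match in PATTERN.finditer(line):
                    yield path, lineno, match.group(1)
```

Pointed at the root of a local clone of the NumPy repository (the path is up to you, and the directory layout differs between NumPy versions), this prints one line per guarded block, which you can then match against the functions your code calls.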
Most core functions release the GIL. For example, universal functions (ufuncs) do so, as the documentation describes:
as long as no object arrays are involved, the Python Global Interpreter Lock (GIL) is released prior to calling the loops. It is re-acquired if necessary to handle error conditions.
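Because ufunc loops on non-object dtypes run without the GIL, several Python threads can each drive a ufunc over their own slice of an array at the same time. The sketch below illustrates that; `threaded_ufunc` is a hypothetical helper, not a NumPy function, and it assumes a 1-D array whose ufunc result has the same dtype as the input (e.g. float64 in, float64 out). Each thread writes into its own slice of a preallocated output through the ufunc's `out` argument, so no thread touches another's data.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_ufunc(ufunc, a, n_threads=4):
    """Apply a unary ufunc to a 1-D array, one contiguous chunk per thread."""
    out = np.empty_like(a)
    # Chunk boundaries: n_threads roughly equal, contiguous index ranges.
    bounds = np.linspace(0, a.size, n_threads + 1, dtype=int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Slices are views, so the ufunc writes straight into `out`.
        ufunc(a[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))  # list() surfaces exceptions
    return out
```

Whether this beats a single `np.sqrt(a)` call depends on the array size and the number of cores; it is only worth trying when each chunk is large enough to keep its thread inside the C loop for a while.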
For your own code, the NumPy source code is available: check the functions you use (and the functions they call) for the above macros. Note also that the performance benefit depends heavily on how long the GIL stays released; if your code is constantly dropping in and out of Python, you won't see much of an improvement.
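A rough runtime complement to reading the source is to check whether a pure-Python thread can make progress while your call runs. A background thread can only execute Python bytecode while it holds the GIL, so if it barely advances during the call, the call probably held the GIL for most of its run. This is a heuristic sketch, assuming a standard CPython build with a GIL; `gil_probe` is a hypothetical helper.

```python
import threading
import time


def gil_probe(func, *args, **kwargs):
    """Return (spinner_ticks, seconds) observed while func(*args, **kwargs) runs.

    spinner_ticks counts loop iterations of a pure-Python background thread;
    few ticks per second suggests func held the GIL most of the time.
    """
    counter = 0
    stop = threading.Event()

    def spin():
        nonlocal counter
        while not stop.is_set():
            counter += 1

    spinner = threading.Thread(target=spin, daemon=True)
    spinner.start()
    try:
        time.sleep(0.05)  # let the spinner get going before measuring
        before = counter
        start = time.perf_counter()
        func(*args, **kwargs)
        seconds = time.perf_counter() - start
        ticks = counter - before
    finally:
        # Always stop the spinner, even if func raises.
        stop.set()
        spinner.join()
    return ticks, seconds
```

For example, compare ticks per second from `gil_probe(time.sleep, 0.2)` (sleeping releases the GIL) with `gil_probe(np.sqrt, a)` for a large float64 array `a`, and with `gil_probe(np.add, b, 1.0)` where `b = a.astype(object)`. Going by the ufunc documentation quoted above, the object-dtype call would be expected to leave the spinner far fewer ticks per second. Timings are noisy and machine-dependent, so treat the numbers as a hint rather than proof.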
The other option is to just test it. However, bear in mind that functions using the conditional GIL macros may exhibit different behaviour with small and large arrays. A test with a small dataset may therefore not be an accurate representation of performance for a larger task.
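For the testing route, here is a minimal benchmark sketch, assuming your workload can be expressed as one function applied to several independent float64 arrays (`compare_threading` and `_best_time` are hypothetical helpers, not part of NumPy). It times the same tasks run one after another and then through a thread pool, over several array sizes, so that size-dependent behaviour from the thresholded macros has a chance to show up. On a multi-core machine, a threaded time approaching the sequential time divided by `n_tasks` suggests the function runs without the GIL; roughly equal times suggest it doesn't, or that the arrays are too small for the release to pay off.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _best_time(fn, repeats):
    """Best wall-clock time over `repeats` calls to fn (reduces timing noise)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_threading(func, sizes, n_tasks=4, repeats=3, seed=0):
    """Time n_tasks independent calls of func, sequentially vs. in threads.

    Each task gets its own random float64 array of the given size.
    Returns {size: (sequential_seconds, threaded_seconds)}.
    """
    rng = np.random.default_rng(seed)
    results = {}
    # Reuse one pool so thread start-up is paid at most once; taking the
    # best of several repeats then hides it.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        for size in sizes:
            arrays = [rng.random(size) for _ in range(n_tasks)]
            sequential = _best_time(lambda: [func(x) for x in arrays], repeats)
            threaded = _best_time(lambda: list(pool.map(func, arrays)), repeats)
            results[size] = (sequential, threaded)
    return results
```

For example, `compare_threading(np.sqrt, [1_000, 10_000_000])` returns a (sequential, threaded) pair of times for each size. Bear in mind that some NumPy operations (such as matrix products backed by a multithreaded BLAS) may already use several cores internally, which would make this comparison misleading.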
There is some additional information on parallel processing with numpy available on the official wiki and a useful post about the Python GIL in general over on Programmers.SE.
Source: https://stackoverflow.com/questions/36479159/why-are-numpy-calculations-not-affected-by-the-global-interpreter-lock