Question
I have a very weird problem while creating a Python extension with Cython that uses joblib.Parallel.
The following code works as expected:
from joblib import Parallel, delayed
from math import sqrt
print(Parallel(n_jobs=4)(delayed(sqrt)(x) for x in range(4)))
The following code hangs forever:
from joblib import Parallel, delayed
def mult(x):
    return x*3
print(Parallel(n_jobs=4)(delayed(mult)(x) for x in range(4)))
I have no clue why. I use the following setup.py:
from distutils.core import setup
from Cython.Build import cythonize
setup(
    ext_modules=cythonize("file.pyx")
)
I build the extension with python setup.py build_ext --inplace and import it with import file.
Thank you!
Answer 1:
After some time I finally found the solution: a deadlock occurs while pickling the program state to send it to different CPUs. I am not entirely sure of the cause but, from inspecting the source code, it looks as if new threads are created to pickle the objects, and these threads are the ones that cause the deadlock.
Once the processes have been created they run normally: creating the processes manually with the multiprocessing library fixes the problem.
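As a rough illustration of creating the processes by hand, here is a minimal sketch. The helper names run_in_processes and _worker are hypothetical, and math.sqrt stands in for the function that would normally live in the compiled extension; this is one possible arrangement, not the exact code used to fix the original program.

```python
import math
from multiprocessing import Process, Queue


def _worker(func, index, value, results):
    # Runs in the child process: compute and send back (index, result).
    results.put((index, func(value)))


def run_in_processes(func, values):
    """Apply func to each value in its own process; keep input order."""
    results = Queue()
    procs = [
        Process(target=_worker, args=(func, i, v, results))
        for i, v in enumerate(values)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return [collected[i] for i in range(len(procs))]


if __name__ == "__main__":
    # math.sqrt is a stand-in for a function from the Cython module.
    print(run_in_processes(math.sqrt, range(4)))
```

One process per item is only reasonable for a handful of tasks; for larger inputs a pool is the more practical choice.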
Alternatively, you can use multiprocessing.Pool, manually specifying the start_method:
from multiprocessing import get_context
if __name__ == '__main__':
    with get_context("spawn").Pool() as pool:
        ...
You can freely choose spawn or forkserver as the start_method.
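To make the pool approach concrete, here is a small sketch of a map over a pool created with an explicit start method. The helper parallel_map and its parameters are hypothetical names chosen for illustration, and math.sqrt again stands in for a function exported by the compiled module.

```python
import math
from multiprocessing import get_context


def parallel_map(func, values, start_method="spawn", processes=4):
    """Map func over values in a pool created with an explicit start method."""
    ctx = get_context(start_method)
    with ctx.Pool(processes=processes) as pool:
        # pool.map blocks until every result is available.
        return pool.map(func, values)


if __name__ == "__main__":
    # The guard matters: with "spawn" the children re-import this module.
    print(parallel_map(math.sqrt, range(4)))
    # "forkserver" can be passed instead where the platform supports it
    # (it is not available on Windows).
```

The function handed to the pool has to be importable by the worker processes, which is the case for functions defined at the top level of a module.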
Visit this page if you want more information.
Source: https://stackoverflow.com/questions/53497078/joblib-parallel-cython-hanging-forever