Question
When calling a Linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?
I want to parallelise some code which calls a binary program from the command line. Is it better to use threads (through threading and a multiprocessing.pool.ThreadPool) or multiprocessing? My assumption is that if subprocess releases the GIL, then the threading option is the better choice.
Answer 1:
When calling a Linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?
Yes, it releases the Global Interpreter Lock (GIL) in the calling process.
As you are likely aware, on POSIX platforms subprocess offers convenience interfaces atop the "raw" components fork, execve, and waitpid.
By inspection of the CPython 2.7.9 sources, fork and execve do not release the GIL. However, those calls do not block, so we would not expect the GIL to be released.
waitpid of course does block, but we see that its implementation does give up the GIL using the ALLOW_THREADS macros:
static PyObject *
posix_waitpid(PyObject *self, PyObject *args)
{
....
Py_BEGIN_ALLOW_THREADS
pid = waitpid(pid, &status, options);
Py_END_ALLOW_THREADS
....
This could also be tested by calling out to some long-running program like sleep from a demonstration multithreaded Python script.
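A minimal sketch of such a test, under some assumptions: SLOW_CMD is a hypothetical stand-in for the slow binary (a child Python that sleeps for one second, used here instead of sleep so that it runs anywhere), and the function names are made up for illustration. If waiting on a child held the GIL, four threads would need roughly four seconds in total; if it is released, they should finish in a little over one.

```python
import subprocess
import sys
import threading
import time

# Hypothetical stand-in for a slow binary: a child Python that sleeps 1 s.
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(1.0)"]


def run_child():
    # subprocess.call blocks the calling thread until the child exits.
    subprocess.call(SLOW_CMD)


def timed_parallel_run(n_threads=4):
    """Start n_threads threads that each run one child; return elapsed seconds."""
    threads = [threading.Thread(target=run_child) for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    elapsed = timed_parallel_run(4)
    print("4 children of 1 s each took %.2f s in total" % elapsed)
```

A total close to one second rather than four suggests the waiting threads are not holding the GIL while their children run.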
Answer 2:
The GIL doesn't span multiple processes. subprocess.Popen starts a new process. If it starts a Python process, then that process will have its own GIL.
You don't need multiple threads (or processes created by multiprocessing) if all you want is to run some Linux binaries in parallel:
from subprocess import Popen
# start all processes
processes = [Popen(['program', str(i)]) for i in range(10)]
# now all processes run in parallel
# wait for processes to complete
for p in processes:
    p.wait()
You could use multiprocessing.pool.ThreadPool to limit the number of programs running concurrently.
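One way this might look, as a rough sketch rather than a definitive recipe: run and run_all are hypothetical helper names, and the command is a placeholder child Python that simply exits with the status it was given, standing in for the real program.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run(arg):
    # Hypothetical command: a child Python that exits with status `arg`.
    # Replace it with the real program and its arguments.
    cmd = [sys.executable, "-c", "import sys; sys.exit(int(sys.argv[1]))", str(arg)]
    return subprocess.call(cmd)  # blocks this worker thread only


def run_all(args, max_concurrent=3):
    """Run one child per argument, with at most max_concurrent alive at once."""
    pool = ThreadPool(max_concurrent)
    try:
        # map() returns the exit statuses in the same order as args.
        return pool.map(run, args)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_all(range(5)))
```

Each worker thread spends its time blocked waiting for a child, so the pool size is effectively the cap on how many programs run at the same time.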
Answer 3:
Since subprocess is for running executables (it is essentially a wrapper around os.fork() and os.execve()), it probably makes more sense to use it. You can use subprocess.Popen. Something like:
import subprocess
process = subprocess.Popen(["binary"])
This will run as a separate process, so it is not affected by the GIL. You can then use the Popen.poll() method to check whether the child process has terminated:
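# poll() returns None while the child is still running; an exit status of 0
# is falsy, so compare against None explicitly
if process.poll() is not None:
    # process has finished its work
    returncode = process.returncode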
Just make sure you don't call any of the methods that wait for the process to finish its work (e.g. Popen.wait() or Popen.communicate()) if you want to keep your Python script from blocking.
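A small sketch of that non-blocking pattern, with some assumptions: run_with_polling is a hypothetical helper, the poll counter stands in for whatever other work the script would do in the meantime, and the command in the usage part is a placeholder child Python that sleeps briefly.

```python
import subprocess
import sys
import time


def run_with_polling(cmd, interval=0.05):
    """Start cmd, then poll it instead of blocking in wait()/communicate()."""
    process = subprocess.Popen(cmd)
    polls = 0
    # poll() returns None until the child has exited.
    while process.poll() is None:
        polls += 1            # placeholder for other useful work
        time.sleep(interval)  # avoid spinning at full speed
    return process.returncode, polls


if __name__ == "__main__":
    # Hypothetical slow command: a child Python that sleeps for 0.3 s.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.3)"]
    returncode, polls = run_with_polling(cmd)
    print("exit status %d after %d polls" % (returncode, polls))
```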
As mentioned in this answer, multiprocessing is for running functions within your existing (Python) code, with support for more flexible communication among the family of processes. The multiprocessing module is intended to provide interfaces and features that are very similar to threading, while allowing CPython to scale your processing across multiple CPUs/cores despite the GIL.
So, given your use case, subprocess seems to be the right choice.
Source: https://stackoverflow.com/questions/23369064/does-using-the-subprocess-module-release-the-python-gil