Does using the subprocess module release the Python GIL?

…衆ロ難τιáo~ submitted on 2020-01-02 01:09:30

Question


When calling a Linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?

I want to parallelise some code which calls a binary program from the command line. Is it better to use threads (through threading and a multiprocessing.pool.ThreadPool) or multiprocessing? My assumption is that if subprocess releases the GIL then choosing the threading option is better.


Answer 1:


When calling a Linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?

Yes, it releases the Global Interpreter Lock (GIL) in the calling process.

As you are likely aware, on POSIX platforms subprocess offers convenience interfaces atop the "raw" components from fork, execve, and waitpid.
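To make the three "raw" components concrete, here is a minimal sketch using the os module on a POSIX system. It is only an illustration of the fork/exec/wait pattern, not how subprocess is actually implemented, and the helper name spawn_and_wait is made up for this example.

```python
import os
import sys

def spawn_and_wait(argv):
    """Hypothetical helper: run argv as a child process and return its exit code."""
    pid = os.fork()                      # fork: duplicate the current process
    if pid == 0:
        # Child process: replace this process image with the target program.
        try:
            os.execvp(argv[0], argv)     # exec: only returns (by raising) on failure
        finally:
            os._exit(127)                # exec failed; exit without running cleanup handlers
    # Parent process: block until the child terminates.
    _, status = os.waitpid(pid, 0)       # waitpid: the blocking step
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)          # negative value if killed by a signal

if __name__ == "__main__":
    # The Python interpreter itself is used as a stand-in for "some binary".
    print(spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"]))
```

Only the waitpid step waits for the child to finish, which is why it is the call that matters for the GIL question.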

By inspection of the CPython 2.7.9 sources, fork and execve do not release the GIL. However, those calls do not block, so we'd not expect the GIL to be released.

waitpid of course does block, but we see that its implementation does give up the GIL using the ALLOW_THREADS macros:

static PyObject *
posix_waitpid(PyObject *self, PyObject *args)
{
    ....
    Py_BEGIN_ALLOW_THREADS
    pid = waitpid(pid, &status, options);
    Py_END_ALLOW_THREADS
    ....
}

This could also be tested by calling out to some long-running program like sleep from a demonstration multithreaded Python script.
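One possible version of such a test script is sketched below. It assumes a POSIX system with the standard sleep utility on the PATH. The function names are invented for the example. If the GIL were held while waiting, four threads each sleeping one second would take about four seconds in total. With the GIL released, they should finish in roughly one second.

```python
import subprocess
import threading
import time

def run_sleep(seconds):
    # Blocks this thread until the child exits; the other threads keep
    # running if the wait releases the GIL.
    subprocess.call(["sleep", str(seconds)])

def timed_parallel_sleeps(n_threads=4, seconds=1):
    """Start n_threads threads that each run `sleep seconds`; return wall time."""
    threads = [
        threading.Thread(target=run_sleep, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

if __name__ == "__main__":
    elapsed = timed_parallel_sleeps()
    print("4 threads, each running 'sleep 1', took %.2f s" % elapsed)
```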




Answer 2:


The GIL doesn't span multiple processes. subprocess.Popen starts a new process. If it starts a Python process, then that process will have its own GIL.

You don't need multiple threads (or processes created by multiprocessing) if all you want is to run some Linux binaries in parallel:

from subprocess import Popen

# start all processes
processes = [Popen(['program', str(i)]) for i in range(10)]
# now all processes run in parallel

# wait for processes to complete
for p in processes:
    p.wait()

You could use multiprocessing.pool.ThreadPool to limit the number of concurrently running programs.
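A rough sketch of that idea follows. The command is a placeholder (the Python interpreter doing nothing) standing in for the real binary, and run, run_all and max_concurrent are names chosen for this example. Each worker thread blocks in subprocess.call, so at most max_concurrent children exist at any time.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool

def run(i):
    # Placeholder command: substitute the real binary and its arguments here.
    cmd = [sys.executable, "-c", "import sys; sys.exit(0)", str(i)]
    return subprocess.call(cmd)          # blocks this worker thread only

def run_all(n_jobs=10, max_concurrent=4):
    """Run n_jobs commands, at most max_concurrent at once; return exit codes."""
    pool = ThreadPool(max_concurrent)
    try:
        return pool.map(run, range(n_jobs))
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(run_all())
```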




Answer 3:


Since subprocess is for running executables (it is essentially a wrapper around os.fork() and os.execve()), it probably makes more sense to use it. You can use subprocess.Popen. Something like:

import subprocess

process = subprocess.Popen(["binary"])

This will run as a separate process, so it is not affected by the GIL. You can then use the Popen.poll() method to check whether the child process has terminated:

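# poll() returns None while the child is still running and the exit code
# (which may be 0) once it has finished, so compare against None
if process.poll() is not None:
    # process has finished its work
    returncode = process.returncode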

Just make sure you don't call any of the methods that wait for the process to finish its work (e.g. Popen.communicate()) if you want to avoid blocking your Python script.
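As an illustration of non-blocking polling, here is a small sketch. The child command is a stand-in (the Python interpreter sleeping briefly), and run_and_poll and DEMO_CMD are hypothetical names. Note that poll() returns None while the child is running and the exit code afterwards, so the loop tests against None rather than truthiness.

```python
import subprocess
import sys
import time

# Stand-in for a slow binary: a Python child that sleeps for 0.3 seconds.
DEMO_CMD = [sys.executable, "-c", "import time; time.sleep(0.3)"]

def run_and_poll(cmd, interval=0.05):
    """Start cmd, keep the caller responsive while it runs, return (exit code, polls)."""
    process = subprocess.Popen(cmd)
    polls = 0
    while process.poll() is None:        # None means the child is still running
        polls += 1
        time.sleep(interval)             # placeholder for other useful work
    return process.returncode, polls

if __name__ == "__main__":
    returncode, polls = run_and_poll(DEMO_CMD)
    print("exit code %d after %d polls" % (returncode, polls))
```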

As mentioned in this answer:

multiprocessing is for running functions within your existing (Python) code with support for more flexible communications among the family of processes. multiprocessing module is intended to provide interfaces and features which are very similar to threading while allowing CPython to scale your processing among multiple CPUs/cores despite the GIL.

So, given your use-case, subprocess seems to be the right choice.



Source: https://stackoverflow.com/questions/23369064/does-using-the-subprocess-module-release-the-python-gil
