Why does communicate deadlock when used with multiple Popen subprocesses?

Question

The following issue does not occur in Python 2.7.3. However, it occurs with both Python 2.7.1 and Python 2.6 on my machine (64-bit Mac OS X 10.7.3). This is code I will eventually distribute, so I would like to know whether there is a way to complete this task that does not depend so heavily on the Python version.

I need to open multiple subprocesses in parallel and write STDIN data to each of them. Normally I would do this using the Popen.communicate method. However, communicate is deadlocking whenever I have multiple processes open at the same time.

import subprocess

cmd = ["grep", "hello"]
processes = [subprocess.Popen(cmd, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
                                for _ in range(2)]

for p in processes:
    print p.communicate("hello world\ngoodbye world\n")

If I change the number of processes to for _ in range(1), the output is just as expected:

('hello world\n', '')

However, when there are two processes (for _ in range(2)), the process blocks indefinitely. I've tried the alternative of writing to stdin manually:

for p in processes:
    p.stdin.write("hello world\ngoodbye world\n")

But then any attempt to read from the processes (p.stdout.read(), for example) still deadlocks.

At first this appears to be related, but it specifies that it occurs when multiple threads are being used, and that the deadlocking occurs only very infrequently (while here it always occurs). Is there any way to get this to work on versions of Python before 2.7.3?

Answer 1:

I had to dig a bit for this one. (I ran into a similar problem once, so thought I knew the answer, but was wrong.)

The issue (and patch for 2.7.3) is described here:

http://bugs.python.org/issue12786

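The issue is that the PIPEs get inherited by subprocesses. Each child created later holds its own copy of the write end of every earlier child's stdin pipe, so when communicate() closes the parent's copy, the earlier grep never sees EOF on its stdin and keeps waiting for input. The answer is to use 'close_fds=True' in your Popen call.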

processes = [subprocess.Popen(cmd, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              close_fds=True)
             for _ in range(2)]
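For reference, here is a minimal, self-contained sketch of the close_fds=True approach. It assumes a POSIX system with grep on the PATH; run_parallel is a hypothetical helper name (not part of subprocess), and the bytes literal plus the __future__ import are only there so the same file runs on both Python 2.6+ and Python 3.

```python
from __future__ import print_function  # same print syntax on Python 2.6+ and 3

import subprocess


def run_parallel(cmd, data, count=2):
    """Start `count` copies of cmd, feed each one `data`, return the outputs.

    run_parallel is a hypothetical helper name, not part of subprocess.
    """
    processes = [
        subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            close_fds=True,  # later children do not inherit earlier pipes
        )
        for _ in range(count)
    ]
    # communicate() writes the data, closes stdin, and reads until EOF.
    return [p.communicate(data) for p in processes]


results = run_parallel(["grep", "hello"], b"hello world\ngoodbye world\n")
for out, err in results:
    print((out, err))
```

With close_fds=True the only write end of each stdin pipe lives in the parent, so closing it is enough for the corresponding grep to see EOF and exit.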

If close_fds=True causes problems because the children need to inherit other file descriptors (in case this was a simplified example), it turns out that you can call wait()/communicate() on the subprocesses in the reverse of the order in which they were created, and it seems to work.

i.e., instead of:

for p in processes:
    print p.communicate("hello world\ngoodbye world\n")

use:

while processes:
    print processes.pop().communicate("hello world\ngoodbye world\n")

(Or, I guess, just do 'processes.reverse()' before your existing loop.)
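My reading of why the reverse order helps: the most recently created child's stdin pipe has not been inherited by anyone else, so closing it in the parent delivers EOF; once that child exits, its inherited copies of the older pipes are released too. A small sketch of that workaround, which also hands the results back in creation order, is below. communicate_reversed is a hypothetical helper name, and the example again assumes grep is available. On newer Python versions, where this inheritance problem does not occur, the ordering makes no difference.

```python
from __future__ import print_function  # same print syntax on Python 2.6+ and 3

import subprocess


def communicate_reversed(processes, data):
    """Call communicate() newest-first, return results in creation order.

    communicate_reversed is a hypothetical helper name.
    """
    results = [p.communicate(data) for p in reversed(processes)]
    results.reverse()  # restore the original creation order
    return results


cmd = ["grep", "hello"]
processes = [
    subprocess.Popen(cmd, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for _ in range(3)
]
outputs = communicate_reversed(processes, b"hello world\ngoodbye world\n")
for out, err in outputs:
    print((out, err))
```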

Source: https://stackoverflow.com/questions/14615462/why-does-communicate-deadlock-when-used-with-multiple-popen-subprocesses

Tags

python

python-2.7

multiprocessing

subprocess