Better multithreaded use of Python subprocess.Popen & communicate()?


Question


I'm running multiple commands which may take some time, in parallel, on a Linux machine running Python 2.6.

So I used the subprocess.Popen class and its communicate() method to parallelize execution of multiple command groups and capture each group's output at once after execution.

import shlex
import subprocess

def run_commands(commands, print_lock):
    # this part runs in parallel.
    outputs = []
    for command in commands:
        proc = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True)
        output, unused_err = proc.communicate()  # buffers the output
        retcode = proc.poll()                    # ensures subprocess termination
        outputs.append(output)
    with print_lock: # print them at once (synchronized)
        for output in outputs:
            for line in output.splitlines():
                print(line)

Elsewhere, it's called like this:

from threading import Lock, Thread

processes = []
print_lock = Lock()
for ...:
    commands = ...  # a group of commands is generated, which takes some time.
    processes.append(Thread(target=run_commands, args=(commands, print_lock)))
    processes[-1].start()
for p in processes: p.join()
print('done.')

The expected result is that each group's output is displayed all at once, while the groups themselves are executed in parallel.

But starting with the second output group (of course, which thread ends up being second varies due to scheduling indeterminism), the output begins to print without newlines, each new line is indented by as many spaces as the number of characters printed on the previous line, and input echo is turned off; the terminal state is "garbled" or "crashed". (Issuing the reset shell command restores it to normal.)

At first I suspected the handling of '\r', but that was not the cause. As you can see in my code, I handle it with splitlines(), and I confirmed that by applying repr() to the output.

I think the cause is the concurrent use of pipes by Popen and communicate() for stdout/stderr. I also tried the check_output shortcut method from Python 2.7, with no success. Of course, the problem described above does not occur if I serialize all command executions and prints.

Is there any better way to handle Popen and communicate() in parallel?


Answer 1:


Here is the final result, inspired by the comment from J.F. Sebastian:

http://bitbucket.org/daybreaker/kaist-cs443/src/247f9ecf3cee/tools/manage.py

It seems to be a Python bug.
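The linked manage.py may no longer be available. As a rough sketch only (an assumption about the kind of workaround involved, not necessarily what the linked code does): Python 2.x's subprocess module had known thread-safety issues around process creation, so one mitigation is to serialize only the Popen() calls with a dedicated lock while still letting communicate() run in parallel. The names run_commands_serialized_spawn and popen_lock below are hypothetical.

import shlex
import subprocess
from threading import Lock

popen_lock = Lock()  # hypothetical: guards process creation only

def run_commands_serialized_spawn(commands, print_lock):
    outputs = []
    for command in commands:
        with popen_lock:  # only the spawn step is serialized
            proc = subprocess.Popen(shlex.split(command),
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT,
                                    close_fds=True)
        output, _ = proc.communicate()  # waiting still overlaps across threads
        outputs.append(output)
    with print_lock:
        for output in outputs:
            for line in output.splitlines():
                print(line)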




Answer 2:


I am not sure it is clear what run_commands actually needs to be doing, but it seems to simply poll a subprocess, ignore the return code and continue in the loop. When you get to the part where you are printing the output, how can you know the subprocesses have completed?
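For reference, communicate() itself waits for the child to terminate, so the return code is available afterwards via returncode. A minimal sketch (with a hypothetical helper name, not from the original post) that makes the completion check explicit:

import shlex
import subprocess

def run_and_check(command):
    proc = subprocess.Popen(shlex.split(command),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    # communicate() reads until EOF and waits for the child to exit.
    output, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("command failed with code %d: %s"
                           % (proc.returncode, command))
    return output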




Answer 3:


In your example code I noticed your use of:

for line in output.splitlines():

to partially address the '\r' issue; using

for line in output.splitlines(True):

would have been helpful.
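For illustration (a small example, not from the original answer): splitlines(True) keeps the line terminators, which makes it easy to see exactly which endings ('\n', '\r\n', or a bare '\r') a command produced.

output = "line1\r\nline2\rline3\n"
print(output.splitlines())      # ['line1', 'line2', 'line3']
print(output.splitlines(True))  # ['line1\r\n', 'line2\r', 'line3\n']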



Source: https://stackoverflow.com/questions/4287178/better-multithreaded-use-of-python-subprocess-popen-communicate
