Using subprocess.Popen for Process with Large Output

后端未结

关注

 7  1933

I have some Python code that executes an external app which works fine when the app has a small amount of output, but hangs when there is a lot. My code looks like:

相关标签:

7条回答

鱼传尺愫

2020-12-02 17:36
A lot of output is subjective so it's a little difficult to make a recommendation. If the amount of output is really large then you likely don't want to grab it all with a single read() call anyway. You may want to try writing the output to a file and then pull the data in incrementally like such:
```
f=file('data.out','w')
p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=subprocess.PIPE)
errcode = p.wait()
f.close()
if errcode:
    errmess = p.stderr.read()
    log.error('cmd failed <%s>: %s' % (errcode,errmess))
for line in file('data.out'):
    #do something
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
抹茶落季

2020-12-02 17:38

You're doing blocking reads to two files; the first needs to complete before the second starts. If the application writes a lot to stderr, and nothing to stdout, then your process will sit waiting for data on stdout that isn't coming, while the program you're running sits there waiting for the stuff it wrote to stderr to be read (which it never will be--since you're waiting for stdout).

There are a few ways you can fix this.

The simplest is to not intercept stderr; leave stderr=None. Errors will be output to stderr directly. You can't intercept them and display them as part of your own message. For commandline tools, this is often OK. For other apps, it can be a problem.

Another simple approach is to redirect stderr to stdout, so you only have one incoming file: set stderr=STDOUT. This means you can't distinguish regular output from error output. This may or may not be acceptable, depending on how the application writes output.

The complete and complicated way of handling this is select (http://docs.python.org/library/select.html). This lets you read in a non-blocking way: you get data whenever data appears on either stdout or stderr. I'd only recommend this if it's really necessary. This probably doesn't work in Windows.

0 讨论(0)
发布评论:

提交评论
- 加载中...
悲&欢浪女

2020-12-02 17:46
I had the same problem. If you have to handle a large output, another good option could be to use a file for stdout and stderr, and pass those files per parameter.

Check the tempfile module in python: https://docs.python.org/2/library/tempfile.html.

Something like this might work
```
out = tempfile.NamedTemporaryFile(delete=False)
```
Then you would do:
```
Popen(... stdout=out,...)
```
Then you can read the file, and erase it later.
0 讨论(0)
发布评论:

提交评论
- 加载中...

萌比男神i

2020-12-02 17:46

Here is simple approach which captures both regular output plus error output, all within Python so limitations in stdout don't apply:

com_str = 'uname -a'
command = subprocess.Popen([com_str], stdout=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print output

Linux 3.11.0-20-generic SMP Fri May 2 21:32:55 UTC 2014

and

com_str = 'id'
command = subprocess.Popen([com_str], stdout=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print output

uid=1000(myname) gid=1000(mygrp) groups=1000(cell),0(root)

0 讨论(0)

忘掉有多难

2020-12-02 17:49

Glenn Maynard is right in his comment about deadlocks. However, the best way of solving this problem is two create two threads, one for stdout and one for stderr, which read those respective streams until exhausted and do whatever you need with the output.

The suggestion of using temporary files may or may not work for you depending on the size of output etc. and whether you need to process the subprocess' output as it is generated.

As Heikki Toivonen has suggested, you should look at the communicate method. However, this buffers the stdout/stderr of the subprocess in memory and you get those returned from the communicate call - this is not ideal for some scenarios. But the source of the communicate method is worth looking at.

Another example is in a package I maintain, python-gnupg, where the gpg executable is spawned via subprocess to do the heavy lifting, and the Python wrapper spawns threads to read gpg's stdout and stderr and consume them as data is produced by gpg. You may be able to get some ideas by looking at the source there, as well. Data produced by gpg to both stdout and stderr can be quite large, in the general case.

0 讨论(0)
发布评论:

提交评论
- 加载中...
轻奢々

2020-12-02 17:56

You could try communicate and see if that solves your problem. If not, I'd redirect the output to a temporary file.

0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页