I am using the subprocess module to start a process from Python. I want to be able to access the output (stdout, stderr) as soon as it is written.
At issue here is buffering by the child process. Your subprocess code already works as well as it can, but if the child process buffers its output, there is nothing that subprocess pipes can do about it.
I cannot stress this enough: the buffering delays you see are the responsibility of the child process, and how it handles buffering has nothing to do with the subprocess module.
You already discovered this; it is why adding sys.stdout.flush() in the child process makes the data show up sooner. The child process uses buffered I/O (a memory cache that collects written data) before sending the data down the sys.stdout pipe [1].
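To illustrate, here is a minimal sketch of both sides, assuming a Python child run via sys.executable. The CHILD script and the stream_lines() helper are hypothetical names made up for this example. The child flushes after every line, so the parent's readline() loop receives each line while the child is still running.

```python
import subprocess
import sys

# Hypothetical child program: it flushes after every line, so each line
# leaves the child's buffer and enters the pipe right away.
CHILD = r"""
import sys, time
for i in range(3):
    print("line", i)
    sys.stdout.flush()  # push buffered data down the pipe now
    time.sleep(0.1)
"""

def stream_lines(cmd):
    """Yield the child's stdout lines as soon as the parent can read them."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        # readline() returns as soon as a full line is in the pipe;
        # an empty string means EOF (the child closed its stdout).
        for line in iter(proc.stdout.readline, ""):
            yield line.rstrip("\n")

if __name__ == "__main__":
    for line in stream_lines([sys.executable, "-c", CHILD]):
        print("got:", line)
```

If you remove the sys.stdout.flush() call from CHILD, the parent code stays the same, but all three lines typically arrive together when the child exits.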
Python automatically uses line buffering when sys.stdout is connected to a terminal; the buffer is flushed whenever a newline is written. When using pipes, sys.stdout is not connected to a terminal, and a fixed-size buffer is used instead.
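You can observe this difference directly. The sketch below (PROBE and probe_through_pipe() are illustrative names) starts a Python child with its stdout on a pipe and has it report whether its stdout is a terminal and whether line buffering is active.

```python
import subprocess
import sys

# The child reports whether its stdout is a terminal and whether Python
# enabled line buffering for it.
PROBE = "import sys; print(sys.stdout.isatty(), sys.stdout.line_buffering)"

def probe_through_pipe():
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        stdout=subprocess.PIPE,  # a pipe, not a terminal
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(probe_through_pipe())  # False False
```

Running the same PROBE one-liner directly in an interactive terminal should normally print True True instead.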
Now, the Python child process can be told to handle buffering differently; you can set an environment variable or use a command-line switch to alter how it buffers sys.stdout (and sys.stderr and sys.stdin). From the Python command line documentation:
-u
Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode.[...]
PYTHONUNBUFFERED
If this is set to a non-empty string it is equivalent to specifying the -u option.
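Here is a sketch of both options from the parent's side, assuming the child is a Python script that never calls flush() itself. CHILD and read_unbuffered() are hypothetical names. The returned flag records whether the child was still running when the first line arrived, which shows that the parent did not have to wait for the child to exit.

```python
import os
import subprocess
import sys

# Hypothetical child: it never calls flush() and sleeps between lines.
CHILD = r"""
import time
for i in range(3):
    print("tick", i)
    time.sleep(0.3)
"""

def read_unbuffered(use_env=False):
    """Run CHILD with buffering disabled; return (lines, alive_after_first)."""
    if use_env:
        cmd = [sys.executable, "-c", CHILD]
        env = dict(os.environ, PYTHONUNBUFFERED="1")  # any non-empty value
    else:
        cmd = [sys.executable, "-u", "-c", CHILD]  # the -u switch
        env = None  # inherit the parent's environment unchanged
    lines = []
    alive_after_first = False
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, env=env) as proc:
        for line in iter(proc.stdout.readline, ""):
            if not lines:
                # Child still running => the line arrived before it exited.
                alive_after_first = proc.poll() is None
            lines.append(line.rstrip("\n"))
    return lines, alive_after_first

if __name__ == "__main__":
    print(read_unbuffered())
    print(read_unbuffered(use_env=True))
```

Copying os.environ and adding PYTHONUNBUFFERED keeps everything else (PATH and so on) intact for the child. That is usually what you want, rather than passing an environment that contains only that one variable.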
If your child processes are not Python processes and you run into buffering issues with them, check the documentation of those programs to see whether they can be switched to unbuffered I/O or to a more suitable buffering strategy.
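One thing you could try is the script command, which provides a pseudo-terminal to a child process (with the util-linux version found on Linux, pass the command with script -c; the BSD and macOS versions take the command as trailing arguments instead). This tool is only found on Unix-like systems and is not available on Windows.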
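If you would rather not depend on the script tool, the same pseudo-terminal idea can be sketched with Python's own pty module. This swaps pty.openpty() in for script; it is not what the script command does internally. The run_in_pty() helper is a hypothetical name, and like script this only works on Unix-like systems. Because the child sees a terminal, a Python child switches to line buffering without needing -u.

```python
import codecs
import os
import pty
import subprocess
import sys

def run_in_pty(cmd):
    """Run cmd with stdout/stderr on a pseudo-terminal; yield decoded text.

    Unix-only sketch: the pty module is not supported on Windows.
    """
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(cmd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child keeps its own copy of the terminal end
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    try:
        while True:
            try:
                data = os.read(master_fd, 1024)
            except OSError:  # Linux raises EIO once the child side is closed
                break
            if not data:  # other platforms may signal EOF with b""
                break
            text = decoder.decode(data)
            if text:
                yield text
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail
    finally:
        os.close(master_fd)
        proc.wait()

if __name__ == "__main__":
    probe = "import sys; print(sys.stdout.isatty())"
    for chunk in run_in_pty([sys.executable, "-c", probe]):
        print(repr(chunk))  # the terminal turns "\n" into "\r\n"
```

Note that output read through a pseudo-terminal uses terminal line endings ("\r\n"), so you may need to strip the "\r" yourself.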
[1] Note that flushing a pipe does not write any data 'to disk'; the data stays entirely in memory. I/O buffers are just memory caches that get the best performance out of I/O by handling data in larger chunks. Only with a disk-based file object does fileobj.flush() lead to data being written to disk: it hands the buffered data to the OS, which then writes it out, usually shortly afterwards (os.fsync() forces the write).
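A small sketch of the footnote's point, using an ordinary file in a temporary directory (visible_after_flush() is an illustrative name). Before flush() the data exists only in the writer's Python buffer, so a second reader sees nothing. After flush() the OS has it and other readers can see it, even though it may not be on disk yet. os.fsync() is the call that asks the OS to commit it to the storage device.

```python
import os
import tempfile

def visible_after_flush(path):
    """Show that flush() hands buffered data to the OS, not to the disk."""
    with open(path, "w") as writer:
        writer.write("hello")       # held in Python's in-memory buffer
        with open(path) as reader:
            before = reader.read()  # "": the OS has not seen the data yet
        writer.flush()              # buffer -> OS (may still be only in RAM)
        with open(path) as reader:
            after = reader.read()   # "hello": other readers can see it now
        os.fsync(writer.fileno())   # ask the OS to commit it to the device
    return before, after

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(visible_after_flush(os.path.join(tmp, "demo.txt")))  # ('', 'hello')
```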