live output from subprocess command

Asked 2020-11-22 08:16

I'm using a python script as a driver for a hydrodynamics code. When it comes time to run the simulation, I use subprocess.Popen to run the code and collect the output from stdout and stderr, so I can print (and save to a log file) the output and check for errors. The trouble is, I have no way of seeing the output live while the simulation runs.

16 Answers
  • 2020-11-22 09:20

    I think that the subprocess.communicate method is a bit misleading: it actually fills the stdout and stderr that you specify in subprocess.Popen.

    Yet, reading from the subprocess.PIPE that you can provide to subprocess.Popen's stdout and stderr parameters will eventually fill up the OS pipe buffers and deadlock your app (especially if you have multiple processes/threads that must use subprocess).

    My proposed solution is to back stdout and stderr with files, and read those files' contents instead of reading from the deadlock-prone PIPE. These files can be tempfile.NamedTemporaryFile() - which can also be read while they're being written into by subprocess.communicate.

    Below is a sample usage:

            try:
                with ProcessRunner(('python', 'task.py'), env=os.environ.copy(), seconds_to_wait=0.01) as process_runner:
                    for out in process_runner:
                        print(out)
            except ProcessError as e:
                print(e)
                raise
    

    And here is the source code, ready to be used, with as many comments as I could provide to explain what it does:

    If you're using Python 2, please make sure to first install the latest version of the subprocess32 package from PyPI:
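
        pip install subprocess32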

    
    import os
    import sys
    import threading
    import time
    import tempfile
    import logging
    
    if os.name == 'posix' and sys.version_info[0] < 3:
        # Support python 2
        import subprocess32 as subprocess
    else:
        # Get latest and greatest from python 3
        import subprocess
    
    logger = logging.getLogger(__name__)
    
    
    class ProcessError(Exception):
        """Base exception for errors related to running the process"""
    
    
    class ProcessTimeout(ProcessError):
        """Error that will be raised when the process execution will exceed a timeout"""
    
    
    class ProcessRunner(object):
        def __init__(self, args, env=None, timeout=None, bufsize=-1, seconds_to_wait=0.25, **kwargs):
            """
            Constructor facade to subprocess.Popen that receives parameters which are more specifically required for the
            Process Runner. This is a class that should be used as a context manager - and that provides an iterator
            for reading captured output from subprocess.communicate in near realtime.
    
            Example usage:
    
    
            try:
                with ProcessRunner(('python', task_file_path), env=os.environ.copy(), seconds_to_wait=0.01) as process_runner:
                    for out in process_runner:
                        print(out)
            except ProcessError as e:
                print(e)
                raise
    
            :param args: same as subprocess.Popen
            :param env: same as subprocess.Popen
            :param timeout: same as subprocess.communicate
            :param bufsize: same as subprocess.Popen
            :param seconds_to_wait: time to wait between each readline from the temporary file
            :param kwargs: same as subprocess.Popen
            """
            self._seconds_to_wait = seconds_to_wait
            self._process_has_timed_out = False
            self._timeout = timeout
            self._process_done = False
            self._std_file_handle = tempfile.NamedTemporaryFile()
            self._process = subprocess.Popen(args, env=env, bufsize=bufsize,
                                             stdout=self._std_file_handle, stderr=self._std_file_handle, **kwargs)
            self._thread = threading.Thread(target=self._run_process)
            self._thread.daemon = True
    
        def __enter__(self):
            self._thread.start()
            return self
    
        def __exit__(self, exc_type, exc_val, exc_tb):
            self._thread.join()
            self._std_file_handle.close()
    
        def __iter__(self):
            # read all output from stdout file that subprocess.communicate fills
            with open(self._std_file_handle.name, 'r') as stdout:
                # while process is alive, keep reading data
                while not self._process_done:
                    out = stdout.readline()
                    out_without_trailing_whitespaces = out.rstrip()
                    if out_without_trailing_whitespaces:
                        # yield stdout data without trailing \n
                        yield out_without_trailing_whitespaces
                    else:
                        # if there is nothing to read, then please wait a tiny little bit
                        time.sleep(self._seconds_to_wait)
    
                # this is a hack: terraform seems to write to buffer after process has finished
                out = stdout.read()
                if out:
                    yield out
    
            if self._process_has_timed_out:
                raise ProcessTimeout('Process has timed out')
    
            if self._process.returncode != 0:
                raise ProcessError('Process has failed')
    
        def _run_process(self):
            try:
                # Start gathering information (stdout and stderr) from the opened process
                self._process.communicate(timeout=self._timeout)
                # Graceful termination of the opened process
                self._process.terminate()
            except subprocess.TimeoutExpired:
                self._process_has_timed_out = True
                # Force termination of the opened process
                self._process.kill()
    
            self._process_done = True
    
        @property
        def return_code(self):
            return self._process.returncode
    
    
    
    
  • 2020-11-22 09:22

    Why not set stdout directly to sys.stdout? And if you need to output to a log as well, you can override the write method of f. (One caveat: Popen hands the file's descriptor to the child process, so the child's own output bypasses the Python-level write method; the override only sees writes made from the Python side.)

    import sys
    import subprocess

    class SuperFile(object):
        """File-like tee: Python-side writes go to sys.stdout and a log file."""

        def __init__(self, *args, **kwargs):
            self._file = open(*args, **kwargs)

        def write(self, data):
            sys.stdout.write(data)
            self._file.write(data)

        def fileno(self):
            # Popen passes this descriptor to the child's stdout/stderr
            return self._file.fileno()

    f = SuperFile("log.txt", "w+")
    process = subprocess.Popen(command, stdout=f, stderr=f)  # 'command' defined elsewhere
    
  • 2020-11-22 09:22

    Similar to previous answers, but the following solution worked for me on Windows using Python 3 to provide a common method to print and log in real time (getting-realtime-output-using-python):

    import subprocess

    def print_and_log(command, logFile):
        with open(logFile, 'wb') as f:
            process = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)

            while True:
                output = process.stdout.readline()
                if not output and process.poll() is not None:
                    break
                if output:
                    f.write(output)
                    print(str(output.strip(), 'utf-8'), flush=True)
        return process.poll()
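
    For example (a sketch; the command and log file name here are just placeholders):

        rc = print_and_log('ping -n 3 127.0.0.1', 'ping.log')
        print('exit code:', rc)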
    
  • 2020-11-22 09:23

    Executive Summary (or "tl;dr" version): it's easy when there's at most one subprocess.PIPE, otherwise it's hard.

    It may be time to explain a bit about how subprocess.Popen does its thing.

    (Caveat: this is for Python 2.x, although 3.x is similar; and I'm quite fuzzy on the Windows variant. I understand the POSIX stuff much better.)

    The Popen function needs to deal with zero-to-three I/O streams, somewhat simultaneously. These are denoted stdin, stdout, and stderr as usual.

    You can provide (each option is illustrated in a short sketch after this list):

    • None, indicating that you don't want to redirect the stream. The subprocess will simply inherit the stream as usual. Note that on POSIX systems, at least, this does not mean it will use Python's sys.stdout, just Python's actual stdout; see demo at end.
    • An int value. This is a "raw" file descriptor (in POSIX at least). (Side note: PIPE and STDOUT are actually ints internally, but are "impossible" descriptors, -1 and -2.)
    • A stream—really, any object with a fileno method. Popen will find the descriptor for that stream, using stream.fileno(), and then proceed as for an int value.
    • subprocess.PIPE, indicating that Python should create a pipe.
    • subprocess.STDOUT (for stderr only): tell Python to use the same descriptor as for stdout. This only makes sense if you provided a (non-None) value for stdout, and even then, it is only needed if you set stdout=subprocess.PIPE. (Otherwise you can just provide the same argument you provided for stdout, e.g., Popen(..., stdout=stream, stderr=stream).)
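
    For concreteness, here is a minimal sketch exercising each option (POSIX; the echo and ls commands are just placeholders):

    import subprocess

    log = open('run.log', 'wb')

    # None (the default): the child inherits the parent's stream
    subprocess.Popen(['echo', 'inherited']).wait()

    # an int: a raw file descriptor
    subprocess.Popen(['echo', 'to a descriptor'], stdout=log.fileno()).wait()

    # a stream: Popen calls log.fileno() itself
    subprocess.Popen(['echo', 'to a stream'], stdout=log).wait()

    # subprocess.PIPE: Python creates a pipe you can read from
    p = subprocess.Popen(['echo', 'piped'], stdout=subprocess.PIPE)
    print(p.communicate()[0])

    # subprocess.STDOUT: send stderr wherever stdout goes
    p = subprocess.Popen(['ls', 'no-such-file'],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(p.communicate()[0])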

    The easiest cases (no pipes)

    If you redirect nothing (leave all three as the default None value or supply explicit None), Popen has it quite easy. It just needs to spin off the subprocess and let it run. Or, if you redirect to a non-PIPE—an int or a stream's fileno()—it's still easy, as the OS does all the work. Python just needs to spin off the subprocess, connecting its stdin, stdout, and/or stderr to the provided file descriptors.

    The still-easy case: one pipe

    If you redirect only one stream, Popen still has things pretty easy. Let's pick one stream at a time and watch.

    Suppose you want to supply some stdin, but let stdout and stderr go un-redirected, or go to a file descriptor. As the parent process, your Python program simply needs to use write() to send data down the pipe. You can do this yourself, e.g.:

    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    proc.stdin.write('here, have some data\n') # etc
    

    or you can pass the stdin data to proc.communicate(), which then does the stdin.write shown above. There is no output coming back so communicate() has only one other real job: it also closes the pipe for you. (If you don't call proc.communicate() you must call proc.stdin.close() to close the pipe, so that the subprocess knows there is no more data coming through.)
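
    For instance (a sketch; cat stands in for any command that reads stdin):

    proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE)
    proc.communicate(b'here, have some data\n')  # writes the data, then closes the pipe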

    Suppose you want to capture stdout but leave stdin and stderr alone. Again, it's easy: just call proc.stdout.read() (or equivalent) until there is no more output. Since proc.stdout is a normal Python I/O stream you can use all the normal constructs on it, like:

    for line in proc.stdout:
    

    or, again, you can use proc.communicate(), which simply does the read() for you.

    If you want to capture only stderr, it works the same as with stdout.
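
    For example:

    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)  # cmd as before
    for line in proc.stderr:   # iterate stderr exactly as you would stdout
        print(line)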

    There's one more trick before things get hard. Suppose you want to capture stdout, and also capture stderr but on the same pipe as stdout:

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    

    In this case, subprocess "cheats"! Well, it has to do this, so it's not really cheating: it starts the subprocess with both its stdout and its stderr directed into the (single) pipe-descriptor that feeds back to its parent (Python) process. On the parent side, there's again only a single pipe-descriptor for reading the output. All the "stderr" output shows up in proc.stdout, and if you call proc.communicate(), the stderr result (second value in the tuple) will be None, not a string.
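
    In code (with cmd as before):

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, err = proc.communicate()
    # out holds the interleaved stdout and stderr data; err is None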

    The hard cases: two or more pipes

    The problems all come about when you want to use at least two pipes. In fact, the subprocess code itself has this bit:

    def communicate(self, input=None):
        ...
        # Optimization: If we are only using one pipe, or no pipe at
        # all, using select() or threads is unnecessary.
        if [self.stdin, self.stdout, self.stderr].count(None) >= 2:
    

    But, alas, here we've made at least two, and maybe three, different pipes, so the count(None) returns either 1 or 0. We must do things the hard way.

    On Windows, this uses threading.Thread to accumulate results for self.stdout and self.stderr, and has the parent thread deliver self.stdin input data (and then close the pipe).

    On POSIX, this uses poll if available, otherwise select, to accumulate output and deliver stdin input. All this runs in the (single) parent process/thread.

    Threads or poll/select are needed here to avoid deadlock. Suppose, for instance, that we've redirected all three streams to three separate pipes. Suppose further that there's a small limit on how much data can be stuffed into a pipe before the writing process is suspended, waiting for the reading process to "clean out" the pipe from the other end. Let's set that small limit to a single byte, just for illustration. (This is in fact how things work, except that the limit is much bigger than one byte.)

    If the parent (Python) process tries to write several bytes, say 'go\n', to proc.stdin, the first byte goes in and then the second causes the Python process to suspend, waiting for the subprocess to read the first byte, emptying the pipe.

    Meanwhile, suppose the subprocess decides to print a friendly "Hello! Don't Panic!" greeting. The H goes into its stdout pipe, but the e causes it to suspend, waiting for its parent to read that H, emptying the stdout pipe.

    Now we're stuck: the Python process is asleep, waiting to finish saying "go", and the subprocess is also asleep, waiting to finish saying "Hello! Don't Panic!".

    The subprocess.Popen code avoids this problem with threading-or-select/poll. When bytes can go over the pipes, they go. When they can't, only a thread (not the whole process) has to sleep—or, in the case of select/poll, the Python process waits simultaneously for "can write" or "data available", writes to the process's stdin only when there is room, and reads its stdout and/or stderr only when data are ready. The proc.communicate() code (actually _communicate where the hairy cases are handled) returns once all stdin data (if any) have been sent and all stdout and/or stderr data have been accumulated.

    If you want to read both stdout and stderr on two different pipes (regardless of any stdin redirection), you will need to avoid deadlock too. The deadlock scenario here is different—it occurs when the subprocess writes something long to stderr while you're pulling data from stdout, or vice versa—but it's still there.
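
    On Python 3, one way to read two pipes yourself without deadlocking is the selectors module, which wraps the same poll/select machinery (a sketch, POSIX only; cmd is a placeholder):

    import selectors
    import subprocess
    import sys

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    sel.register(proc.stderr, selectors.EVENT_READ)
    while sel.get_map():                         # until both streams hit EOF
        for key, _ in sel.select():
            data = key.fileobj.read1(4096)       # read only what is available
            if not data:                         # EOF on this stream
                sel.unregister(key.fileobj)
            elif key.fileobj is proc.stdout:
                sys.stdout.buffer.write(data)
            else:
                sys.stderr.buffer.write(data)
    proc.wait()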


    The Demo

    I promised to demonstrate that, un-redirected, Python subprocesses write to the underlying stdout, not sys.stdout. So, here is some code:

    from cStringIO import StringIO
    import os
    import subprocess
    import sys
    
    def show1():
        print 'start show1'
        save = sys.stdout
        sys.stdout = StringIO()
        print 'sys.stdout being buffered'
        proc = subprocess.Popen(['echo', 'hello'])
        proc.wait()
        in_stdout = sys.stdout.getvalue()
        sys.stdout = save
        print 'in buffer:', in_stdout
    
    def show2():
        print 'start show2'
        save = sys.stdout
        sys.stdout = open(os.devnull, 'w')
        print 'after redirect sys.stdout'
        proc = subprocess.Popen(['echo', 'hello'])
        proc.wait()
        sys.stdout = save
    
    show1()
    show2()
    

    When run:

    $ python out.py
    start show1
    hello
    in buffer: sys.stdout being buffered
    
    start show2
    hello
    

    Note that the first routine will fail if you add stdout=sys.stdout, as a StringIO object has no fileno. The second will omit the hello if you add stdout=sys.stdout since sys.stdout has been redirected to os.devnull.

    (If you redirect Python's file-descriptor-1, the subprocess will follow that redirection. The open(os.devnull, 'w') call produces a stream whose fileno() is greater than 2.)
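
    To see that descriptor-level redirection in action (a sketch, POSIX):

    import os
    import subprocess

    saved = os.dup(1)                          # keep a copy of the real stdout
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, 1)                        # redirect file-descriptor-1 itself
    subprocess.call(['echo', 'hidden'])        # the child inherits fd 1 -> devnull
    os.dup2(saved, 1)                          # restore stdout
    os.close(saved)
    os.close(devnull)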
