How do I obtain the output from a program that uses screen redrawing for use in a terminal screen scraper?

Submitted by 笑着哭i on 2019-12-24 03:23:46

Question


I am trying to obtain the output of a full-screen terminal program that uses redrawing escape codes to present data, and which requires a tty (or pty) to run.

The basic procedure a human would follow is:

  1. Start the program in a terminal.
  2. The program uses redrawing to display and update various fields of data.
  3. The human waits until the display is consistent (possibly using cues such as "it's not flickering" or "it's been 0.5s since the last update").
  4. The human looks at the fields in certain positions and remembers or records the data.
  5. The human exits the program.
  6. The human then performs actions outside the program based on that data.

I would like to automate this process. Steps 4 and 5 can be done in either order. While the perfectionist in me is worried about self-consistency of the screen state, I admit I'm not really sure how to properly define this (except perhaps to use "it's been more than a certain timeout period since the last update").

It seems that using pty and subprocess followed by some sort of screen scraper is one possible way to do this, but I'm unclear on exactly how to use them all together, and what hazards exist with some of the lower level objects I'm using.

Consider this program:

#!/usr/bin/env python2
import os
import pty
import subprocess
import time

import pexpect.ANSI

# Pseudo-terminal FDs
fd_master, fd_slave = pty.openpty()

# Start 'the_program'
the_proc = subprocess.Popen(['the_program'], stdin=fd_master, stdout=fd_slave, stderr=fd_slave)

# Just kill it after a couple of seconds
time.sleep(2)
the_proc.terminate()

# Read output into a buffer
output_buffer = b''
read_size = None

while (read_size is None) or (read_size > 0):
    chunk = os.read(fd_master, 1024)
    output_buffer += chunk
    read_size = len(chunk)

print("output buffer size: {:d}".format(len(output_buffer)))

# Feed output to screen scraper
ansi_term = pexpect.ANSI.ANSI(24, 80)
ansi_term.write(output_buffer)

# Parse presented data... 

One problem is that the os.read() call blocks, always. I am also wondering if there's a better way to obtain the pty output for further use. Specifically:

  1. Is there a way to do this (or parts of it) with higher-level code? I can't just use subprocess.PIPE for my Popen call, because then the target program won't work. But can I wrap those file descriptors in something with some more convenient methods to do I/O?
  2. If not, how do I avoid always blocking on the os.read call? I'm more used to file-like objects where read() always returns, and just returns an empty string if the end of the stream is reached. Here, os.read eventually blocks no matter what.
  3. I'm wary of getting this script to "just work" without being aware of potential hazards (e.g. race conditions that show up one time in a thousand). What else do I need to be aware of?

I'm also open to the idea that using pty and subprocess in the first place is not the best way to do this.


Answer 1:


If the program does not generate much output, the simplest way is to use pexpect.run() to get its output via a pty:

import pexpect # $ pip install pexpect

output, status = pexpect.run('top', timeout=2, withexitstatus=1)

You could detect whether the output is "settled down" by comparing it with the previous output:

import pexpect # $ pip install pexpect

def every_second(d, last=[None]):
    current = d['child'].before
    if last[0] == current: # "settled down"
        raise pexpect.TIMEOUT(None) # exit run
    last[0] = current

output, status =  pexpect.run('top', timeout=1, withexitstatus=1,
                              events={pexpect.TIMEOUT: every_second})

You could use a regex that matches a recurrent pattern in the output instead of the timeout. The intent is to determine when the output is "settled down".
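A minimal sketch of the regex idea, using only the standard library on bytes that have already been collected from the pty. It assumes that each redraw starts with a recognizable marker; the FRAME_START pattern (modelled on the first line of top's summary area) and the complete_frames() helper are hypothetical names for illustration, not part of pexpect. Once the marker has been seen at least twice, everything before its last occurrence consists only of fully drawn frames:

```python
import re

# Assumption: every redraw begins with a line like "top - 12:34:56 up ...".
# Adjust the pattern to whatever recurs in the target program's output.
FRAME_START = re.compile(rb"top - \d\d:\d\d:\d\d")


def complete_frames(buffer, frame_start=FRAME_START):
    """Return the part of buffer that contains only finished redraws.

    Returns None until the marker has been seen at least twice, because the
    most recent redraw may still be in progress.
    """
    starts = [m.start() for m in frame_start.finditer(buffer)]
    if len(starts) < 2:
        return None
    # Cut just before the last marker: the final frame may be incomplete.
    return buffer[:starts[-1]]
```

The returned bytes can then be fed to the terminal emulator in place of the raw buffer, so that a half-finished redraw is never rendered.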

For comparison, here is code that uses the subprocess and pty modules directly:

#!/usr/bin/env python
"""Start process; wait 2 seconds; kill the process; print all process output."""
import errno
import os
import pty
import select
from subprocess import Popen, STDOUT
try:
    from time import monotonic as timer
except ImportError:
    from time import time as timer

output = []
master_fd, slave_fd = pty.openpty() #XXX add cleanup on exception
p = Popen(["top"], stdin=slave_fd, stdout=slave_fd, stderr=STDOUT,
          close_fds=True)
os.close(slave_fd)
endtime = timer() + 2 # stop in 2 seconds
while True:
    delay = endtime - timer()
    if delay <= 0: # timeout
        break
    if select.select([master_fd], [], [], delay)[0]:
        try:
            data = os.read(master_fd, 1024)
        except OSError as e: #NOTE: no need for IOError here
            if e.errno != errno.EIO:
                raise
            break # EIO means EOF on some systems
        else:
            if not data: # EOF
                break
            output.append(data)
os.close(master_fd)
p.terminate()
returncode = p.wait()
print([returncode, b''.join(output)])

Note:

  • all three standard streams in the child process use slave_fd, unlike the code in your question, which uses master_fd for stdin
  • the code reads output while the process is still running, so it can handle output larger than a single kernel buffer
  • the code does not lose data on an EIO error (which means EOF here)

Based on Python subprocess readlines() hangs.
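The question's "it's been 0.5s since the last update" heuristic can be combined with the same select-based approach. The following is a sketch under stated assumptions rather than a definitive implementation: read_until_quiet() and its parameters (quiet_period, max_wait, chunk_size) are hypothetical names, and the function treats a quiet period on the file descriptor as "the display has settled":

```python
import errno
import os
import select
import time


def read_until_quiet(fd, quiet_period=0.5, max_wait=5.0, chunk_size=1024):
    """Read from fd until no data arrives for quiet_period seconds.

    Also stops on EOF (empty read or EIO) or after max_wait seconds in total.
    """
    chunks = []
    deadline = time.monotonic() + max_wait
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:  # overall limit reached
            break
        # Wait for the next update, but never longer than the quiet period.
        ready, _, _ = select.select([fd], [], [], min(quiet_period, remaining))
        if not ready:  # nothing arrived in time: treat the screen as settled
            break
        try:
            data = os.read(fd, chunk_size)
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            break  # EIO means EOF on some systems
        if not data:  # EOF
            break
        chunks.append(data)
    return b"".join(chunks)
```

With the pty code above, this would be called as read_until_quiet(master_fd) after os.close(slave_fd) and before terminating the process. Because os.read() is only called after select() reports the descriptor as readable, the call does not block indefinitely.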




Answer 2:


You can use pexpect to do this. Use the run() function to obtain the data, and see the included VT100 emulator (or pyte) for rendering it.

Using the utility top as an example:

import pexpect # $ pip install pexpect
import pexpect.ANSI

# Start 'top' and quit after a couple of seconds
output_buffer = pexpect.run('top', timeout=2)

# For continuous reading/interaction, you would need to use the "events"
# arg, threading, or a framework for asynchronous communication.

ansi_term = pexpect.ANSI.ANSI(24, 80)
ansi_term.write(output_buffer)
print(str(ansi_term))

(Note that there is a bug resulting in extra line spacings sometimes.)
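Once the screen has been rendered to text, reading "the fields in certain positions" amounts to slicing rows and columns. A minimal sketch, assuming the rendered screen is available as newline-separated text (as the printed output above suggests); extract_field() is a hypothetical helper, and the row and column numbers depend entirely on the target program's layout:

```python
def extract_field(screen_text, row, col, width):
    """Return the text found at a fixed position on a rendered screen.

    row and col are zero-based; width is the number of columns to read.
    Returns an empty string if the row does not exist.
    """
    lines = screen_text.splitlines()
    if row >= len(lines):
        return ""
    # Slicing past the end of a short line is safe and yields what is there.
    return lines[row][col:col + width].strip()
```

For example, with a screen whose second line is "Tasks: 123 total", extract_field(screen_text, 1, 7, 3) picks out the task count. If the extra-line-spacing bug mentioned above occurs, the row indices will be off, so it may be worth checking that a known label appears where expected before trusting the value.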



Source: https://stackoverflow.com/questions/29057549/how-do-i-obtain-the-output-from-a-program-that-uses-screen-redrawing-for-use-in