Python 3: reading bytes from stdin pipe with readahead

长情又很酷 2021-01-11 18:08

I want to read bytes. sys.stdin is opened in text mode, yet it has a buffer that can be used to read bytes: sys.stdin.buffer.

my problem is

3 Answers
  • 2021-01-11 18:34

    user4815162342's solution, while extremely useful, appears to have an issue in that it differs from the current behaviour of the io.BufferedReader peek method.

    The builtin method will return the same data (starting from the current read position) for sequential peek() calls.
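The builtin behaviour described here can be checked with a small sketch. An in-memory io.BytesIO stands in for sys.stdin.buffer (an assumption made only so the example runs without a pipe), and because peek() may return more or fewer bytes than requested, only a prefix is compared.

```python
import io

# Assumption: an in-memory stream replaces sys.stdin.buffer for this demo.
reader = io.BufferedReader(io.BytesIO(b"hello world"))

first = reader.peek(5)
second = reader.peek(5)

# Two consecutive peeks see the same data, starting at the current position.
assert first[:5] == second[:5] == b"hello"

# peek() did not advance the read position, so read() starts at the beginning.
head = reader.read(5)
assert head == b"hello"
```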

    user4815162342's solution will return sequential chunks of data for each sequential peek call. This implies the user must wrap peek again to concatenate the output if they wish to use the same data more than once.

    Here is a fix that restores the builtin behaviour:

    def _buffered(self):
        # Return the unread part of the buffer without moving the position.
        oldpos = self.buf.tell()
        data = self.buf.read()
        self.buf.seek(oldpos)
        return data
    
    def peek(self, size):
        buf = self._buffered()[:size]
        if len(buf) < size:
            # Not enough buffered: fetch only the missing bytes from the stream.
            contents = self.fileobj.read(size - len(buf))
            self._append_to_buf(contents)
            return self._buffered()
        return buf
    

    See the full version here

    There are other optimisations that could be applied, e.g. removing previously buffered data when a read call exhausts the buffer. The current implementation leaves peeked data in the buffer even after it has been read, at which point it is no longer accessible.
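One possible way to combine the repeatable peek with the buffer clean-up mentioned above is sketched below for Python 3. The class name RepeatablePeeker is hypothetical, and dropping the buffer once it has been fully consumed is only one way to do the clean-up, not necessarily what the linked full version does.

```python
import io


class RepeatablePeeker:
    """Hypothetical wrapper: peek() is repeatable, read() trims the buffer."""

    def __init__(self, fileobj):
        self.fileobj = fileobj   # underlying binary stream, e.g. sys.stdin.buffer
        self.buf = io.BytesIO()  # bytes that were peeked but not yet consumed

    def _buffered(self):
        # Return the unread part of the buffer without moving the position.
        oldpos = self.buf.tell()
        data = self.buf.read()
        self.buf.seek(oldpos)
        return data

    def _append_to_buf(self, contents):
        # Append at the end, then restore the read position.
        oldpos = self.buf.tell()
        self.buf.seek(0, io.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def peek(self, size):
        buffered = self._buffered()
        if len(buffered) < size:
            # Fetch only the missing bytes from the underlying stream.
            self._append_to_buf(self.fileobj.read(size - len(buffered)))
            buffered = self._buffered()
        return buffered[:size]

    def read(self, size=None):
        if size is None or size < 0:
            data = self.buf.read() + self.fileobj.read()
        else:
            data = self.buf.read(size)
            if len(data) < size:
                data += self.fileobj.read(size - len(data))
        # Clean-up: once all buffered bytes are consumed, drop them.
        if not self._buffered():
            self.buf = io.BytesIO()
        return data


# Demo on an in-memory stream (stand-in for sys.stdin.buffer).
p = RepeatablePeeker(io.BytesIO(b"hello world"))
peek_a = p.peek(5)
peek_b = p.peek(5)
consumed = p.read(5)
remainder = p.read()
```

After the final read() the internal buffer is empty again, so long-running readers do not keep accumulating bytes they have already consumed.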

  • 2021-01-11 18:45

    The exception doesn't come from Python, but from the operating system, which doesn't allow seeking on pipes. (If you redirect input from a regular file, it can be seeked, even though it's standard input.) This is why you get the error in one case and not in the other, even though the classes are the same.
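The difference between a pipe and a regular file can be reproduced without a shell. This is a sketch that assumes a POSIX system; os.pipe() stands in for the pipe the shell creates, and a temporary file stands in for a redirected regular file.

```python
import os
import tempfile

# A pipe, similar to what the shell sets up for `echo hi | python3 script.py`.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"hi\n")
os.close(write_fd)

with os.fdopen(read_fd, "rb") as pipe_reader:
    pipe_seekable = pipe_reader.seekable()
    try:
        pipe_reader.seek(0)
        seek_failed = False
    except OSError:  # io.UnsupportedOperation is a subclass of OSError
        seek_failed = True

# A regular file, similar to `python3 script.py < hi.text`.
with tempfile.TemporaryFile() as regular_file:
    regular_file.write(b"hi\n")
    file_seekable = regular_file.seekable()
    regular_file.seek(0)
    file_contents = regular_file.read()
```

Both objects are buffered binary file objects, yet only the one backed by a regular file reports itself as seekable.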

    The classic Python 2 solution for readahead would be to wrap the stream in your own stream implementation that implements readahead:

    import cStringIO
    import os
    import sys
    
    class Peeker(object):
        def __init__(self, fileobj):
            self.fileobj = fileobj
            self.buf = cStringIO.StringIO()
    
        def _append_to_buf(self, contents):
            oldpos = self.buf.tell()
            self.buf.seek(0, os.SEEK_END)
            self.buf.write(contents)
            self.buf.seek(oldpos)
    
        def peek(self, size):
            contents = self.fileobj.read(size)
            self._append_to_buf(contents)
            return contents
    
        def read(self, size=None):
            if size is None:
                return self.buf.read() + self.fileobj.read()
            contents = self.buf.read(size)
            if len(contents) < size:
                contents += self.fileobj.read(size - len(contents))
            return contents
    
        def readline(self):
            line = self.buf.readline()
            if not line.endswith('\n'):
                line += self.fileobj.readline()
            return line
    
    sys.stdin = Peeker(sys.stdin)
    

    In Python 3, supporting the full sys.stdin while peeking at the undecoded stream is complicated: you would wrap sys.stdin.buffer as shown above, then instantiate a new TextIOWrapper over the peekable stream, and install that TextIOWrapper as sys.stdin.

    However, since you only need to peek at sys.stdin.buffer, the above code will work just fine, after changing cStringIO.StringIO to io.BytesIO and '\n' to b'\n'.
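For reference, here is roughly what the class looks like with those two substitutions applied. It is a direct port of the code above, not a tested drop-in replacement for sys.stdin.buffer, and the demo uses an in-memory io.BytesIO in place of real standard input (an assumption so the example runs anywhere).

```python
import io


class Peeker:
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = io.BytesIO()  # was cStringIO.StringIO() in Python 2

    def _append_to_buf(self, contents):
        # Append at the end of the buffer, then restore the read position.
        oldpos = self.buf.tell()
        self.buf.seek(0, io.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def peek(self, size):
        # Read ahead and remember the bytes so later reads still see them.
        contents = self.fileobj.read(size)
        self._append_to_buf(contents)
        return contents

    def read(self, size=None):
        if size is None:
            return self.buf.read() + self.fileobj.read()
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        line = self.buf.readline()
        if not line.endswith(b'\n'):  # was '\n' in Python 2
            line += self.fileobj.readline()
        return line


# In a real script this would be Peeker(sys.stdin.buffer).
stream = Peeker(io.BytesIO(b"hi\nthere\n"))
head = stream.peek(1)          # look at the first byte without consuming it
first_line = stream.readline() # still starts with the peeked byte
rest = stream.read()
```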

  • 2021-01-11 18:45

    Try this:

    import sys
    
    ssb = sys.stdin.buffer.read(1)
    if ssb == b'h':
        print(ssb+sys.stdin.buffer.read())
    

    Echo a string:

    a@fuhq:~$ echo 'hi' | python3 buf_test.py 
    b'hi\n'
    

    Redirect a file:

    a@fuhq:~$ cat hi.text
    hi
    a@fuhq:~$ python3 buf_test.py   <  hi.text
    b'hi\n'
    