How to read inputs from stdin and enforce an encoding?

Question

The goal is to continuously read from stdin and enforce UTF-8 decoding in both Python 2 and Python 3.

I've tried solutions from:

Writing bytes to standard output in a way compatible with both, python2 and python3
Python 3: How to specify stdin encoding

I've tried:

#!/usr/bin/env python

from __future__ import print_function, unicode_literals
import io
import sys

# Supports Python2 read from stdin and Python3 read from stdin.buffer
# https://stackoverflow.com/a/23932488/610569
user_input = getattr(sys.stdin, 'buffer', sys.stdin)


# Enforcing utf-8 in Python3
# https://stackoverflow.com/a/16549381/610569
with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
    for line in fin:
        # Read the input line by line and do something with it,
        # e.g. just print the line.
        print(line)

The code works in Python 3, but in Python 2 io.TextIOWrapper rejects the underlying stream because it has no readable attribute, and it throws:

Traceback (most recent call last):
  File "testfin.py", line 12, in <module>
    with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
AttributeError: 'file' object has no attribute 'readable'

That's because in Python 3 user_input, i.e. sys.stdin.buffer, is an _io.BufferedReader object, which has a readable attribute:

<class '_io.BufferedReader'>

['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_dealloc_warn', '_finalizing', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'mode', 'name', 'peek', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']

In Python 2, however, user_input is a file object, which has no readable attribute:

<type 'file'>

['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
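The difference can also be checked without reading through the full dir() output, by probing for the attribute directly. This is only a minimal sketch; describe_stream is a hypothetical helper name, not something from the linked answers.

```python
import io
import sys


def describe_stream(stream):
    """Return (type name, whether the stream has a 'readable' attribute)."""
    return type(stream).__name__, hasattr(stream, 'readable')


# sys.stdin.buffer exists only in Python 3; fall back to sys.stdin in Python 2.
user_input = getattr(sys.stdin, 'buffer', sys.stdin)
print(describe_stream(user_input))

# An in-memory binary stream from the io module provides the interface
# that io.TextIOWrapper expects, in both Python 2 and Python 3.
print(describe_stream(io.BytesIO(b'')))
```

If the second element of the first tuple is False, io.TextIOWrapper cannot wrap that stream directly.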

Answer 1:

If you don't need a fully-fledged io.TextIOWrapper, but just a decoded stream for reading, you can use codecs.getreader() to create a decoding wrapper:

import codecs
import sys

user_input = getattr(sys.stdin, 'buffer', sys.stdin)
reader = codecs.getreader('utf8')(user_input)
for line in reader:
    # do whatever you need...
    print(line)

codecs.getreader('utf8') returns a factory for a codecs.StreamReader, which is then instantiated with the original stream. I'm not sure whether StreamReader supports the with statement, but that is probably not strictly necessary here (there should be no need to close stdin after reading).

I've successfully used this solution in situations where the underlying stream only offers a very limited interface.
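To try the decoding wrapper without piping anything into the script, an in-memory byte stream can stand in for stdin. This is a sketch under that assumption; decoded_lines is a hypothetical helper name and the sample text is made up for illustration.

```python
import codecs
import io


def decoded_lines(binary_stream, encoding='utf-8'):
    """Yield text lines decoded from a binary stream."""
    reader = codecs.getreader(encoding)(binary_stream)
    for line in reader:
        yield line


# io.BytesIO stands in for the binary stdin stream here.
sample = io.BytesIO(u'caf\u00e9\nna\u00efve\n'.encode('utf-8'))
lines = list(decoded_lines(sample))
print(lines)
```

Each yielded line keeps its trailing newline, just like lines read from a regular file object.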

Update (2nd version)

From the comments, it became clear that you actually need an io.TextIOWrapper to have proper line buffering etc. in interactive mode; codecs.StreamReader only works for piped input and the like.

Using this answer, I was able to get interactive input to work properly:

#!/usr/bin/env python
# coding: utf8

from __future__ import print_function, unicode_literals
import io
import sys

user_input = getattr(sys.stdin, 'buffer', sys.stdin)

with io.open(user_input.fileno(), encoding='utf8') as f:
    for line in f:
        # do whatever you need...
        print(line)

This creates an io.TextIOWrapper with an enforced encoding on top of the file descriptor of the binary stdin stream.
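The same io.open() call can be exercised non-interactively by letting a pipe stand in for stdin. The sketch below makes two assumptions that are not in the answer: open_fd_as_text is a hypothetical helper name, and closefd=False is passed so that the descriptor stays open after the wrapper is closed.

```python
import io
import os


def open_fd_as_text(fd, encoding='utf-8'):
    """Wrap an already-open file descriptor in a decoding text reader."""
    # closefd=False leaves the descriptor open when the wrapper is closed.
    return io.open(fd, mode='r', encoding=encoding, closefd=False)


# A pipe stands in for stdin so the sketch can run without user input.
read_fd, write_fd = os.pipe()
os.write(write_fd, u'gr\u00fc\u00df\n'.encode('utf-8'))
os.close(write_fd)

with open_fd_as_text(read_fd) as f:
    lines = [line for line in f]
os.close(read_fd)
print(lines)
```

Whether to keep the descriptor open is a design choice: without closefd=False, leaving the with block closes the underlying descriptor as well, which may or may not matter for a script that only reads stdin once.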

Answer 2:

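Have you tried forcing UTF-8 as the default encoding, as follows? (This works in Python 2 only; Python 3 has neither a built-in reload() nor sys.setdefaultencoding().)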

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Source: https://stackoverflow.com/questions/47425695/how-to-read-inputs-from-stdin-and-enforce-an-encoding

Tags

python

file

utf-8

stdin