Question:
The goal is to continuously read from stdin and enforce UTF-8 decoding, in both Python 2 and Python 3.
I've tried solutions from:
- Writing bytes to standard output in a way compatible with both, python2 and python3
- Python 3: How to specify stdin encoding
I've tried:
#!/usr/bin/env python
from __future__ import print_function, unicode_literals
import io
import sys
# Supports Python2 read from stdin and Python3 read from stdin.buffer
# https://stackoverflow.com/a/23932488/610569
user_input = getattr(sys.stdin, 'buffer', sys.stdin)
# Enforcing utf-8 in Python3
# https://stackoverflow.com/a/16549381/610569
with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
    for line in fin:
        # Reads the input line by line
        # and do something, for e.g. just print line.
        print(line)
The code works in Python 3, but in Python 2 io.TextIOWrapper cannot wrap the old file object (it lacks the readable attribute the wrapper expects) and it throws:
Traceback (most recent call last):
  File "testfin.py", line 12, in <module>
    with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
AttributeError: 'file' object has no attribute 'readable'
That's because in Python 3, user_input, i.e. sys.stdin.buffer, is an _io.BufferedReader object whose attributes include readable:
<class '_io.BufferedReader'>
['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_dealloc_warn', '_finalizing', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'mode', 'name', 'peek', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
While in Python 2, user_input is a file object and its attributes don't include readable:
<type 'file'>
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
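A quick check makes the difference concrete (a small sketch of my own, not part of the original question; it only inspects the stream that io.TextIOWrapper would receive):
#!/usr/bin/env python
from __future__ import print_function
import sys

# Binary stdin: sys.stdin.buffer on Python 3, plain sys.stdin on Python 2
stream = getattr(sys.stdin, 'buffer', sys.stdin)

print(type(stream))
# io.TextIOWrapper expects a buffered binary stream that implements
# readable(); Python 2's built-in file type has no such method.
print(hasattr(stream, 'readable'))  # True on Python 3, False on Python 2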
Answer 1:
If you don't need a fully-fledged io.TextIOWrapper, but just a decoded stream for reading, you can use codecs.getreader() to create a decoding wrapper:
import codecs

reader = codecs.getreader('utf8')(user_input)
for line in reader:
    # do whatever you need...
    print(line)
codecs.getreader('utf8') creates a factory for a codecs.StreamReader, which is then instantiated using the original stream.
I'm not sure the StreamReader supports the with context, but this might not be strictly necessary (there's no need to close STDIN after reading, I guess...).
I've successfully used this solution in situations where the underlying stream only offers a very limited interface.
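Put together with the getattr trick from the question, a complete script using this approach could look like the following sketch (my own assembly, assuming piped UTF-8 input):
#!/usr/bin/env python
from __future__ import print_function, unicode_literals
import codecs
import sys

# Binary stdin on both Python 2 and Python 3
user_input = getattr(sys.stdin, 'buffer', sys.stdin)

# Wrap the byte stream in a UTF-8 decoding StreamReader
reader = codecs.getreader('utf8')(user_input)
for line in reader:
    print(line)
Feeding it something like echo "héllo wörld" | python script.py should print the decoded line under either interpreter.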
Update (2nd version)
From the comments, it became clear that you actually need an io.TextIOWrapper to have proper line buffering etc. in interactive mode; codecs.StreamReader only works for piped input and the like.
Using this answer, I was able to get interactive input to work properly:
#!/usr/bin/env python
# coding: utf8
from __future__ import print_function, unicode_literals
import io
import sys
user_input = getattr(sys.stdin, 'buffer', sys.stdin)
with io.open(user_input.fileno(), encoding='utf8') as f:
    for line in f:
        # do whatever you need...
        print(line)
This creates an io.TextIOWrapper with enforced encoding from the binary STDIN buffer.
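One caveat I would add (my own note, not from the original answer): io.open() closes the underlying file descriptor when the with block exits, so if the program still needs stdin afterwards, passing closefd=False keeps it open. A sketch:
#!/usr/bin/env python
from __future__ import print_function, unicode_literals
import io
import sys

user_input = getattr(sys.stdin, 'buffer', sys.stdin)

# closefd=False leaves the stdin descriptor usable after the block exits
with io.open(user_input.fileno(), encoding='utf8', closefd=False) as f:
    for line in f:
        print(line)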
Answer 2:
Have you tried forcing UTF-8 encoding in Python as follows:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Source: https://stackoverflow.com/questions/47425695/how-to-read-inputs-from-stdin-and-enforce-an-encoding