While porting code from Python 2 to Python 3, I ran into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:
for line in
Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.
Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
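To see that layering without depending on how stdin happens to be attached, here is a minimal sketch that opens a temporary file in text mode and inspects the same attributes. The helper name describe_layers is made up for illustration, and the class names it reports are what CPython uses for regular files; sys.stdin normally exposes the same .buffer and .raw chain.

```python
import io
import os
import tempfile


def describe_layers(text_stream):
    """Return the class names of the text, buffered, and raw layers."""
    buffered = text_stream.buffer   # binary buffered layer
    raw = buffered.raw              # unbuffered layer underneath
    return (
        type(text_stream).__name__,
        type(buffered).__name__,
        type(raw).__name__,
    )


# A temporary file stands in for stdin here (an assumption for the demo).
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "r", encoding="utf-8") as f:
        layers = describe_layers(f)
        is_text = isinstance(f, io.TextIOBase)
finally:
    os.remove(path)

print(layers, is_text)
```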
Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:
import io
import sys
input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
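One way to check the effect of the environment variable is to start a child interpreter with it set and ask that child what encoding it picked for stdin. This is only a sketch: the helper name child_stdin_encoding is hypothetical, and it assumes the child is not started with -E or -I, which make Python ignore PYTHON* environment variables.

```python
import os
import subprocess
import sys


def child_stdin_encoding(codec):
    """Run a child Python with PYTHONIOENCODING set; return its sys.stdin.encoding."""
    env = dict(os.environ, PYTHONIOENCODING=codec)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        env=env,
        stdin=subprocess.DEVNULL,   # the child does not need real input
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(child_stdin_encoding("utf-8"))
```

From a shell, the equivalent is to set the variable on the command line that launches your script, for example PYTHONIOENCODING=utf-8 in front of the python command on POSIX shells.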
From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):
# Python 3.7 and newer
sys.stdin.reconfigure(encoding='utf-8')
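The "before any data has been read" condition can be tried out on a stand-in stream. The sketch below assumes an in-memory io.BytesIO in place of sys.stdin.buffer and deliberately starts the wrapper with the wrong codec; reconfigure() is the same io.TextIOWrapper method that sys.stdin exposes on Python 3.7 and newer.

```python
import io

# UTF-8 bytes standing in for what would arrive on stdin (demo assumption).
raw_bytes = "café\n".encode("utf-8")

# Start with the wrong codec, as if the interpreter had guessed ASCII.
stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="ascii")

# Switch codecs before the first read; after a read this is not allowed.
stream.reconfigure(encoding="utf-8")

line = stream.readline()
print(stream.encoding, repr(line))
```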