Question
I've tried io, repr(), etc.; none of them work!
Problem inputting å (\xe5):
(None of these work)
import io
import sys
print(sys.stdin.read(1))
sys.stdin = io.TextIOWrapper(sys.stdin.detach(), errors='replace', encoding='iso-8859-1', newline='\n')
print(sys.stdin.read(1))
x = sys.stdin.buffer.read(1)
print(x.decode('utf-8'))
They all give me roughly UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: unexpected end of data
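The error message can be reproduced without a terminal. This is a minimal sketch, assuming the terminal sent å as the single byte 0xe5 (as a Latin-1 terminal would); the variable names are illustrative only.

```python
# Assumption: the terminal delivered the character as the single byte 0xe5.
raw = b"\xe5"

# In UTF-8, 0xe5 is the lead byte of a three-byte sequence, so a lone
# 0xe5 is an incomplete sequence and strict decoding fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# With errors="replace" the byte becomes U+FFFD, the replacement character.
replaced = raw.decode("utf-8", "replace")

# Latin-1 maps every byte to the code point with the same value.
latin1 = raw.decode("latin-1")

print(utf8_error)
print(ascii(replaced))
print(ascii(latin1))
```

So the byte itself is fine; it is just not UTF-8 data, which is why forcing a UTF-8 decoder onto it cannot help.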
I also tried running export PYTHONIOENCODING=utf-8 before starting Python; that doesn't work either.
Now, here's where I'm at:
import sys, codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
sys.stdin = codecs.getwriter("utf-8")(sys.stdin.detach())
x = sys.stdin.read(1)
print(x.decode('utf-8', 'replace'))
This gives me: �
It's close...
How can I take a \xe5 and turn it into å in my console, without breaking input() as well? This solution breaks it.
Note: I know this has been asked before, but none of those answers solve it, especially not io.
Some info about my system:
os.environ['LANG'] == 'C'
sys.getdefaultencoding() == 'utf-8'
sys.stdout.encoding == 'ANSI_X3.4-1968'
sys.stdin.encoding == 'ANSI_X3.4-1968'
My OS: Arch Linux, running xterm.
Running locale -a gives me: C | POSIX | sv_SE.utf8
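As a side note on the encoding name reported above: ANSI_X3.4-1968 is the charset name glibc reports for the C locale, and Python treats it as an alias for ASCII. A small check, using only the standard codecs module:

```python
import codecs

# Look up the codec Python uses for the name reported by sys.stdin.encoding.
codec_name = codecs.lookup("ANSI_X3.4-1968").name
print(codec_name)

# ASCII only covers bytes 0x00-0x7f, so 0xe5 cannot be decoded strictly.
try:
    b"\xe5".decode("ANSI_X3.4-1968")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)
```

With LANG set to C, the standard streams therefore start out ASCII-only, which matches the values listed above.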
I've followed these:
- Python 3: How to specify stdin encoding
- http://python-notes.curiousefficiency.org/en/latest/python3/binary_protocols.html
- http://wolfprojects.altervista.org/talks/unicode-and-python-3/
- http://getpython3.com/diveintopython3/strings.html
- Python 3 - Encode/Decode vs Bytes/Str
- How to set sys.stdout encoding in Python 3?
- http://docs.python.org/3.0/howto/unicode.html
(and about 50 more)
Solution (sort of, still breaks input())
sys.stdout = codecs.getwriter("latin-1")(sys.stdout.detach())
sys.stdin = codecs.getwriter("latin-1")(sys.stdin.detach())
x = sys.stdin.read(1)
print(x.decode('latin-1', 'replace'))
Answer 1:
You are running this in xterm, which does not support UTF-8 by default. Run it as xterm -u8 or use uxterm to fix that.
The other way to work around that is to use a different locale; set your locale to Latin-1, for example:
export LANG=sv_SE.ISO-8859-1
but then you are limited to 256 code points, versus the full range (over a million) of the Unicode standard.
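The 256 code point limit is easy to see in Python itself. A short sketch; the sample strings are arbitrary choices for illustration:

```python
# Latin-1 (ISO-8859-1) covers exactly the code points U+0000..U+00FF,
# which includes the Swedish letters.
swedish = "åäöÅÄÖ"
encoded = swedish.encode("latin-1")  # one byte per character

# Every possible byte value decodes to exactly one character.
all_latin1 = bytes(range(256)).decode("latin-1")

# Anything above U+00FF, such as the euro sign, has no Latin-1 byte.
try:
    "€".encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False

print(len(encoded), len(all_latin1), euro_fits)
```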
Note that Python 2 never decoded the input; writing out what you read from the terminal will look fine because the raw bytes you read are interpreted by the terminal in the same locale; reading and writing Latin-1 bytes works just fine. That's not quite the same as processing Unicode data, however.
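The Python 2 behaviour described here, passing bytes through untouched, can be imitated in Python 3 by working on the binary layer. This is only a sketch: echo_bytes is a hypothetical helper, and io.BytesIO objects stand in for sys.stdin.buffer and sys.stdout.buffer so that it runs without a terminal.

```python
import io


def echo_bytes(src, dst, n=1):
    """Copy n raw bytes from src to dst without decoding them."""
    data = src.read(n)
    dst.write(data)
    return data


# Stand-ins for sys.stdin.buffer / sys.stdout.buffer (an assumption made
# so the example does not need an interactive terminal).
fake_stdin = io.BytesIO(b"\xe5")
fake_stdout = io.BytesIO()
echoed = echo_bytes(fake_stdin, fake_stdout)
print(echoed)
```

With real streams the call would be echo_bytes(sys.stdin.buffer, sys.stdout.buffer). The byte comes out exactly as it went in, and it is the terminal, not Python, that decides which glyph to draw for it.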
Answer 2:
(Sorry Martijn, you're awesome, but) I just hate it when you need to circumvent an issue and blame it on something instead of fixing it with programming.
And here's the solution to the poison that is Python3:
import sys, codecs
sys.stdout = codecs.getwriter("latin-1")(sys.stdout.detach())
sys.stdin = codecs.getwriter("latin-1")(sys.stdin.detach())
sys.stdout.write(sys.stdin.read(1).decode('latin-1', 'replace'))
Not only does this let you choose/match your terminal's "encoding", it also requires no outside influence (such as export LANG=sv_SE.ISO-8859-1).
The only downside:
input('something: ')
will break. The fix for that is:
# Since it's bad practice to give a function the same name as
# a builtin, we'll go ahead and call it something
# we're used to but that isn't in use any more.
def raw_input(txt):
    sys.stdout.write(txt)
    sys.stdout.flush()
    sys.stdin.flush()
    return sys.stdin.readline().strip()
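An alternative sketch, not the method used in this answer: wrap the binary stream in io.TextIOWrapper with an explicit encoding, so that read() and readline() return str instead of bytes. The helper name rewrap is hypothetical, and an io.BytesIO stands in for sys.stdin.detach(); whether the built-in input() then behaves as wanted on a given terminal is something to verify locally.

```python
import io


def rewrap(binary_stream, encoding="latin-1"):
    """Return a text layer over a binary stream using the given encoding."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors="replace")


# Stand-in for sys.stdin.detach(): the bytes a Latin-1 terminal would send
# for the three Swedish letters followed by Enter (an assumption).
fake_stdin = rewrap(io.BytesIO(b"\xe5\xe4\xf6\n"))

first_char = fake_stdin.read(1)                    # a str, already decoded
rest_of_line = fake_stdin.readline().rstrip("\n")  # remainder of the line

print(ascii(first_char), ascii(rest_of_line))
```

On a real terminal this would be sys.stdin = rewrap(sys.stdin.detach()), and likewise for sys.stdout. On Python 3.7 and later, sys.stdin.reconfigure(encoding="latin-1") should achieve the same without replacing the stream object, provided nothing has been read from it yet.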
And all is well in paradise. I LOVE to stick it to the man (Python3).
A big thanks to Martijn for explaining why, and that the data is in fact Latin-1!
Source: https://stackoverflow.com/questions/18260859/python3-ascii-utf-8-iso-8859-1-cant-decode-byte-0xe5-swedish-characters