Question
I have a Django (Python) program that receives byte-encoded text via a webhook. Decoding from bytes to UTF-8 works for ordinary letters, but it breaks when an apostrophe ( ' ) is sent in. This is the code I use to decode the text:
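from urllib.parse import parse_qs
encoded = request.body  # raw bytes of the POST body
decoded = parse_qs(encoded)  # bytes in, so keys and values come back as bytes
body = decoded[b'body'][0].decode("utf-8")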
And this is the error:
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 5: ordinal not in range(128)
I'd like for it to successfully decode apostrophes. I'm also concerned it might break if an emoji is sent in, so I'd like to be able to escape emoji and random chars like ∫, but still preserve the real words in the message.
Answer 1:
parse_qs works with already-decoded str input but chokes on non-ASCII bytes, because bytes input is decoded as ASCII internally. For example:
This fails:
import urllib.parse
a = b'restaurant_type=caf\xc3\xa9'  # raw UTF-8 bytes for "café"
urllib.parse.parse_qs(a)
# > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3...etc
but this works okay:
a = b'restaurant_type=caf\xc3\xa9'
urllib.parse.parse_qs(a.decode())  # bytes.decode() defaults to UTF-8
# > {'restaurant_type': ['café']}
Is that what you are asking?
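Applied to the webhook in the question, the same decode-first idea might look like the sketch below. The helper names extract_body and escape_non_ascii are hypothetical (they are not part of Django or urllib), and the sample payload is made up for illustration. The second helper is one possible way to escape emoji and symbols while keeping the ASCII words readable, assuming backslash escapes are an acceptable output format.

```python
from urllib.parse import parse_qs


def extract_body(raw: bytes, field: str = "body") -> str:
    """Hypothetical helper: decode the raw bytes as UTF-8 first, then parse."""
    # errors="replace" swaps invalid byte sequences for U+FFFD instead of raising
    text = raw.decode("utf-8", errors="replace")
    parsed = parse_qs(text)
    # parse_qs maps each key to a list of values; take the first, or "" if absent
    return parsed.get(field, [""])[0]


def escape_non_ascii(text: str) -> str:
    """Hypothetical helper: backslash-escape every non-ASCII character."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


# Made-up payload: curly apostrophe, accented letter, and an emoji as UTF-8 bytes
raw = b"body=it\xe2\x80\x99s+a+caf\xc3\xa9+\xf0\x9f\x8e\x89"

message = extract_body(raw)        # str with the original characters preserved
safe = escape_non_ascii(message)   # pure-ASCII str, safe for ASCII-only output
```

In a Django view, raw would be request.body. Because the escaped string contains only ASCII characters, writing it to an ASCII-only console or log should not trigger an 'ascii' codec encode error.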
Source: https://stackoverflow.com/questions/45385288/decoding-non-standard-characters-to-utf-8-in-python