Decoding non-standard characters to UTF-8 in Python

大城市里の小女人 submitted on 2019-12-24 12:11:10

Question


I have a program that receives byte-encoded text via a webhook in Django (written in Python). Decoding the bytes as UTF-8 works for normal letters, but it breaks when an apostrophe ( ' ) is sent in. This is the code I use to decode the text:

from urllib.parse import parse_qs

encoded = request.body  # raw bytes of the incoming webhook request
decoded = parse_qs(encoded)
body = decoded[b'body'][0].decode("utf-8")

And this is the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 5: ordinal not in range(128)
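The character in the traceback, U+2019, is the typographic (curly) apostrophe that many keyboards and editors substitute automatically for the ASCII ' (U+0027). It lies outside ASCII and travels in a form body as the percent-encoded UTF-8 bytes %E2%80%99. When parse_qs is given bytes, it decodes them as ASCII and, in the Python 3 versions current when this was asked, encoded each unquoted value back to ASCII bytes, which is most likely where this UnicodeEncodeError comes from. The small standard-library check below (the helper name describe is illustrative only) shows how the two characters differ:

```python
import unicodedata
from urllib.parse import quote


def describe(ch: str) -> tuple:
    """Illustrative helper: Unicode name, code point, ASCII-ness, percent-encoding."""
    return (
        unicodedata.name(ch),  # official Unicode character name
        ord(ch),               # code point as an int
        ord(ch) < 128,         # True only for ASCII characters
        quote(ch),             # how the character appears in a form-encoded body
    )


for ch in ("'", "\u2019"):
    name, code, is_ascii, encoded = describe(ch)
    print(f"{name}: U+{code:04X}, ascii={is_ascii}, encoded={encoded}")
```

The straight apostrophe is reported as APOSTROPHE (U+0027, ASCII, %27), while the curly one is RIGHT SINGLE QUOTATION MARK (U+2019, not ASCII, %E2%80%99).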

I'd like it to decode apostrophes successfully. I'm also concerned it might break if an emoji is sent in, so I'd like to be able to escape emoji and random characters like ∫ while still preserving the real words in the message.
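Once the body has been correctly decoded to str, emoji and symbols such as ∫ are ordinary characters and do not break anything by themselves. If escaping them is still wanted, one possible policy (an assumption, not something the answer proposes) is to escape only non-ASCII characters whose Unicode general category is a symbol (S*). That covers ∫ (Sm) and most emoji (So) while leaving letters and punctuation, including the curly apostrophe (Pf), untouched. The function name escape_symbols is hypothetical:

```python
import unicodedata


def escape_symbols(text: str) -> str:
    """Hypothetical helper: escape non-ASCII symbol characters, keep everything else."""
    out = []
    for ch in text:
        # Categories starting with "S" are symbols: Sm (math, e.g. ∫),
        # So (other, e.g. most emoji), Sc (currency, e.g. €), Sk (modifier).
        if ord(ch) > 127 and unicodedata.category(ch).startswith("S"):
            # unicode_escape produces ASCII text such as \u222b or \U0001f600
            out.append(ch.encode("unicode_escape").decode("ascii"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_symbols("I don\u2019t like caf\u00e9 \u222b \U0001F600"))
# prints: I don’t like café \u222b \U0001f600
```

This is a sketch: multi-code-point emoji sequences contain joiners and variation selectors (categories Cf and Mn) that this policy leaves in place, so a stricter rule may be needed depending on how the text is used later.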


Answer 1:


parse_qs works with already-decoded (str) input but chokes on non-ASCII bytes. For example:

This fails:

import urllib.parse

a = b'restaurant_type=caf\xc3\xa9'
urllib.parse.parse_qs(a)
# > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3...etc

but this works okay:

a = b'restaurant_type=caf\xc3\xa9'
urllib.parse.parse_qs(a.decode())
# > {'restaurant_type': ['café']}
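Applied to the webhook code from the question, the same idea means decoding request.body before parsing, and then indexing with a str key ('body') rather than b'body'. A minimal sketch, assuming the webhook sends a form-encoded UTF-8 body; the helper name extract_body and the sample payload are illustrative, not from the original:

```python
from urllib.parse import parse_qs


def extract_body(raw: bytes, field: str = "body") -> str:
    """Hypothetical helper: decode a form-encoded request body, then read one field."""
    # Decode the raw bytes to str first; parse_qs then percent-decodes each
    # value as UTF-8 (its default encoding) and returns str keys and values.
    params = parse_qs(raw.decode("utf-8"))
    return params[field][0]


# Stand-in for Django's request.body: a payload containing a curly apostrophe
raw_body = b"body=I+don%E2%80%99t+like+caf%C3%A9"
print(extract_body(raw_body))  # I don’t like café
```

Because the values already come back as str, no further .decode() call is needed, and emoji arrive the same way as any other character.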

Is that what you are asking?



Source: https://stackoverflow.com/questions/45385288/decoding-non-standard-characters-to-utf-8-in-python
