Best Practices for Python UnicodeDecodeError

青春壹個敷衍的年華 submitted on 2019-11-29 00:29:00

If you have any influence over it, this is the painless way:

  • know your input encoding (or decode with ignore) and decode(encoding) the data as soon as it hits your app
  • work internally only with unicode (u'something' is unicode), also in the database
  • for rendering, export etc, anytime it leaves your app, encode('utf-8') the data
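The three rules above can be sketched roughly as follows. This is a minimal Python 3 illustration, not a complete I/O layer; the function names `read_input` and `write_output` and the sample bytes are made up for the example.

```python
def read_input(raw: bytes, encoding: str = "utf-8", errors: str = "strict") -> str:
    """Decode incoming bytes once, as soon as they enter the app.

    Pass errors="ignore" to drop undecodable bytes instead of raising.
    """
    return raw.decode(encoding, errors=errors)


def write_output(text: str) -> bytes:
    """Encode once, at the moment the data leaves the app."""
    return text.encode("utf-8")


# Hypothetical input: bytes that arrived in Latin-1 from some external source.
raw = "  blåbærsyltetøj  ".encode("latin-1")

text = read_input(raw, "latin-1")   # boundary: bytes -> str
clean = text.strip()                # internal work happens on str only
out = write_output(clean)           # boundary: str -> UTF-8 bytes
```

Everything between the two boundary calls only ever sees `str`, so no implicit conversions can sneak in.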

This might not be a viable option for you, but let me say that a large number of encoding-related errors vanish when using Python 3, simply because the separation between unicode strings and byte objects has been made so much clearer. When I have to use Python 2, I opt for version 2.6 or later, where you can declare from __future__ import unicode_literals. Disbelievers should actually read the link you posted, as it points out some subtleties of Python's en/decoding behavior that fortunately vanished in Python 3.
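A small sketch of what that clearer separation looks like in practice, assuming Python 3. The variable names and sample bytes are purely illustrative.

```python
data = b"caf\xc3\xa9"   # bytes, e.g. as read from a socket or a file
label = "name: "        # str (unicode text)

try:
    label + data        # Python 3 refuses to mix str and bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False

# An explicit decode makes the conversion visible and deliberate.
combined = label + data.decode("utf-8")
```

In Python 2 the same concatenation of a unicode string and a non-ASCII byte string triggers an implicit ASCII decode, which is where many a UnicodeDecodeError comes from.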

You say:

I do not control the languages or language of choice. My site supports international languages and along with English. I have feed aggregation which generally not bother about unicode/ascii/utf-8

Well, whatever you choose to do, it is clear you do not want your web application to crash just because some dænish bløgger whose feeds you consume chose to encode their posts in an obscure Scandinavian encoding scheme. The underlying problem is relevant for all web applications, since URLs do not carry encoding information and you never know what byte sequences a malicious user might want to send you. In this case I do what I call 'safe chain-decoding': I try to decode as utf-8 first, and if that fails, I try again using cp1252. If that fails too, I discard the request (HTTP 404 or something similar).
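A minimal sketch of that 'safe chain-decoding' idea in Python 3. The function name `safe_chain_decode` and the choice to return `None` on failure are assumptions made for the example; the caller decides how to reject the request.

```python
def safe_chain_decode(raw, encodings=("utf-8", "cp1252")):
    """Try each encoding in order; return the first successful decode.

    Returns None if every encoding fails, so the caller can discard
    the request (the function name and return convention are
    illustrative, not from any library).
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return None


utf8_result = safe_chain_decode("dænish".encode("utf-8"))
cp1252_result = safe_chain_decode("bløgger".encode("cp1252"))
rejected = safe_chain_decode(b"\x81")  # invalid in utf-8, undefined in cp1252
```

The order matters: utf-8 is strict enough that bytes from another encoding rarely decode as valid utf-8 by accident, whereas cp1252 accepts almost any byte sequence, so it has to come last.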

You mention you process feeds and ¿you? ¿the feeds? do not 'bother' about unicode and encodings. Could you clarify that statement? It completely escapes me how one can successfully build a site that carries text in multiple languages and not care about encodings. Clearly, using ascii-only will not carry you very far.
