Decoding problems in Django and lxml

Posted by 偶尔善良 on 2019-12-10 18:02:10

Question


I have a strange problem with lxml in the deployed version of my Django application. I use lxml to parse an HTML page that I fetch from my server. This works perfectly on the development server on my own computer, but on the production server it raises a UnicodeDecodeError:

('utf8', "\x85why hello there!", 0, 1, 'unexpected code byte')
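The fields of that tuple can be inspected by reproducing the failure in isolation. This is a minimal sketch in Python 3 syntax (an assumption; a mod_python deployment would have run Python 2), so the codec name and reason text may be worded differently from the tuple above, while the offsets stay the same. The name hello follows the question.

```python
# The byte string from the question. 0x85 is a UTF-8 continuation byte,
# so it cannot be the first byte of a character.
hello = b"\x85why hello there!"

err = None
try:
    hello.decode("utf-8")
except UnicodeDecodeError as exc:
    # Keep a reference; the name bound by "as" is cleared after the block.
    err = exc

# These attributes correspond, in order, to the tuple in the error:
# codec name, offending input, start offset, end offset, reason.
print(err.encoding, err.start, err.end, err.reason)
```

The start and end offsets (0 and 1) point at the single byte \x85, which indicates the input is not UTF-8 at all rather than being truncated or partially corrupted.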

I have made sure that Apache (with mod_python) runs with LANG='en_US.UTF-8'.

I've tried googling for this problem and tried different approaches to decoding the string correctly, but I can't figure it out.

In your answer, you may assume that my string is called hello or something.


Answer 1:


"\x85why hello there!" is not a UTF-8 encoded string: the byte 0x85 cannot start a UTF-8 sequence. Try decoding the web page yourself before passing it to lxml. Check which encoding the page uses by looking at the HTTP headers when you fetch it; you may find the problem there.
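One way to act on this is sketched below, using only the standard library. The helper names charset_from_content_type and decode_body are hypothetical, and the cp1252 fallback is an assumption (in Windows-1252 the byte 0x85 is the ellipsis character, which would fit the sample text, but the real page may use something else). Python 3 syntax is assumed.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(raw, content_type=None, fallback="cp1252"):
    """Decode fetched bytes to text before handing them to a parser.

    Tries the charset declared in the HTTP header first, then UTF-8,
    and finally the fallback codec (an assumption, not a certainty).
    """
    candidates = []
    if content_type:
        declared = charset_from_content_type(content_type)
        if declared:
            candidates.append(declared)
    candidates.append("utf-8")

    for name in candidates:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown codec: move on to the next candidate.
            continue

    # Last resort: never raise, mark undecodable bytes instead.
    return raw.decode(fallback, errors="replace")


hello = b"\x85why hello there!"
text = decode_body(hello, "text/html; charset=windows-1252")
print(repr(text))
```

The resulting text, rather than the raw bytes, could then be handed to the parser, for example lxml.html.fromstring(text). If the development machine and the server disagree, comparing the Content-Type header each one receives is a reasonable first check.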




Answer 2:


Doesn't syntax such as u"\x85why hello there!" help?

You may find the following resources from the official Python documentation helpful:

  • Python introduction, Unicode Strings
  • Sequence Types — str, unicode, list, tuple, buffer, xrange



Answer 3:


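Since modifying site.py is not an ideal solution, try this at the start of your program (Python 2 only):

import sys
# site.py removes sys.setdefaultencoding at startup; reload() restores it.
reload(sys)
# Change the process-wide default used for implicit str/unicode conversion.
sys.setdefaultencoding("utf-8")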


Source: https://stackoverflow.com/questions/808275/decoding-problems-in-django-and-lxml
