Why do I get garbled characters when opening a URL with urllib2?

被刻印的时光 ゝ submitted on 2019-12-13 00:28:06

Question


Here's my code; you can test it yourself. I always get garbled characters instead of the page source.

import urllib2

# Browser-like User-Agent sent with the request
Header = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)"}
Req = urllib2.Request("http://rlslog.net", None, Header)
Response = urllib2.urlopen(Req)
Html = Response.read()  # raw response body, exactly as the server sent it
print Html[:1000]

Normally Html should contain the page source, but it ends up being a pile of garbled characters. Does anybody know why?

BTW: I'm on Python 2.7.


Answer 1:


As Bruce already suggested, this looks like a compression problem. The server returns gzip-compressed content, but urllib2 does not decompress gzip automatically. As far as I know, the server is misbehaving here: it should only compress the content if the request carries an Accept-Encoding: gzip header (which you either set yourself, or which your client adds automatically if it supports gzip).

So either use a library that handles this automatically, such as httplib2 (which I've tested with the page in question, and it works), or decompress the content yourself (see the answer to this SO question for how to do it; note that in that question the headers returned by the server are checked to see whether the content is gzip-compressed).
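As a rough illustration of the "decompress yourself" route, here is a minimal sketch that gunzips a body only when the Content-Encoding header says gzip. The helper name decode_body and the sample bytes are made up for this example and are not part of urllib2; the sample is compressed locally so the snippet runs without a network connection.

```python
import gzip
import io


def decode_body(raw_body, content_encoding):
    """Return raw_body, gunzipped if content_encoding is gzip.

    decode_body is a hypothetical helper name, not a urllib2 function.
    """
    if (content_encoding or "").strip().lower() == "gzip":
        # GzipFile reads from a file-like object, so wrap the bytes.
        return gzip.GzipFile(fileobj=io.BytesIO(raw_body)).read()
    return raw_body


# Stand-in for a compressed response body, built locally.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"<html>hello</html>")
sample = buf.getvalue()

plain = decode_body(sample, "gzip")
```

With the code from the question, this would presumably be called as decode_body(Response.read(), Response.info().getheader("Content-Encoding")) on Python 2.7. Checking the header first means uncompressed responses pass through untouched.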




Answer 2:


You are making the request with the user agent string of a browser that supports on-the-fly compression. Are you sure the output is not gzip-compressed? Try running it through the zlib module and/or printing the response headers.
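One way to try the zlib suggestion is sketched below, assuming the body really is a gzip stream. The names looks_gzipped and gunzip_with_zlib are invented for this example; the sample data is compressed locally rather than fetched from the site.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzipped(data):
    """Cheap sniff test: does the body start with the gzip magic number?"""
    return data[:2] == GZIP_MAGIC


def gunzip_with_zlib(data):
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    # instead of a bare zlib stream.
    return zlib.decompress(data, 16 + zlib.MAX_WBITS)


# Build a gzip stream locally to stand in for the response body.
compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
sample = compressor.compress(b"<html>page source</html>") + compressor.flush()

if looks_gzipped(sample):
    page = gunzip_with_zlib(sample)
else:
    page = sample
```

For the headers, Response.info() in the question's code holds what the server sent back, so printing it should show whether a Content-Encoding: gzip line is present.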



Source: https://stackoverflow.com/questions/7231280/why-i-got-messy-characters-while-opening-url-using-urllib2
