Python urllib2 parse html problem

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-11 07:39:37

Question


I am using mechanize to parse the HTML of a website, but with this particular website I get a strange result.

from mechanize import Browser
br = Browser()
r = br.open("http://www.heavenplaza.com")
result = r.read()

The result is something I cannot make sense of; you can see it here: http://paste2.org/p/1556077

Does anyone know a way to get that website's HTML, with either mechanize or urllib?

Thanks


Answer 1:


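import urllib2, StringIO, gzip

f = urllib2.urlopen("http://www.heavenplaza.com")
body = f.read()
# The "strange result" is gzip-compressed data, not HTML. Gzip data starts
# with the magic bytes 1f 8b, so only decompress when they are present.
if body[:2] == "\x1f\x8b":
    body = gzip.GzipFile(fileobj=StringIO.StringIO(body)).read()
print body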
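On Python 3, where urllib2 and StringIO no longer exist, the same idea can be sketched with urllib.request and gzip.decompress. This is a minimal sketch, not the answerer's code: the helper names decode_body and fetch_html are hypothetical, it handles only gzip (not deflate or brotli), and it assumes the page's charset is declared in the Content-Type header, falling back to UTF-8 otherwise.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Hypothetical helper: undo gzip compression on a response body."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(raw)
    # Fall back to sniffing the gzip magic bytes (1f 8b) in case the
    # server compressed the body without sending a Content-Encoding header.
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    return raw


def fetch_html(url: str) -> str:
    """Hypothetical helper: fetch a page and return it as decoded text."""
    # Advertise gzip support explicitly, since decode_body can handle it.
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        body = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
        # Assumption: the charset comes from Content-Type; UTF-8 otherwise.
        charset = resp.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")
```

Calling fetch_html("http://www.heavenplaza.com") should then return readable HTML if gzip compression was the only problem.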



Answer 2:


I quickly ran the script in the console and the site was returning garbage. You probably need to spoof your HTTP User-Agent so that the site doesn't think you are a robot.

The same script works fine with http://www.google.com.
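If the server really does vary its response by User-Agent, as this answer suggests, the header can be overridden per request. A minimal Python 3 sketch using only urllib.request follows; the helper make_request and the browser string in BROWSER_UA are hypothetical examples, and any common browser string would serve.

```python
import urllib.request

# Hypothetical browser-like User-Agent; Python's default is "Python-urllib/x.y".
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def make_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Hypothetical helper: build a request that presents a custom User-Agent."""
    # Request stores header names via str.capitalize(), i.e. as "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Pass the result to urllib.request.urlopen() as usual. With mechanize, the equivalent is assigning a list such as [("User-agent", BROWSER_UA)] to the Browser's addheaders attribute before calling br.open(). A spoofed User-Agent does not decompress anything, so if the body is still gzip data it needs the decompression step from the first answer.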



Source: https://stackoverflow.com/questions/6899273/python-urllib2-parse-html-problem
