Why does this URL raise BadStatusLine with httplib2 and urllib2?

懵懂的女人 submitted on 2019-11-29 10:12:34

This works fine for me:

import urllib2

opener = urllib2.build_opener()

headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1',
}

opener.addheaders = headers.items()
response = opener.open("http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902")

print response.headers
print response.read()

The website discards all requests that occur without a User-Agent string.
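For readers on Python 3, where urllib2 was folded into urllib.request, the same idea could be sketched roughly as below. This is only an illustration under a few assumptions: the helper names build_request and fetch and the 10-second timeout are made up here and do not come from the answer above; only the User-Agent string is copied from it.

```python
import urllib.request

# Browser-like User-Agent string copied from the answer above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) "
              "Gecko/20100101 Firefox/10.0.1")


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header.

    Hypothetical helper for illustration; not part of any library.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Open the URL with the custom User-Agent and return the body as bytes.

    The timeout value is an assumption, not something the original specifies.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling fetch() with the URL from the question should then send the browser-like header instead of the default Python-urllib one. Whether the server still responds the way it did when this was written is not something this sketch can guarantee.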

For all the people that end up here with a similar problem after installing httplib2 0.8:

Version 0.8 has a regression in connection handling related to HTTP keep-alive. See the bug report: https://code.google.com/p/httplib2/issues/detail?id=250

There is a fix for this issue, but it has not been released so far. Until then just use httplib2 0.7.7.

In my code, when I use

    from urllib2 import urlopen  
    content = urlopen(page).read()

the exception appears. However, when I use

    import urllib  
    content = urllib.urlopen(page).read()

everything is OK. Maybe this will help you.

It looks like this webpage doesn't allow your user agent. You can change it like this:

>>> import urllib2
>>> user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
>>> headers = { 'User-Agent' : user_agent }
>>> r = urllib2.Request('http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902', headers=headers)
>>> fd = urllib2.urlopen(r)
>>> fd.read()[:20]
'<!DOCTYPE html PUBLI'