Timeout error when downloading .html files from URLs

Submitted by China☆狼群 on 2019-12-11 09:00:30

Question


I get the following error when downloading HTML pages from a list of URLs.

Error:

    raise URLError(err)
    urllib2.URLError: <urlopen error [Errno 10060] A connection attempt failed
    because the connected party did not properly respond after a period of time,
    or established connection failed because connected host has failed to respond>

Code:

import urllib2

# Spoof a browser User-Agent; some sites reject urllib2's default one.
hdr = {'User-Agent': 'Mozilla/5.0'}

for i, site in enumerate(urls[index]):
    print (site)
    req = urllib2.Request(site, headers=hdr)
    # Open through an opener that keeps cookies across requests.
    page = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(req)
    page_content = page.read()
    # Write the raw bytes; 'wb' avoids newline translation on Windows.
    with open(path_current + '/' + str(i) + '.html', 'wb') as fid:
        fid.write(page_content)

I think it may be due to proxy settings, or that I need to change the timeout, but I am not sure. Please help; when I check the URLs manually in a browser, they open perfectly fine.
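(For reference, if proxy settings do turn out to be the culprit, urllib2 can be pointed at a proxy explicitly. A minimal sketch, assuming requests must go through a proxy; the address below is a placeholder, not a real server:)

import urllib2

# Route both http and https traffic through a (placeholder) proxy.
proxy_handler = urllib2.ProxyHandler({'http':  'http://myproxy.example.com:8080',
                                      'https': 'http://myproxy.example.com:8080'})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPCookieProcessor)
urllib2.install_opener(opener)
# After install_opener, plain urllib2.urlopen(...) calls also use the proxy.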


Answer 1:


Well, since the failures are intermittent (the URLs open fine when you check them manually), your network is probably just slow at times. Try setting the timeout explicitly, in the following way:

req = urllib2.Request(site, headers=hdr)
timeout_in_sec = 360  # allow up to six minutes before giving up on a response
page = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(req, timeout=timeout_in_sec)
page_content = page.read()
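
If a longer timeout alone doesn't cure the intermittent failures, another option (not part of the original answer, just a sketch) is to retry each request a few times before giving up; the retry count and delay below are arbitrary choices:

import socket
import time
import urllib2

def fetch_with_retries(opener, req, timeout_in_sec=360, retries=3, delay=5):
    # Hypothetical helper: attempt the request up to `retries` times,
    # sleeping `delay` seconds between attempts, then let the error propagate.
    for attempt in range(retries):
        try:
            return opener.open(req, timeout=timeout_in_sec).read()
        except (urllib2.URLError, socket.timeout):
            if attempt == retries - 1:
                raise  # out of attempts; re-raise the last error
            time.sleep(delay)

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor)
page_content = fetch_with_retries(opener, urllib2.Request(site, headers=hdr))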


Source: https://stackoverflow.com/questions/30373301/timeout-error-when-downloading-html-files-from-urls
