Question
I am trying to fetch a page, but urlopen hangs and never returns anything, even though the page is very light and opens in any browser without problems:
import urllib.request
with urllib.request.urlopen("http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm") as response:
    print(response.read())
This simple code just freezes while retrieving the response, yet the same URL opens in a browser without any problem.
Answer 1:
www.planalto.gov.br performs user-agent detection. If you send a browser-like User-Agent header, the request completes normally. urllib didn't crash; it's just waiting for a response.
curl -H "User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm
worked just fine for me, but
curl http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm
did not.
As RPGillespie said above, use urllib.request (urllib2 on Python 2) or requests to add the User-Agent header (see "How do I set headers using python's urllib?" for more information about that).
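For the Python 3 code in the question, a minimal sketch of that fix could look like the following. The helper names build_request and fetch and the 10-second timeout are illustrative assumptions, not part of the original answer; the User-Agent string is the one from the curl command above, and any browser-like value may work just as well.

```python
import urllib.request

# Browser-like User-Agent copied from the curl example; assumed to be
# accepted by the server, other browser strings may work too.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/57.0.2987.133 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch url and return the raw response body as bytes.

    The timeout (hypothetical value) makes a stalled connection raise
    an exception instead of blocking indefinitely.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling print(fetch("http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm")) would then replace the original snippet. Passing a timeout is worth doing even once the header is fixed, since urlopen otherwise falls back to the global default, which is to wait with no time limit.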
Source: https://stackoverflow.com/questions/43987450/python-urllib-freezes-with-specific-url