问题
I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen
will successfully read the page but urllib2.urlopen
will not. Here's a script which demonstrates the problem (this is the actual script and not a simplification of a different test script):
import urllib, urllib2
print urllib.urlopen("http://127.0.0.1").read() # prints "running"
print urllib2.urlopen("http://127.0.0.1").read() # throws an exception
Here's the stack trace:
Traceback (most recent call last):
File "urltest.py", line 5, in <module>
print urllib2.urlopen("http://127.0.0.1").read()
File "C:\Python25\lib\urllib2.py", line 121, in urlopen
return _opener.open(url, data)
File "C:\Python25\lib\urllib2.py", line 380, in open
response = meth(req, response)
File "C:\Python25\lib\urllib2.py", line 491, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python25\lib\urllib2.py", line 412, in error
result = self._call_chain(*args)
File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
result = func(*args)
File "C:\Python25\lib\urllib2.py", line 575, in http_error_302
return self.parent.open(new)
File "C:\Python25\lib\urllib2.py", line 380, in open
response = meth(req, response)
File "C:\Python25\lib\urllib2.py", line 491, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python25\lib\urllib2.py", line 418, in error
return self._call_chain(*args)
File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
result = func(*args)
File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Timeout
Any ideas? I might end up needing some of the more advanced features of urllib2
, so I don't want to just resort to using urllib
, plus I want to understand this problem.
回答1:
Sounds like you have proxy settings defined that urllib2 is picking up on. When it tries to proxy "127.0.0.01/", the proxy gives up and returns a 504 error.
From Obscure python urllib2 proxy gotcha:
proxy_support = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy_support)
print opener.open("http://127.0.0.1").read()
# Optional - makes this opener default for urlopen etc.
urllib2.install_opener(opener)
print urllib2.urlopen("http://127.0.0.1").read()
回答2:
Does calling urlib2.open first followed by urllib.open have the same results? Just wondering if the first call to open is causing the http server to get busy causing the timeout?
回答3:
I don't know what's going on, but you may find this helpful in figuring it out:
>>> import urllib2
>>> urllib2.urlopen('http://mit.edu').read()[:10]
'<!DOCTYPE '
>>> urllib2._opener.handlers[1].set_http_debuglevel(100)
>>> urllib2.urlopen('http://mit.edu').read()[:10]
connect: (mit.edu, 80)
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: mit.edu\r\nConnection: close\r\nUser-Agent: Python-urllib/2.5\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 14 Oct 2008 15:52:03 GMT
header: Server: MIT Web Server Apache/1.3.26 Mark/1.5 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7c
header: Last-Modified: Tue, 14 Oct 2008 04:02:15 GMT
header: ETag: "71d3f96-2895-48f419c7"
header: Accept-Ranges: bytes
header: Content-Length: 10389
header: Connection: close
header: Content-Type: text/html
'<!DOCTYPE '
回答4:
urllib.urlopen() throws the following request at the server:
GET / HTTP/1.0
Host: 127.0.0.1
User-Agent: Python-urllib/1.17
while urllib2.urlopen() throws this:
GET / HTTP/1.1
Accept-Encoding: identity
Host: 127.0.0.1
Connection: close
User-Agent: Python-urllib/2.5
So, your server either doesn't understand HTTP/1.1 or the extra header fields.
来源:https://stackoverflow.com/questions/201515/urllib-urlopen-works-but-urllib2-urlopen-doesnt