I have code for reading a URL like this:
from urllib2 import Request, urlopen

req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
This isn't the behavior I see. I get a URLError when the call times out:
from urllib2 import Request, urlopen

req = Request('http://www.google.com')
res = urlopen(req, timeout=0.000001)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# ...
# raise URLError(err)
# urllib2.URLError: <urlopen error timed out>
Can't you catch this error and then avoid trying to read res? When I try to use res.read() after this I get NameError: name 'res' is not defined.
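A minimal sketch of that idea, assuming URLError is the only failure we care about (it's what urllib2 raised in the traceback above), which only reads when the open succeeded:

from urllib2 import Request, urlopen, URLError

req = Request('http://www.google.com')
try:
    res = urlopen(req, timeout=0.000001)
except URLError:
    res = None  # the open timed out, so there is nothing to read
if res is not None:
    print res.read()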
Is something like this what you need?
try:
    res = urlopen(req, timeout=3.0)
except:
    print 'Doh!'
finally:
    print 'yay!'
    print res.read()  # note: res is undefined here if urlopen() raised
I suppose the way to implement a timeout manually is via multiprocessing, no? If the job hasn't finished you can terminate it, as in the sketch below.
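A rough sketch of that approach, assuming the whole request can be sacrificed once the deadline passes (the helper names here are illustrative):

import multiprocessing
import urllib2

def _fetch(url, queue):
    # runs in the child process; push the body back to the parent
    queue.put(urllib2.urlopen(url).read())

def fetch_with_timeout(url, timeout):
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_fetch, args=(url, queue))
    proc.start()
    proc.join(timeout)      # wait at most `timeout` seconds
    if proc.is_alive():
        proc.terminate()    # the job hasn't finished: kill it
        proc.join()
        return None
    # assumes the child succeeded; a real version would also handle
    # the case where the child died without putting anything
    return queue.get()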
I found in my tests (using the technique described here) that a timeout set in the urlopen() call also affects the read() call:
import urllib2 as u
c = u.urlopen('http://localhost/', timeout=5.0)
s = c.read(1<<20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 1298, in read
    return s + self._file.read(amt - len(s))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
socket.timeout: timed out
Maybe it's a feature of newer versions? I'm using Python 2.7 on Ubuntu 12.04, straight out of the box.
One possible (imperfect) solution is to set the global socket timeout, explained in more detail here:
import socket
import urllib2
# timeout in seconds
socket.setdefaulttimeout(10)
# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
However, this only works if you're willing to globally modify the timeout for all users of the socket module. I'm running the request from within a Celery task, so doing this would mess up timeouts for the Celery worker code itself.
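One way to limit the blast radius, sketched here on the assumption that no other thread is opening sockets at the same moment, is to save and restore the previous default around the call:

import socket
import urllib2

old_timeout = socket.getdefaulttimeout()
socket.setdefaulttimeout(10)
try:
    response = urllib2.urlopen('http://www.voidspace.org.uk')
finally:
    # restore whatever the rest of the process (e.g. Celery) was using
    socket.setdefaulttimeout(old_timeout)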
I'd be happy to hear any other solutions...
Had the same issue with socket timeout on the read statement. What worked for me was putting both the urlopen and the read inside a try statement. Hope this helps!
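A sketch of that arrangement, with an illustrative URL, so that a timeout raised by either call lands in the same handler:

import socket
import urllib2

try:
    res = urllib2.urlopen('http://example.com', timeout=5.0)
    content = res.read()
except (urllib2.URLError, socket.timeout):
    content = None  # either the open or the read timed out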
I'd expect this to be a common problem, and yet no answers are to be found anywhere... I just built a solution for this using a timeout signal:

import urllib2
import socket
import signal

timeout = 10
socket.setdefaulttimeout(timeout)

def timeout_catcher(signum, _):
    raise urllib2.URLError("Read timeout")

signal.signal(signal.SIGALRM, timeout_catcher)

def safe_read(url, timeout_time):
    signal.setitimer(signal.ITIMER_REAL, timeout_time)
    content = urllib2.urlopen(url, timeout=timeout_time).read()
    signal.setitimer(signal.ITIMER_REAL, 0)
    # you should also catch any exceptions going out of urlopen here,
    # set the timer to 0, and pass the exceptions on.
    return content
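One way to honor that comment is a try/finally wrapper, sketched here, which guarantees the timer is disarmed even when urlopen() or read() raises:

def safe_read(url, timeout_time):
    signal.setitimer(signal.ITIMER_REAL, timeout_time)
    try:
        return urllib2.urlopen(url, timeout=timeout_time).read()
    finally:
        # always disarm the timer, even while an exception propagates
        signal.setitimer(signal.ITIMER_REAL, 0)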
The credit for the signal part of the solution goes here, by the way: python timer mystery.
It's not possible for any library to do this without using some kind of asynchronous timer through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket. And what this actually does is explained in the documentation:
SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.
The last sentence of the quote is the key: a socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes.
A simple function using threading.Timer could be as follows:
import httplib
import socket
import threading

def download(host, path, timeout=10):
    content = None
    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()
    # once the timer fires, shut down the read side of the socket so
    # that the blocking read() returns
    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()
    try:
        content = response.read()
    except httplib.IncompleteRead:
        pass
    timer.cancel()  # cancel on a triggered Timer is safe
    http.close()
    return content
>>> host = 'releases.ubuntu.com'
>>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
>>> print content is None
True
>>> content = download(host, '/15.04/MD5SUMS', 1)
>>> print content is None
False
Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception not inside the function, but outside of it. The latter approach will not work, though, if the HTTP response doesn't have a Content-Length header.
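A sketch of the outside-the-function variant, assuming the try/except is removed from download() so that the exception reaches the caller:

try:
    content = download(host, '/15.04/MD5SUMS', 1)
except httplib.IncompleteRead:
    content = None  # the read was cut off by the timer before completing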