I'm looking for information on thread safety of urllib2 and httplib.
The official documentation (http://docs.python.org/library/urllib2.html and h
httplib and urllib2 are not thread-safe.
urllib2 does not provide serialized access to a global (shared) OpenerDirector object, which is used by urllib2.urlopen().
Similarly, httplib does not provide serialized access to HTTPConnection objects (i.e. by using a thread-safe connection pool), so sharing HTTPConnection objects between threads is not safe.
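One way to avoid sharing connections is to give each thread its own. The following is only a minimal sketch, assuming threading.local is acceptable for your use case; the helper name get_connection is hypothetical, and the import fallback is an assumption added so the same code also runs on Python 3, where httplib is named http.client.

```python
import threading

try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # Python 3 name for the same module

_local = threading.local()  # each thread sees its own attributes


def get_connection(host):
    """Return an HTTPConnection owned by the calling thread (hypothetical helper)."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    if host not in conns:
        # The constructor does not open a socket; that happens on the first request.
        conns[host] = httplib.HTTPConnection(host)
    return conns[host]
```

Within one thread, repeated calls return the same connection for a host; another thread gets a different object, so no HTTPConnection is ever used from two threads.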
I suggest using httplib2 or urllib3 as an alternative if thread-safety is required.
Generally, if a module's documentation does not mention thread-safety, I would assume it is not thread-safe. You can look at the module's source code for verification.
When browsing the source code to determine whether a module is thread-safe, you can start by looking for uses of thread synchronization primitives from the threading or multiprocessing modules, or use of queue.Queue (Queue.Queue in Python 2).
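That first pass over the source can be roughly automated. The sketch below is a heuristic only, not a proof of thread safety: the helper find_sync_hints and the list of names it searches for are my own assumptions, and it only works for modules whose Python source is available.

```python
import inspect
import re

# Names that usually indicate explicit synchronization (a heuristic, not proof).
SYNC_HINTS = ("Lock", "RLock", "Condition", "Semaphore", "Event", "Queue")


def find_sync_hints(module):
    """Return the sorted hint names found in a module's source (hypothetical helper)."""
    source = inspect.getsource(module)  # raises if no Python source is available
    found = set()
    for name in SYNC_HINTS:
        # Match whole words only, so "Lock" does not match "Locked".
        if re.search(r"\b%s\b" % name, source):
            found.add(name)
    return sorted(found)
```

An empty result suggests the module does no locking of its own. A non-empty result only tells you where to start reading; it does not show that every shared object is protected.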
UPDATE
Here is a relevant source code snippet from urllib2.py (Python 2.7.2):
    _opener = None
    def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        global _opener
        if _opener is None:
            _opener = build_opener()
        return _opener.open(url, data, timeout)

    def install_opener(opener):
        global _opener
        _opener = opener
There is an obvious race condition when concurrent threads call install_opener() and urlopen().
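One way to sidestep the global _opener entirely is to build an opener per thread and call its open() method directly. This is a minimal sketch under that assumption; get_opener and fetch are hypothetical names, the default timeout of 10 seconds is arbitrary, and the import fallback is added so the code also runs on Python 3, where these functions live in urllib.request.

```python
import threading

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same functions

_local = threading.local()  # each thread sees its own attributes


def get_opener():
    """Return an OpenerDirector private to the calling thread (hypothetical helper)."""
    opener = getattr(_local, "opener", None)
    if opener is None:
        opener = _local.opener = urllib2.build_opener()
    return opener


def fetch(url, timeout=10):
    """Open url through the per-thread opener, bypassing the module-level _opener."""
    return get_opener().open(url, timeout=timeout)
```

Because fetch() never reads or writes the module-level _opener, a call to install_opener() in another thread cannot affect it.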
Also, note that calling urlopen() with a Request object as the url parameter may mutate the Request object (see the source for OpenerDirector.open()), so it is not safe to concurrently call urlopen() with a shared Request object.
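A simple way to keep Request objects unshared is to build a new one for every call. A minimal sketch, where new_request is a hypothetical helper and the import fallback is an assumption added for Python 3:

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of Request


def new_request(url, headers=None):
    """Build a Request intended for a single call (hypothetical helper)."""
    # Copy the headers so the caller's dict is never handed to the Request.
    return urllib2.Request(url, headers=dict(headers or {}))
```

Each thread would then pass its own new_request(...) result to urlopen(), so any mutation done while opening it stays confined to that one object.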
All told, urlopen() is thread-safe if the following conditions are met:
- install_opener() is not called from another thread.
- A non-shared Request object, or a string, is used as the url parameter.