I\'m having a heck of a time getting asynchronous / threaded HTTPS requests to work using Python\'s urllib2.
Does anyone out there have a basic example that implements u
there's a really simple way, involving a handler for urllib2, which you can find here: http://pythonquirks.blogspot.co.uk/2009/12/asynchronous-http-request.html
#!/usr/bin/env python
import urllib2
import threading
class MyHandler(urllib2.HTTPHandler):
def http_response(self, req, response):
print "url: %s" % (response.geturl(),)
print "info: %s" % (response.info(),)
for l in response:
print l
return response
o = urllib2.build_opener(MyHandler())
t = threading.Thread(target=o.open, args=('http://www.google.com/',))
t.start()
print "I'm asynchronous!"
t.join()
print "I've ended!"
The code below does 7 http requests asynchronously at the same time. It does not use threads, instead it uses asynchronous networking with the twisted library.
from twisted.web import client
from twisted.internet import reactor, defer
urls = [
'http://www.python.org',
'http://stackoverflow.com',
'http://www.twistedmatrix.com',
'http://www.google.com',
'http://launchpad.net',
'http://github.com',
'http://bitbucket.org',
]
def finish(results):
for result in results:
print 'GOT PAGE', len(result), 'bytes'
reactor.stop()
waiting = [client.getPage(url) for url in urls]
defer.gatherResults(waiting).addCallback(finish)
reactor.run()
here is an example using urllib2 (with https) and threads. Each thread cycles through a list of URL's and retrieves the resource.
import itertools
import urllib2
from threading import Thread
THREADS = 2
URLS = (
'https://foo/bar',
'https://foo/baz',
)
def main():
for _ in range(THREADS):
t = Agent(URLS)
t.start()
class Agent(Thread):
def __init__(self, urls):
Thread.__init__(self)
self.urls = urls
def run(self):
urls = itertools.cycle(self.urls)
while True:
data = urllib2.urlopen(urls.next()).read()
if __name__ == '__main__':
main()
You can use asynchronous IO to do this.
requests + gevent = grequests
GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.
import grequests
urls = [
'http://www.heroku.com',
'http://tablib.org',
'http://httpbin.org',
'http://python-requests.org',
'http://kennethreitz.com'
]
rs = (grequests.get(u) for u in urls)
grequests.map(rs)
here is the code from eventlet
urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
"https://wiki.secondlife.com/w/images/secondlife.jpg",
"http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]
import eventlet
from eventlet.green import urllib2
def fetch(url):
return urllib2.urlopen(url).read()
pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
print "got body", len(body)