What is the fastest way to send 100,000 HTTP requests in Python?

暖寄归人 2020-11-22 07:12

I am opening a file which has 100,000 URLs. I need to send an HTTP request to each URL and print the status code. I am using Python 2.6, and have so far looked at the many con…

16 Answers
  •  旧时难觅i
    2020-11-22 07:42

    Twisted-less solution (Python 2):

    from urlparse import urlparse
    from threading import Thread
    import httplib, sys
    from Queue import Queue
    
    concurrent = 200  # number of worker threads
    
    def doWork():
        # Each worker pulls URLs off the queue until the process exits.
        while True:
            url = q.get()
            status, url = getStatus(url)
            doSomethingWithResult(status, url)
            q.task_done()
    
    def getStatus(ourl):
        # HEAD is enough: we only need the status code, not the body.
        try:
            url = urlparse(ourl)
            conn = httplib.HTTPConnection(url.netloc)
            conn.request("HEAD", url.path)
            res = conn.getresponse()
            return res.status, ourl
        except Exception:
            return "error", ourl
    
    def doSomethingWithResult(status, url):
        print status, url
    
    # A bounded queue keeps memory flat while streaming the URL list in.
    q = Queue(concurrent * 2)
    for i in range(concurrent):
        t = Thread(target=doWork)
        t.daemon = True   # daemon threads die with the main thread
        t.start()
    try:
        for url in open('urllist.txt'):
            q.put(url.strip())
        q.join()
    except KeyboardInterrupt:
        sys.exit(1)
    

    This one is slightly faster than the Twisted solution and uses less CPU.
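
    On Python 3 the same fan-out pattern can be sketched with `concurrent.futures`, which replaces the hand-rolled queue and daemon threads. This is not part of the original answer; `head_status`, `check_all`, and the parameter names are illustrative choices:

    ```python
    from concurrent.futures import ThreadPoolExecutor
    from urllib.parse import urlparse
    import http.client

    def head_status(url, timeout=10):
        """Issue a HEAD request and return (status, url); ('error', url) on failure."""
        try:
            parts = urlparse(url)
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
            conn.request("HEAD", parts.path or "/")
            return conn.getresponse().status, url
        except Exception:
            return "error", url

    def check_all(urls, fetch=head_status, concurrent=200):
        # The executor owns the worker threads and the internal work queue;
        # map() preserves the input order of the URLs in its results.
        with ThreadPoolExecutor(max_workers=concurrent) as pool:
            return list(pool.map(fetch, urls))

    # Usage sketch:
    # with open('urllist.txt') as f:
    #     for status, url in check_all(line.strip() for line in f):
    #         print(status, url)
    ```

    Passing `fetch` as a parameter also makes the fan-out logic testable without touching the network.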
