Multiple (asynchronous) connections with urllib2 or other http library?

耶瑟儿~ 2020-11-29 04:43

I have code like this (it fetches each page one at a time, retrying until the request succeeds):

import urllib2

for p in range(1, 1000):
    result = False
    while result is False:
        ret = urllib2.Request('http://server/?' + str(p))
        try:
            result = urllib2.urlopen(ret).read()
        except (urllib2.HTTPError, urllib2.URLError):
            pass

6 Answers
  • 2020-11-29 05:16

    Maybe use multiprocessing and divide your work across 2 processes or so.

    Here is an example (it's not tested):

    import multiprocessing
    import Queue
    import urllib2


    NUM_PROCESS = 2
    NUM_URL = 1000


    class DownloadProcess(multiprocessing.Process):
        """Worker process that downloads URLs taken from a shared queue."""

        def __init__(self, urls_queue, result_queue):
            multiprocessing.Process.__init__(self)
            self.urls = urls_queue
            self.result = result_queue

        def run(self):
            while True:
                # Stop once there are no URLs left to process.
                try:
                    url = self.urls.get_nowait()
                except Queue.Empty:
                    break

                try:
                    ret = urllib2.Request(url)
                    result = urllib2.urlopen(ret).read()
                except (urllib2.HTTPError, urllib2.URLError):
                    result = None

                self.result.put(result)


    def main():
        main_url = 'http://server/?%s'

        urls_queue = multiprocessing.Queue()
        for p in range(1, NUM_URL):
            urls_queue.put(main_url % p)

        result_queue = multiprocessing.Queue()

        workers = []
        for i in range(NUM_PROCESS):
            download = DownloadProcess(urls_queue, result_queue)
            download.start()
            workers.append(download)

        # Collect exactly one result per URL; testing the queue object itself
        # (`while result_queue:`) is always true and would loop forever.
        results = []
        for _ in range(1, NUM_URL):
            results.append(result_queue.get())

        for worker in workers:
            worker.join()

        return results


    if __name__ == "__main__":
        results = main()

        for res in results:
            print(res)
    
  • 2020-11-29 05:18

    You can use asynchronous IO to do this.

    requests + gevent = grequests

    GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

    import grequests
    
    urls = [
        'http://www.heroku.com',
        'http://tablib.org',
        'http://httpbin.org',
        'http://python-requests.org',
        'http://kennethreitz.com'
    ]
    
    rs = (grequests.get(u) for u in urls)
    grequests.map(rs)
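
    map() returns the responses in the same order as the input, with None for
    any request that failed. A short follow-up sketch (my own illustration,
    reusing the urls list above) of consuming those responses:

    responses = grequests.map(grequests.get(u) for u in urls)
    for r in responses:
        if r is not None:  # failed requests come back as None
            print(r.status_code)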
    
  • 2020-11-29 05:20

    I know this question is a little old, but I thought it might be useful to promote another async solution built on the requests library.

    list_of_requests = ['http://moop.com', 'http://doop.com', ...]
    
    from simple_requests import Requests
    for response in Requests().swarm(list_of_requests):
        print response.content
    

    The docs are here: http://pythonhosted.org/simple-requests/

  • 2020-11-29 05:20

    Either you figure out threads, or you use Twisted (which is asynchronous).
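
    As a minimal sketch of the threading route (my own illustration, not part
    of the original answer; the worker count and URL pattern are assumptions),
    using only the standard library:

    import threading
    import Queue
    import urllib2

    NUM_WORKERS = 4  # arbitrary choice

    def worker(url_queue, results):
        # Each thread pulls URLs until the shared queue is drained.
        while True:
            try:
                url = url_queue.get_nowait()
            except Queue.Empty:
                return
            try:
                # list.append is atomic under the GIL, so this is thread-safe.
                results.append(urllib2.urlopen(url).read())
            except (urllib2.HTTPError, urllib2.URLError):
                pass

    url_queue = Queue.Queue()
    for p in range(1, 1000):
        url_queue.put('http://server/?' + str(p))

    results = []
    threads = [threading.Thread(target=worker, args=(url_queue, results))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()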

  • 2020-11-29 05:21

    So, it's 2016

  • 2020-11-29 05:30

    Take a look at gevent, a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.

    Example:

    #!/usr/bin/python
    # Copyright (c) 2009 Denis Bilenko. See LICENSE for details.
    
    """Spawn multiple workers and wait for them to complete"""
    
    urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']
    
    import gevent
    from gevent import monkey
    
    # patches stdlib (including socket and ssl modules) to cooperate with other greenlets
    monkey.patch_all()
    
    import urllib2
    
    
    def print_head(url):
        print 'Starting %s' % url
        data = urllib2.urlopen(url).read()
        print '%s: %s bytes: %r' % (url, len(data), data[:50])
    
    jobs = [gevent.spawn(print_head, url) for url in urls]
    
    gevent.joinall(jobs)
    