How to perform time limited response download with python requests?

Asked by 灰色年华 on 2020-12-29 10:34

When downloading a large file with Python requests, I want to put a time limit not only on the connection process, but also on the whole download.

I have tried requests' timeout argument, along the lines of the snippet below, but per the documentation it is not a time limit on the entire response download:
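
    import requests

    # Hypothetical first attempt: timeout bounds the connect and each
    # individual socket read, but NOT the total time spent downloading
    # the body, so a slow 1 GB transfer still runs to completion.
    r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                     timeout=5)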

3 Answers
  • 2020-12-29 10:44

    Run the download in a thread, then stop waiting for it if it has not finished on time. (Note that Python cannot forcibly kill a thread; join(TIMEOUT) only stops blocking the main thread, so the worker keeps downloading in the background.)

    import requests
    import threading
    
    URL='http://ipv4.download.thinkbroadband.com/1GB.zip'
    TIMEOUT=0.5
    
    def download(return_value):
        # Buffer the whole body in memory and hand the Response back
        # through the mutable list argument.
        return_value.append(requests.get(URL))
    
    return_value = []
    download_thread = threading.Thread(target=download, args=(return_value,))
    download_thread.start()
    download_thread.join(TIMEOUT)  # wait at most TIMEOUT seconds
    
    if download_thread.is_alive():
        print('The download was not finished on time...')
    else:
        print(return_value[0].headers['content-length'])
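
    Since the thread cannot be killed, a cooperative variant is more practical: stream the body and let the worker poll a stop flag between chunks. The sketch below is my own addition, not part of the original answer; it assumes requests' stream=True/iter_content API and threading.Event:

    import threading
    import requests
    
    URL = 'http://ipv4.download.thinkbroadband.com/1GB.zip'
    stop = threading.Event()
    chunks = []
    
    def download():
        with requests.get(URL, stream=True) as r:
            for chunk in r.iter_content(chunk_size=4096):
                if stop.is_set():   # checked once per chunk
                    break
                chunks.append(chunk)
    
    worker = threading.Thread(target=download)
    worker.start()
    worker.join(0.5)                # give it half a second
    stop.set()                      # ask the worker to stop
    worker.join()                   # it exits after the current chunk
    print('Downloaded %d bytes' % sum(len(c) for c in chunks))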
    
  • 2020-12-29 10:53

    When using Requests' stream=True parameter (called prefetch=False before requests 1.0), you get to pull in arbitrary-sized chunks of the response at a time, rather than all at once.

    What you'll need to do is tell Requests not to preload the entire response, then keep track of how much time you have spent reading so far while fetching small chunks at a time. You can fetch a chunk with r.raw.read(CHUNK_SIZE). Overall, the code will look something like this:

    import requests
    import time
    
    CHUNK_SIZE = 2 ** 12            # bytes per read
    TIME_EXPIRE = time.time() + 5   # absolute deadline, 5 seconds from now
    
    r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                     stream=True)
    
    data = b''
    buffer = r.raw.read(CHUNK_SIZE)
    while buffer:
        data += buffer
        buffer = r.raw.read(CHUNK_SIZE)
    
        if TIME_EXPIRE < time.time():
            # Quit after 5 seconds, keeping the chunk just read.
            data += buffer
            break
    
    r.raw.release_conn()
    
    print('Read %s bytes out of %s expected.'
          % (len(data), r.headers['content-length']))
    

    Note that this might sometimes use a bit more than the 5 seconds allotted as the final r.raw.read(...) could lag an arbitrary amount of time. But at least it doesn't depend on multithreading or socket timeouts.
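
    In current versions of requests the same pattern is usually written with stream=True and iter_content, which also decompresses gzip/deflate bodies for you; a sketch under that assumption:

    import time
    import requests
    
    deadline = time.time() + 5      # hard 5-second budget
    data = b''
    
    with requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                      stream=True, timeout=5) as r:
        for chunk in r.iter_content(chunk_size=4096):
            data += chunk
            if time.time() > deadline:  # give up once the budget is spent
                break
    
    print('Read %s bytes out of %s expected.'
          % (len(data), r.headers['content-length']))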

  • 2020-12-29 11:01

    And the answer is: do not use requests, as it is blocking. Use non-blocking network I/O, for example eventlet:

    import eventlet
    from eventlet.green.urllib import request   # green (non-blocking) urllib.request
    from eventlet.timeout import Timeout
    
    url5 = 'http://ipv4.download.thinkbroadband.com/5MB.zip'
    url10 = 'http://ipv4.download.thinkbroadband.com/10MB.zip'
    
    urls = [url5, url5, url10, url10, url10, url5, url5]
    
    def fetch(url):
        response = bytearray()
        with Timeout(60, False):    # False: silence the timeout instead of raising
            response = request.urlopen(url).read()
        return url, len(response)
    
    pool = eventlet.GreenPool()
    for url, length in pool.imap(fetch, urls):
        if not length:
            print("%s: timeout!" % url)
        else:
            print("%s: %s" % (url, length))
    

    This produces the expected results:

    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
    http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
    http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
    http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
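
    One caveat: because the body is fetched with a single read() call, a timeout discards everything received so far and fetch reports a length of 0. A chunked variant (my own sketch, built on the same eventlet primitives as above) keeps the partial bytes instead:

    def fetch_partial(url):
        # Hypothetical variant: read in chunks so a timeout keeps
        # whatever arrived before the deadline.
        response = bytearray()
        with Timeout(60, False):
            conn = request.urlopen(url)
            while True:
                chunk = conn.read(8192)
                if not chunk:
                    break
                response.extend(chunk)
        return url, len(response)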
    