How to perform time limited response download with python requests?

前端未结

关注

 3  1842

When downloading a large file with python, I want to put a time limit not only for the connection process, but also for the download.

I am trying with the following

相关标签:

3条回答

花落未央

2020-12-29 10:44

Run download in a thread which you can then abort if not finished on time.

import requests
import threading

URL='http://ipv4.download.thinkbroadband.com/1GB.zip'
TIMEOUT=0.5

def download(return_value):
    return_value.append(requests.get(URL))

return_value = []
download_thread = threading.Thread(target=download, args=(return_value,))
download_thread.start()
download_thread.join(TIMEOUT)

if download_thread.is_alive():
    print 'The download was not finished on time...'
else:
    print return_value[0].headers['content-length']

0 讨论(0)

鱼传尺愫

2020-12-29 10:53
When using Requests' prefetch=False parameter, you get to pull in arbitrary-sized chunks of the respone at a time (rather than all at once).

What you'll need to do is tell Requests not to preload the entire request and keep your own time of how much you've spent reading so far, while fetching small chunks at a time. You can fetch a chunk using r.raw.read(CHUNK_SIZE). Overall, the code will look something like this:
```
import requests
import time

CHUNK_SIZE = 2**12  # Bytes
TIME_EXPIRE = time.time() + 5  # Seconds

r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip', prefetch=False)

data = ''
buffer = r.raw.read(CHUNK_SIZE)
while buffer:
    data += buffer
    buffer = r.raw.read(CHUNK_SIZE)

    if TIME_EXPIRE < time.time():
        # Quit after 5 seconds.
        data += buffer
        break

r.raw.release_conn()

print "Read %s bytes out of %s expected." % (len(data), r.headers['content-length'])
```
Note that this might sometimes use a bit more than the 5 seconds allotted as the final r.raw.read(...) could lag an arbitrary amount of time. But at least it doesn't depend on multithreading or socket timeouts.
0 讨论(0)
发布评论:

提交评论
- 加载中...

梦谈多话

2020-12-29 11:01

And the answer is: do not use requests, as it is blocking. Use non-blocking network I/O, for example eventlet:

import eventlet
from eventlet.green import urllib2
from eventlet.timeout import Timeout

url5 = 'http://ipv4.download.thinkbroadband.com/5MB.zip'
url10 = 'http://ipv4.download.thinkbroadband.com/10MB.zip'

urls = [url5, url5, url10, url10, url10, url5, url5]

def fetch(url):
    response = bytearray()
    with Timeout(60, False):
        response = urllib2.urlopen(url).read()
    return url, len(response)

pool = eventlet.GreenPool()
for url, length in pool.imap(fetch, urls):
    if (not length):
        print "%s: timeout!" % (url)
    else:
        print "%s: %s" % (url, length)

Produces expected results:

http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880

0 讨论(0)