Parallel fetching of files

故里飘歌 2020-12-04 16:05

In order to download files, I'm creating a urlopen object (urllib2 class) and reading it in chunks.

I would like to connect to the server several times and download different sections of the file in parallel.

3 Answers
  • 2020-12-04 16:28

    Sounds like you want to use one of the flavors of HTTP Range that are available.

    Edit: updated the link to point to the RFC stored at w3.org.
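
    Purely as an illustration (not part of the original answer), here is a minimal sketch of what a range-based parallel fetch could look like, using Python 3's urllib.request (the successor of urllib2) and a thread pool. The URL, chunk count, and helper names are hypothetical, and it assumes the server honours Range requests and reports Content-Length:

    ```python
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "https://example.com/big.iso"   # hypothetical URL
    CHUNKS = 4                            # number of parallel connections

    def content_length(url):
        # HEAD request to learn the total size (assumes the server reports it)
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            return int(resp.headers["Content-Length"])

    def fetch_range(url, start, end):
        # Ask the server for just the bytes start..end (inclusive)
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    size = content_length(URL)
    step = size // CHUNKS
    ranges = [(i * step, size - 1 if i == CHUNKS - 1 else (i + 1) * step - 1)
              for i in range(CHUNKS)]

    # Fetch every byte range on its own thread, keeping the results in order
    with ThreadPoolExecutor(max_workers=CHUNKS) as pool:
        parts = list(pool.map(lambda r: fetch_range(URL, *r), ranges))

    # Reassemble the pieces into the final file
    with open("big.iso", "wb") as out:
        for part in parts:
            out.write(part)
    ```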

  • 2020-12-04 16:31

    As for running parallel requests, you might want to use urllib3 or requests.
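
    For the multiple-URLs case, a minimal sketch of that idea (my own illustration, with hypothetical URLs and file names) using requests plus a thread pool might look like this:

    ```python
    import requests
    from concurrent.futures import ThreadPoolExecutor, as_completed

    URLS = [                             # hypothetical URLs
        "https://example.com/a.zip",
        "https://example.com/b.zip",
        "https://example.com/c.zip",
    ]

    def download(url):
        # Stream the response so large files are not held entirely in memory
        local = url.rsplit("/", 1)[-1]
        with requests.get(url, stream=True, timeout=30) as resp:
            resp.raise_for_status()
            with open(local, "wb") as f:
                for chunk in resp.iter_content(chunk_size=64 * 1024):
                    f.write(chunk)
        return local

    # Run the downloads concurrently on a small pool of worker threads
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {pool.submit(download, u): u for u in URLS}
        for fut in as_completed(futures):
            print("finished", fut.result())
    ```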

    I took some time to make a list of similar questions:

    Looking for [python] +download +concurrent gives these interesting ones:

    • Concurrent downloads - Python
    • What is the fastest way to send 100,000 HTTP requests in Python?
    • Library or tool to download multiple files in parallel
    • Download multiple pages concurrently?
    • Python: simple async download of url content?
    • Python, gevent, urllib2.urlopen.read(), download accelerator
    • Python/Urllib2/Threading: Single download thread faster than multiple download threads. Why?
    • Scraping landing pages of a list of domains
    • A clean, lightweight alternative to Python's twisted?

    Looking for [python] +http +concurrent gives these:

    • Python: How to make multiple HTTP POST queries in one moment?
    • Multi threaded web scraper using urlretrieve on a cookie-enabled site

    Looking for [python] +urllib2 +slow:

    • Python urllib2.open is slow, need a better way to read several urls
    • Python 2.6: parallel parsing with urllib2
    • How can I speed up fetching pages with urllib2 in python?
    • Threading HTTP requests (with proxies)

    Looking for [python] +download +many:

    • Python,multi-threads,fetch webpages,download webpages
    • Downloading files in twisted using queue
    • Python: Something like map that works on threads
    • Rotating Proxies for web scraping
    • Anyone know of a good Python based web crawler that I could use?
  • 2020-12-04 16:36

    As we've already been discussing, I made one of these using PycURL.

    The one and only thing I had to do was call pycurl_instance.setopt(pycurl_instance.NOSIGNAL, 1) to prevent crashes.
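
    For reference, a minimal single-transfer sketch of that option in use (my own illustration, not the poster's actual code; the URL and output filename are hypothetical):

    ```python
    import pycurl

    with open("download.bin", "wb") as out:
        c = pycurl.Curl()
        c.setopt(c.URL, "https://example.com/file.bin")  # hypothetical URL
        c.setopt(c.WRITEDATA, out)
        # Disable libcurl's signal handling so threaded transfers don't crash
        c.setopt(c.NOSIGNAL, 1)
        c.perform()
        c.close()
    ```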

    I used APScheduler to fire the requests in separate threads. Thanks to your advice to replace the busy-waiting loop while True: pass with while True: time.sleep(3) in the main thread, the code behaves quite nicely, and with the Runner module from the python-daemon package the application is almost ready to be used as a typical UN*X daemon.
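
    A rough sketch of that main-loop arrangement (my own reconstruction using the current APScheduler 3.x API, which may differ from what the poster used; the job function is hypothetical):

    ```python
    import time
    from apscheduler.schedulers.background import BackgroundScheduler

    def fire_request():
        # hypothetical job that triggers one download in its own thread
        pass

    sched = BackgroundScheduler()
    sched.add_job(fire_request, "interval", seconds=30)
    sched.start()

    try:
        # Main thread sleeps instead of busy-waiting, as suggested above
        while True:
            time.sleep(3)
    except (KeyboardInterrupt, SystemExit):
        sched.shutdown()
    ```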
