Parallel fetching of files

故里飘歌 2020-12-04 16:05

In order to download files, I'm creating a urlopen object (urllib2 class) and reading it in chunks.

I would like to connect to the server several times and download different sections of the file in parallel.

3 Answers
  • 2020-12-04 16:28

    Sounds like you want to use one of the flavors of HTTP Range that are available.

    Edit: updated the link to point to the RFC stored at w3.org.
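
    Purely as an illustration (not part of the original answer), here is a minimal sketch of what a range-based parallel fetch could look like, using Python 3's urllib.request (the successor of urllib2) and a thread pool. The URL, chunk count, and helper names are hypothetical, and it assumes the server honours Range requests and reports Content-Length:

    ```python
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "https://example.com/big.iso"   # hypothetical URL
    CHUNKS = 4                            # number of parallel connections

    def content_length(url):
        # HEAD request to learn the total size (assumes the server reports it)
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            return int(resp.headers["Content-Length"])

    def fetch_range(url, start, end):
        # Ask the server for just the bytes start..end (inclusive)
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    size = content_length(URL)
    step = size // CHUNKS
    ranges = [(i * step, size - 1 if i == CHUNKS - 1 else (i + 1) * step - 1)
              for i in range(CHUNKS)]

    # Fetch every byte range on its own thread, keeping the results in order
    with ThreadPoolExecutor(max_workers=CHUNKS) as pool:
        parts = list(pool.map(lambda r: fetch_range(URL, *r), ranges))

    # Reassemble the pieces into the final file
    with open("big.iso", "wb") as out:
        for part in parts:
            out.write(part)
    ```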

  • 2020-12-04 16:31

    As for running parallel requests, you might want to use urllib3 or requests.
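
    For the multiple-URLs case, a minimal sketch of that idea (my own illustration, with hypothetical URLs and file names) using requests plus a thread pool might look like this:

    ```python
    import requests
    from concurrent.futures import ThreadPoolExecutor, as_completed

    URLS = [                             # hypothetical URLs
        "https://example.com/a.zip",
        "https://example.com/b.zip",
        "https://example.com/c.zip",
    ]

    def download(url):
        # Stream the response so large files are not held entirely in memory
        local = url.rsplit("/", 1)[-1]
        with requests.get(url, stream=True, timeout=30) as resp:
            resp.raise_for_status()
            with open(local, "wb") as f:
                for chunk in resp.iter_content(chunk_size=64 * 1024):
                    f.write(chunk)
        return local

    # Run the downloads concurrently on a small pool of worker threads
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {pool.submit(download, u): u for u in URLS}
        for fut in as_completed(futures):
            print("finished", fut.result())
    ```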

    I took some time to make a list of similar questions:

    Looking for [python] +download +concurrent gives these interesting ones:

    • Concurrent downloads - Python
    • What is the fastest way to send 100,000 HTTP requests in Python?
    • Library or tool to download multiple files in parallel
    • Download multiple pages concurrently?
    • Python: simple async download of url content?
    • Python, gevent, urllib2.urlopen.read(), download accelerator
    • Python/Urllib2/Threading: Single download thread faster than multiple download threads. Why?
    • Scraping landing pages of a list of domains
    • A clean, lightweight alternative to Python's twisted?

    Looking for [python] +http +concurrent gives these:

    • Python: How to make multiple HTTP POST queries in one moment?
    • Multi threaded web scraper using urlretrieve on a cookie-enabled site

    Looking for [python] +urllib2 +slow:

    • Python urllib2.open is slow, need a better way to read several urls
    • Python 2.6: parallel parsing with urllib2
    • How can I speed up fetching pages with urllib2 in python?
    • Threading HTTP requests (with proxies)

    Looking for [python] +download +many:

    • Python,multi-threads,fetch webpages,download webpages
    • Downloading files in twisted using queue
    • Python: Something like map that works on threads
    • Rotating Proxies for web scraping
    • Anyone know of a good Python based web crawler that I could use?
  • 2020-12-04 16:36

    As we've already been discussing, I made one of these using PycURL.

    The one and only thing I had to do was call pycurl_instance.setopt(pycurl_instance.NOSIGNAL, 1) to prevent crashes.
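
    For reference, a minimal single-transfer sketch of that option in use (my own illustration, not the poster's actual code; the URL and output filename are hypothetical):

    ```python
    import pycurl

    with open("download.bin", "wb") as out:
        c = pycurl.Curl()
        c.setopt(c.URL, "https://example.com/file.bin")  # hypothetical URL
        c.setopt(c.WRITEDATA, out)
        # Disable libcurl's signal handling so threaded transfers don't crash
        c.setopt(c.NOSIGNAL, 1)
        c.perform()
        c.close()
    ```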

    I used APScheduler to fire the requests in separate threads. Thanks to your advice to replace the busy-waiting loop while True: pass with while True: time.sleep(3) in the main thread, the code behaves quite nicely, and with the Runner module from the python-daemon package the application is almost ready to be used as a typical UN*X daemon.
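
    A rough sketch of that main-loop arrangement (my own reconstruction using the current APScheduler 3.x API, which may differ from what the poster used; the job function is hypothetical):

    ```python
    import time
    from apscheduler.schedulers.background import BackgroundScheduler

    def fire_request():
        # hypothetical job that triggers one download in its own thread
        pass

    sched = BackgroundScheduler()
    sched.add_job(fire_request, "interval", seconds=30)
    sched.start()

    try:
        # Main thread sleeps instead of busy-waiting, as suggested above
        while True:
            time.sleep(3)
    except (KeyboardInterrupt, SystemExit):
        sched.shutdown()
    ```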
