Multithreaded file download in Python, updating the shell with download progress

Submitted by 给你一囗甜甜゛ on 2020-01-14 04:21:08

Question


In an attempt to learn multithreaded file download, I wrote this piece of code:

import urllib2
import os
import sys
import time
import threading

urls = ["http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3",
"http://s1.fans.ge/mp3/201109/08/John_Legend_So_High_Remix(fans_ge).mp3",
"http://megaboon.com/common/preview/track/786203.mp3"]

url = urls[1]

def downloadFile(url, saveTo=None):
    file_name = url.split('/')[-1]
    if not saveTo:
        saveTo = '/Users/userName/Desktop'
    try:
        u = urllib2.urlopen(url)
    except urllib2.URLError as er:
        print "%s" % er.reason
    else:

        f = open(os.path.join(saveTo, file_name), 'wb')
        meta = u.info()
        file_size = int(meta.getheaders("Content-Length")[0])
        print "Downloading: %s Bytes: %s" % (file_name, file_size)
        file_size_dl = 0
        block_sz = 8192  # read the response in 8 KB chunks
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            # '\r' returns the cursor to the start of the line so each new
            # status overwrites the previous one in the shell
            status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            sys.stdout.write('%s\r' % status)
            sys.stdout.flush()
            time.sleep(.2)  # throttle the loop so the progress is visible
            if file_size_dl == file_size:
                print r"Download Completed %s%% for file %s, saved to %s" % (file_size_dl * 100. / file_size, file_name, saveTo,)
        f.close()
        return


def synchronousDownload():
    urls_saveTo = {urls[0]: None, urls[1]: None, urls[2]: None}
    for url, saveTo in urls_saveTo.iteritems():
        th = threading.Thread(target=downloadFile, args=(url, saveTo), name="%s_Download_Thread" % os.path.basename(url))
        th.start()

synchronousDownload()

However, it seems that the second download is not initiated until the first thread has finished its file; only then does it move on to the next one, as the shell output also shows.

My plan was to start all the downloads simultaneously and print the updating progress of each file as it is downloaded.

Any help will be greatly appreciated. Thanks.


Answer 1:


This is a common problem, and these are the steps typically taken:

1.) Use Queue.Queue to create a queue of all the urls you would like to visit.

2.) Create a class that inherits from threading.Thread. It should have a run method that grabs a url from the queue and fetches the data.

3.) Create a pool of threads based on your class to act as "workers".

4.) Don't exit the program until queue.join() has completed. A minimal sketch of this pattern follows.
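
Here is a minimal sketch of those four steps, assuming Python 2 to match the question; the class name DownloadWorker and the pool size of 3 are illustrative choices, not part of the original answer:

import Queue
import threading
import urllib2

class DownloadWorker(threading.Thread):
    """Worker that repeatedly takes a url off the queue and downloads it."""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.daemon = True  # do not keep the process alive for blocked workers

    def run(self):
        while True:
            url = self.queue.get()
            try:
                data = urllib2.urlopen(url).read()
                # write `data` to disk here, as downloadFile() does above
            except urllib2.URLError as er:
                print "failed %s: %s" % (url, er)
            finally:
                self.queue.task_done()  # count the item as done even on failure

queue = Queue.Queue()
for url in urls:  # the urls list from the question
    queue.put(url)

for _ in range(3):  # worker pool
    DownloadWorker(queue).start()

queue.join()  # step 4: block until every queued url has been processed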




Answer 2:


Your functions are actually running in parallel. You can verify this by printing at the start of each function: all three lines appear as soon as the program starts.
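
For example (the print line is a hypothetical addition, not part of the original code), put this at the top of downloadFile and run the script:

def downloadFile(url, saveTo=None):
    print "%s started for %s" % (threading.current_thread().name, url)
    # ... rest of the function unchanged ...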

What's happening is that your first two files are so small that they finish downloading completely before the scheduler switches threads. Try putting bigger files in your list:

urls = [
"http://www.wswd.net/testdownloadfiles/50MB.zip",
"http://www.wswd.net/testdownloadfiles/20MB.zip",
"http://www.wswd.net/testdownloadfiles/100MB.zip",
]

Program output:

Downloading: 100MB.zip Bytes: 104857600
Downloading: 20MB.zip Bytes: 20971520
Downloading: 50MB.zip Bytes: 52428800
Download Completed 100.0% for file 20MB.zip, saved to .
Download Completed 100.0% for file 50MB.zip, saved to .
Download Completed 100.0% for file 100MB.zip, saved to .


Source: https://stackoverflow.com/questions/24216760/multithreaded-file-download-in-python-and-updating-in-shell-with-download-progre
