How do I download a file over HTTP using Python?

感动是毒 2020-11-21 07:17

I have a small utility that downloads an MP3 file from a website on a schedule and then builds/updates a podcast XML file, which I've added to iTunes.

The te

25 answers
  •  梦毁少年i
    2020-11-21 07:40

    If speed matters to you, I ran a small performance test comparing the urllib and wget modules. For wget, I tried once with the progress bar and once without. I used three different 500 MB files for the test (different files, to rule out any caching going on under the hood). Tested on a Debian machine with Python 2.

    First, these are the results (they are similar in different runs):

    $ python wget_test.py 
    urlretrive_test : starting
    urlretrive_test : 6.56
    ==============
    wget_no_bar_test : starting
    wget_no_bar_test : 7.20
    ==============
    wget_with_bar_test : starting
    100% [......................................................................] 541335552 / 541335552
    wget_with_bar_test : 50.49
    ==============
    

    I timed each download with a "profile" decorator. This is the full code:

    import wget
    import urllib
    import time
    from functools import wraps
    
    def profile(func):
        @wraps(func)
        def inner(*args):
            print func.__name__, ": starting"
            start = time.time()
            ret = func(*args)
            end = time.time()
            print func.__name__, ": {:.2f}".format(end - start)
            return ret
        return inner
    
    url1 = 'http://host.com/500a.iso'
    url2 = 'http://host.com/500b.iso'
    url3 = 'http://host.com/500c.iso'
    
    def do_nothing(*args):
        pass
    
    @profile
    def urlretrive_test(url):
        return urllib.urlretrieve(url)
    
    @profile
    def wget_no_bar_test(url):
        return wget.download(url, out='/tmp/', bar=do_nothing)
    
    @profile
    def wget_with_bar_test(url):
        return wget.download(url, out='/tmp/')
    
    urlretrive_test(url1)
    print '=============='
    time.sleep(1)
    
    wget_no_bar_test(url2)
    print '=============='
    time.sleep(1)
    
    wget_with_bar_test(url3)
    print '=============='
    time.sleep(1)
    

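    urllib seems to be the fastest (6.56 s), with wget without the progress bar close behind (7.20 s). With the progress bar enabled, wget was far slower (50.49 s).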
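
    The script above is Python 2 only (print statements, urllib.urlretrieve). For readers on Python 3, here is a minimal sketch of the same timing decorator and the urllib case, not a re-run of the benchmark. It assumes urllib.request.urlretrieve as the Python 3 counterpart (the documentation lists it under the legacy interface), and the optional filename parameter is my addition for illustration.

```python
import time
import urllib.request
from functools import wraps


def profile(func):
    """Print the wall-clock time of each call to func, in seconds."""
    @wraps(func)
    def inner(*args, **kwargs):
        print(f"{func.__name__} : starting")
        start = time.perf_counter()
        ret = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} : {elapsed:.2f}")
        return ret
    return inner


@profile
def urlretrieve_test(url, filename=None):
    # urlretrieve returns a (local_path, headers) tuple. With
    # filename=None, a remote URL is saved to a temporary file.
    # The filename parameter is an illustrative addition, not part
    # of the original test script.
    return urllib.request.urlretrieve(url, filename)
```

    Calling urlretrieve_test('http://host.com/500a.iso') would mirror the first test above; the host is the same placeholder used in the original script.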
