How to download a file using Python in a 'smarter' way?

盖世英雄少女心 2020-11-27 10:04

I need to download several files via http in Python.

The most obvious way to do it is just using urllib2:

import urllib2
u = urllib2.urlopen('http:/


        
5 Answers
  • 2020-11-27 10:23

    @Kender:

    if localName[0] == '"' or localName[0] == "'":
        localName = localName[1:-1]
    

    This is not safe: the web server can send a malformed name such as ["file.ext] or [file.ext'], or even an empty one, in which case localName[0] raises an IndexError. More robust code could look like this:

    localName = localName.replace('"', '').replace("'", "")
    if localName == '':
        localName = SOME_DEFAULT_FILE_NAME
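
The same idea can be written as a small Python 3 helper. This is only a minimal sketch: `clean_filename` and the `download.bin` default are hypothetical names picked for illustration, and unlike the `replace` version it strips quotes only from the ends of the name.

```python
def clean_filename(raw_name, default="download.bin"):
    """Strip surrounding quotes and whitespace; fall back to a default name.

    `default` is a hypothetical placeholder standing in for
    SOME_DEFAULT_FILE_NAME.
    """
    # str.strip with a character set removes any mix of quotes and blanks
    # from both ends, so unbalanced input such as '"file.ext' is handled.
    name = (raw_name or "").strip(" \t\"'")
    return name or default
```

An empty or missing name never reaches an index operation here, so no exception is raised for the malformed cases.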
    
  • 2020-11-27 10:34

    Using wget:

    import wget  # third-party package (pip install wget)
    custom_file_name = "/custom/path/custom_name.ext"
    wget.download(url, custom_file_name)
    

    Using urlretrieve:

    import urllib  # Python 2; in Python 3 this is urllib.request.urlretrieve
    urllib.urlretrieve(url, custom_file_name)
    

    Note that urlretrieve does not create missing directories: the target directory must already exist, otherwise opening the output file fails.
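
If the target directory might not exist yet, it can be created explicitly before the download. A minimal Python 3 sketch, where `urlretrieve` lives in `urllib.request`; `download_to` is a hypothetical helper name, not something from the answers above:

```python
import os
import urllib.request


def download_to(url, dest_path):
    """Save `url` to `dest_path`, creating parent directories first."""
    parent = os.path.dirname(dest_path)
    if parent:
        # exist_ok avoids an error when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    # urlretrieve returns a (local_filename, headers) tuple.
    local_path, _headers = urllib.request.urlretrieve(url, dest_path)
    return local_path
```

Creating the directory up front keeps the helper independent of how any particular download function treats a missing path.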

  • 2020-11-27 10:40

    Combining much of the above, here is a more pythonic solution:

    import urllib2
    import shutil
    import urlparse
    import os
    
    def download(url, fileName=None):
        def getFileName(url,openUrl):
            if 'Content-Disposition' in openUrl.info():
                # If the response has Content-Disposition, try to get filename from it
                cd = dict(map(
                    # split on the first '=' only, so values containing '=' do not break dict()
                    lambda x: x.strip().split('=', 1) if '=' in x else (x.strip(),''),
                    openUrl.info()['Content-Disposition'].split(';')))
                if 'filename' in cd:
                    filename = cd['filename'].strip("\"'")
                    if filename: return filename
            # if no filename was found above, parse it out of the final URL.
            return os.path.basename(urlparse.urlsplit(openUrl.url)[2])
    
        r = urllib2.urlopen(urllib2.Request(url))
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r,f)
        finally:
            r.close()
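
For Python 3, the header parsing can be left to the standard library instead of splitting the string by hand. The sketch below assumes the caller passes in the raw `Content-Disposition` value and the final URL; `pick_filename` and the `download.bin` default are hypothetical names used only for illustration.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(content_disposition, final_url, default="download.bin"):
    """Choose a local file name from a header value, else from the URL."""
    if content_disposition:
        # Message.get_filename() handles quoted and RFC 2231 parameters.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Keep only the last path component so a hostile header
            # cannot point outside the target directory.
            name = os.path.basename(name.replace("\\", "/"))
            if name:
                return name
    # Fall back to the last path segment of the (possibly redirected) URL.
    name = os.path.basename(unquote(urlsplit(final_url).path))
    return name or default
```

Reducing the header value to its base name is a design choice added here, since a server-supplied name should not be trusted as a path.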
    
  • 2020-11-27 10:43

    Based on the comments and @Oli's answer, I made a solution like this:

    import urllib2
    from os.path import basename
    from urlparse import urlsplit
    
    def url2name(url):
        return basename(urlsplit(url)[2])
    
    def download(url, localFileName = None):
        localName = url2name(url)
        req = urllib2.Request(url)
        r = urllib2.urlopen(req)
        if r.info().has_key('Content-Disposition'):
            # If the response has Content-Disposition, we take file name from it
            localName = r.info()['Content-Disposition'].split('filename=')[1]
            if localName[0] == '"' or localName[0] == "'":
                localName = localName[1:-1]
        elif r.url != url: 
            # if we were redirected, the real file name we take from the final URL
            localName = url2name(r.url)
        if localFileName: 
            # we can force to save the file as specified name
            localName = localFileName
        # the with-block closes the file even if the write fails
        with open(localName, 'wb') as f:
            f.write(r.read())
    

    It takes the file name from Content-Disposition; if that header is not present, it uses the file name from the URL (if a redirect happened, the final URL is used).
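
The same flow can be sketched for Python 3, where `urllib2` became `urllib.request`. This is a rough adaptation under stated assumptions, not a drop-in replacement: the `dest_dir` parameter and the `download.bin` default are hypothetical additions, and the body is streamed to disk rather than read into memory in one go.

```python
import os
import shutil
import urllib.request
from urllib.parse import unquote, urlsplit


def download(url, dest_dir=".", filename=None):
    """Stream `url` into `dest_dir` and return the path that was written."""
    with urllib.request.urlopen(url) as response:
        if not filename:
            # Prefer the name from Content-Disposition, if the server sent one.
            filename = response.headers.get_filename()
        if not filename:
            # response.url is the final URL, after any redirects were followed.
            filename = unquote(urlsplit(response.url).path)
        # Drop directory parts; "download.bin" is a hypothetical default.
        filename = os.path.basename(filename) or "download.bin"
        target = os.path.join(dest_dir, filename)
        with open(target, "wb") as out:
            # copyfileobj copies in chunks instead of loading the whole body.
            shutil.copyfileobj(response, out)
    return target
```

Passing `filename` explicitly still forces the saved name, as in the version above.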

  • 2020-11-27 10:44

    Download scripts like that tend to push a header telling the user-agent what to name the file:

    Content-Disposition: attachment; filename="the filename.ext"
    

    If you can grab that header, you can get the proper filename.

    There's another thread that has a little bit of code to offer up for Content-Disposition-grabbing.

    import urllib2
    remotefile = urllib2.urlopen('http://example.com/somefile.zip')
    remotefile.info()['Content-Disposition']
    