Python equivalent of a given wget command

面向向阳花 2020-12-02 10:04

I'm trying to create a Python function that does the same thing as this wget command:

wget -c --read-timeout=5 --tries=0 "$URL"

Here -c resumes a partially downloaded file, --read-timeout=5 gives up if no data is received for 5 seconds, and --tries=0 keeps retrying indefinitely.
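
Roughly, the behaviour I'm after is something like the sketch below (requests is only a stand-in for whatever library fits; the function name and parameters are placeholders):

import os
import requests  # assumption: any HTTP library would do; requests is used for illustration

def wget_like(url, filename, read_timeout=5):
    """Keep retrying until `filename` holds the complete body of `url`,
    resuming partial downloads with a Range header (like wget -c)."""
    while True:
        resume_from = os.path.getsize(filename) if os.path.exists(filename) else 0
        headers = {'Range': 'bytes=%d-' % resume_from} if resume_from else {}
        try:
            with requests.get(url, headers=headers, stream=True, timeout=read_timeout) as r:
                if resume_from and r.status_code != 206:
                    resume_from = 0  # server ignored the Range header: start over
                with open(filename, 'ab' if resume_from else 'wb') as f:
                    for chunk in r.iter_content(chunk_size=8192):
                        f.write(chunk)
            return filename  # finished without a network error
        except requests.RequestException:
            continue  # like --tries=0: retry until it succeeds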

10 Answers
  • 2020-12-02 10:43

    Here's the code adapted from the torchvision library:

    import os
    import urllib.request
    import urllib.error
    
    def download_url(url, root, filename=None):
        """Download a file from a url and place it in root.
        Args:
            url (str): URL to download file from
            root (str): Directory to place downloaded file in
            filename (str, optional): Name to save the file under. If None, use the basename of the URL
        """
    
        root = os.path.expanduser(root)
        if not filename:
            filename = os.path.basename(url)
        fpath = os.path.join(root, filename)
    
        os.makedirs(root, exist_ok=True)
    
        try:
            print('Downloading ' + url + ' to ' + fpath)
            urllib.request.urlretrieve(url, fpath)
        except (urllib.error.URLError, IOError) as e:
            if url[:5] == 'https':
                # if the https URL fails, fall back to plain http and retry once
                url = url.replace('https:', 'http:')
                print('Failed download. Trying https -> http instead.'
                      ' Downloading ' + url + ' to ' + fpath)
                urllib.request.urlretrieve(url, fpath)
            else:
                raise e
    

    If you are OK with taking a dependency on the torchvision library, you can also simply do:

    from torchvision.datasets.utils import download_url
    download_url('http://something.com/file.zip', '~/my_folder')
    
  • 2020-12-02 10:47

    A solution that I often find simpler and more robust is to simply execute a terminal command from within Python. In your case:

    import os

    url = 'https://www.someurl.com'
    # quote the URL for the shell, as in the original wget invocation
    os.system(f'wget -c --read-timeout=5 --tries=0 "{url}"')
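
    If you'd rather avoid shell-quoting issues, the same idea can be expressed with subprocess (a sketch; it assumes the wget binary is on PATH and reuses the placeholder URL):

    import subprocess

    url = 'https://www.someurl.com'
    # passing an argument list avoids building a shell command string by hand
    subprocess.run(["wget", "-c", "--read-timeout=5", "--tries=0", url], check=True)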
    
  • 2020-12-02 10:48

    TensorFlow makes life easier. The returned file path gives us the location of the downloaded file.

    import tensorflow as tf

    # get_file returns the local path of the downloaded (and cached) file
    file_path = tf.keras.utils.get_file(origin='https://storage.googleapis.com/tf-datasets/titanic/train.csv',
                                        fname='train.csv',
                                        untar=False, extract=False)
    print(file_path)
    
  • 2020-12-02 10:55

    There is also a nice Python module named wget (available on PyPI) that is pretty easy to use.

    This demonstrates the simplicity of the design:

    >>> import wget
    >>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
    >>> filename = wget.download(url)
    100% [................................................] 3841532 / 3841532
    >>> filename
    'razorback.mp3'
    

    Enjoy.

    However, if wget doesn't work (I've had trouble with certain PDF files), try this solution.

    Edit: You can also use the out parameter to save to a custom output directory instead of the current working directory.

    >>> output_directory = <directory_name>
    >>> filename = wget.download(url, out=output_directory)
    >>> filename
    'razorback.mp3'
    
  • 2020-12-02 10:56

    I had to do something like this on a version of Linux that didn't have the right options compiled into wget. This example downloads the memory analysis tool 'guppy'. I'm not sure if it's important or not, but I kept the target file's name the same as the name in the URL...

    Here's what I came up with:

    python -c "import requests; r = requests.get('https://pypi.python.org/packages/source/g/guppy/guppy-0.1.10.tar.gz'); open('guppy-0.1.10.tar.gz', 'wb').write(r.content)"
    

    That's the one-liner; here it is in a slightly more readable form:

    import requests

    fname = 'guppy-0.1.10.tar.gz'
    url = 'https://pypi.python.org/packages/source/g/guppy/' + fname
    r = requests.get(url)
    with open(fname, 'wb') as f:  # close the file deterministically
        f.write(r.content)
    

    This worked for downloading a tarball. I was able to extract and use the package after downloading it.

    EDIT:

    To address a question, here is an implementation with a progress bar printed to STDOUT. There is probably a more portable way to do this without the clint package, but this was tested on my machine and works fine:

    #!/usr/bin/env python
    
    from clint.textui import progress
    import requests
    
    fname = 'guppy-0.1.10.tar.gz'
    url = 'https://pypi.python.org/packages/source/g/guppy/' + fname
    
    r = requests.get(url, stream=True)
    with open(fname, 'wb') as f:
        total_length = int(r.headers.get('content-length'))
        for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length/1024) + 1): 
            if chunk:
                f.write(chunk)
                f.flush()
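
    If you'd rather not depend on clint, roughly the same progress bar can be built with the more widely used tqdm package (a sketch, assuming the server reports a Content-Length header):

    from tqdm import tqdm
    import requests

    fname = 'guppy-0.1.10.tar.gz'
    url = 'https://pypi.python.org/packages/source/g/guppy/' + fname

    r = requests.get(url, stream=True)
    total_length = int(r.headers.get('content-length', 0))
    with open(fname, 'wb') as f, tqdm(total=total_length, unit='B', unit_scale=True) as bar:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))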
    
  • 2020-12-02 10:59
    # Python 2 code (urllib2); a Python 3 equivalent is sketched below
    import urllib2
    import time
    
    max_attempts = 80
    attempts = 0
    sleeptime = 10 #in seconds, no reason to continuously try if network is down
    
    #while true: #Possibly Dangerous
    while attempts < max_attempts:
        time.sleep(sleeptime)
        try:
            response = urllib2.urlopen("http://example.com", timeout = 5)
            content = response.read()
            f = open( "local/index.html", 'w' )
            f.write( content )
            f.close()
            break
        except urllib2.URLError as e:
            attempts += 1
            print type(e)
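
    For Python 3, a roughly equivalent sketch using urllib.request (same retry idea; the URL and output path are the same placeholders as above):

    import time
    import urllib.request
    import urllib.error

    max_attempts = 80
    sleeptime = 10  # seconds between attempts; no reason to hammer a dead network

    for _ in range(max_attempts):
        try:
            response = urllib.request.urlopen("http://example.com", timeout=5)
            content = response.read()
            with open("local/index.html", 'wb') as f:  # read() returns bytes, so write in binary mode
                f.write(content)
            break
        except urllib.error.URLError as e:
            print(type(e))
            time.sleep(sleeptime)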
    