Download file from web in Python 3

星月不相逢 2020-11-22 16:43

I am creating a program that will download a .jar (java) file from a web server, by reading the URL that is specified in the .jad file of the same game/application. I'm usi

9 answers
  • 2020-11-22 17:28

    I use the requests package for anything HTTP-related because its API is easy to get started with.

    First, install requests:

    $ pip install requests
    

    Then the code:

    from requests import get  # to make GET request
    
    
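    def download(url, file_name):
        # make the GET request first so a failed request does not leave an empty file
        response = get(url)
        # raise an exception on 4xx/5xx instead of saving an error page
        response.raise_for_status()
        # open in binary mode and write the response body
        with open(file_name, "wb") as file:
            file.write(response.content)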
    
  • 2020-11-22 17:28

    Yes, requests is definitely a great package for anything related to HTTP requests, but we also need to be careful with the encoding of the incoming data. The example below shows the difference:

    
    from requests import get
    
    # case when the response is byte array
    url = 'some_image_url'
    
    response = get(url)
    with open('output', 'wb') as file:
        file.write(response.content)
    
    
    # case when the response is text
    # requests falls back to ISO-8859-1 when a text response declares no charset,
    # so in that case we have to override the response encoding
    url = 'some_page_url'
    
    response = get(url)
    # override the encoding with the educated guess detected from the content
    # (chardet, or charset_normalizer in newer requests versions)
    response.encoding = response.apparent_encoding
    
    # response.text is the decoded str; response.content is raw bytes
    with open('output', 'w', encoding='utf-8') as file:
        file.write(response.text)
    
    
  • 2020-11-22 17:36

    If you want to obtain the contents of a web page into a variable, just read the response of urllib.request.urlopen:

    import urllib.request
    ...
    url = 'http://example.com/'
    response = urllib.request.urlopen(url)
    data = response.read()      # a `bytes` object
    text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
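
    Hard-coding 'utf-8' only works when the server actually sends UTF-8. As a minimal sketch, the charset declared in the Content-Type header can be used when present. The function name fetch_text and the fallback_encoding parameter are made up for this illustration:

```python
import urllib.request


def fetch_text(url, fallback_encoding='utf-8'):
    """Return the body of `url` as a str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        data = response.read()  # a `bytes` object
        # The headers object is an email.message.Message; this returns the
        # charset from Content-Type, or None if the server declared none.
        charset = response.headers.get_content_charset() or fallback_encoding
    # errors='replace' is an assumption: it avoids an exception on bad bytes
    return data.decode(charset, errors='replace')


# Usage (not executed here): text = fetch_text('http://example.com/')
```

    This still cannot be used for binary data; it only picks a better encoding for text.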
    

    The easiest way to download and save a file is to use the urllib.request.urlretrieve function:

    import urllib.request
    ...
    # Download the file from `url` and save it locally under `file_name`:
    urllib.request.urlretrieve(url, file_name)
    
    import urllib.request
    ...
    # Download the file from `url`, save it in a temporary directory and get the
    # path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
    file_name, headers = urllib.request.urlretrieve(url)
    

    But keep in mind that the Python documentation lists urlretrieve under the legacy interface ported from Python 2 and says it might become deprecated at some point in the future.

    So the most future-proof way to do this is to use urllib.request.urlopen, which returns a file-like object representing the HTTP response, and copy that object to a real file with shutil.copyfileobj.

    import urllib.request
    import shutil
    ...
    # Download the file from `url` and save it locally under `file_name`:
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
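
    In a real program the same pattern is usually wrapped in a function with a timeout and some error handling. The following is only a sketch under assumptions: the name download_file, the 30-second default timeout and the True/False return convention are not from the answer above.

```python
import shutil
import urllib.error
import urllib.request


def download_file(url, file_name, timeout=30):
    """Stream `url` to `file_name`; return True on success, False on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # The output file is opened only after the request succeeded,
            # so a failed request does not leave an empty file behind.
            with open(file_name, 'wb') as out_file:
                shutil.copyfileobj(response, out_file)
    except urllib.error.URLError as error:  # HTTPError is a subclass of URLError
        print(f"Download failed: {error}")
        return False
    return True


# Usage (not executed here): download_file('http://example.com/game.jar', 'game.jar')
```

    Returning a boolean is just one possible design; letting the exception propagate is equally reasonable if the caller wants the details.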
    

    If this seems too complicated, you may want to go simpler and store the whole download in a bytes object and then write it to a file. This works well only for small files, because the entire download is held in memory at once.

    import urllib.request
    ...
    # Download the file from `url` and save it locally under `file_name`:
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        data = response.read() # a `bytes` object
        out_file.write(data)
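
    For larger files, a middle ground is to read the response in fixed-size chunks so that only one chunk is in memory at a time. This is a rough sketch; download_in_chunks and the 8192-byte default are arbitrary choices for illustration:

```python
import urllib.request


def download_in_chunks(url, file_name, chunk_size=8192):
    """Copy `url` to `file_name` chunk by chunk; return the bytes written."""
    total = 0
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        while True:
            chunk = response.read(chunk_size)  # at most `chunk_size` bytes
            if not chunk:  # an empty bytes object signals end of stream
                break
            out_file.write(chunk)
            total += len(chunk)
    return total


# Usage (not executed here): size = download_in_chunks(url, 'game.jar')
```

    The running total is handy for printing progress or for comparing against a Content-Length header when the server sends one.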
    

    It is possible to decompress .gz (and maybe other formats) data on the fly. gzip data is decoded sequentially from the start of the stream, so reading the beginning of the file this way should not require the HTTP server to support random access to the file.

    import urllib.request
    import gzip
    ...
    # Read the first 64 bytes of the file inside the .gz archive located at `url`
    url = 'http://example.com/something.gz'
    with urllib.request.urlopen(url) as response:
        with gzip.GzipFile(fileobj=response) as uncompressed:
            file_header = uncompressed.read(64) # a `bytes` object
            # Or do anything shown above using `uncompressed` instead of `response`.
    