Download file from web in Python 3

前端未结


I am creating a program that will download a .jar (Java) file from a web server by reading the URL that is specified in the .jad file of the same game/application. I'm usi…

Motivation

Sometimes we want to get a picture without saving it to a real file, i.e., download the data and keep it in memory.

For example, suppose I use a machine learning method to train a model that recognizes the numbers in an image (e.g., a bar code). When I crawl websites that contain such images, I want the model to recognize them, but I don't want to save those pictures to my disk drive.

In that case, you can use the method below to keep the downloaded data in memory.

Points

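```
import requests
from io import BytesIO

url = 'https://www.python.org/static/img/python-logo.png'  # any URL you want to keep in memory
response = requests.get(url, stream=True)  # stream=True: fetch the body chunk by chunk
response.raise_for_status()  # fail on 4xx/5xx instead of keeping an error page
with BytesIO() as io_obj:  # note the (): BytesIO must be instantiated
    for chunk in response.iter_content(chunk_size=4096):
        io_obj.write(chunk)
    data = io_obj.getvalue()  # the whole download as bytes; nothing was written to disk
```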

This is basically the same idea as @Ranvijay Kumar's answer.
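If you would rather not install requests, the same keep-it-in-memory idea can be sketched with only the standard library. This is a minimal sketch, not part of the original answer: download_to_memory is a hypothetical helper name, and the 4096-byte chunk size and 30-second timeout are arbitrary defaults.

```python
import io
import urllib.request


def download_to_memory(url: str, chunk_size: int = 4096, timeout: float = 30.0) -> io.BytesIO:
    """Hypothetical helper: read the resource at url into memory; nothing touches the disk."""
    buffer = io.BytesIO()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # iter(callable, sentinel) keeps calling read() until it returns b"" (end of body)
        for chunk in iter(lambda: response.read(chunk_size), b""):
            buffer.write(chunk)
    buffer.seek(0)  # rewind so the caller reads from the beginning
    return buffer
```

The returned BytesIO can be handed to anything that accepts a binary file object, for example download_to_memory('https://www.python.org/static/img/python-logo.png').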

An Example

```
import requests
from typing import NewType
from io import BytesIO
import matplotlib.pyplot as plt
import imageio

URL = NewType('URL', str)


def download_and_keep_on_memory(url: URL, headers=None, timeout=None, **option) -> BytesIO:
    chunk_size = option.get('chunk_size', 4096)  # default 4 KB
    max_size = 1024 ** 2 * option.get('max_size', -1)  # in MB; the default (-1) means no limit
    # stream=True downloads the body lazily, one chunk at a time
    response = requests.get(url, headers=headers, timeout=timeout, stream=True)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses

    # iter_content() yields bytes (decode_unicode is False by default), so a BytesIO buffer fits
    io_obj = BytesIO()
    cur_size = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        cur_size += len(chunk)  # count the bytes actually received; chunks can be shorter than chunk_size
        if 0 < max_size < cur_size:
            break  # stop once the size limit is exceeded (the buffer then holds a truncated file)
        io_obj.write(chunk)
    io_obj.seek(0)  # rewind so the caller reads from the beginning
    # To save it to a real file instead:
    # with open('temp.png', mode='wb') as out_f:
    #     out_f.write(io_obj.read())
    return io_obj


def main():
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'statics.591.com.tw',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
    }
    io_img = download_and_keep_on_memory(URL('http://statics.591.com.tw/tools/showPhone.php?info_data=rLsGZe4U%2FbphHOimi2PT%2FhxTPqI&type=rLEFMu4XrrpgEw'),
                                         headers,  # You may need this; otherwise some websites respond with a 404 error.
                                         max_size=4)  # stop loading beyond 4 MB
    with io_img:
        plt.rc('axes.spines', top=False, bottom=False, left=False, right=False)
        plt.rc(('xtick', 'ytick'), color=(1, 1, 1, 0))  # transparent ticks, similar in effect to plt.axis('off')
        plt.imshow(imageio.imread(io_img, as_gray=False, pilmode="RGB"))
        plt.show()


if __name__ == '__main__':
    main()
```
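The size check in the function above is easier to get right, and to test, as a small separate step. The sketch below is a variant, not the author's code: copy_with_limit is a hypothetical name, and unlike the original loop it raises ValueError instead of silently breaking out, so you never end up decoding a truncated image.

```python
from io import BytesIO
from typing import Iterable


def copy_with_limit(chunks: Iterable[bytes], max_bytes: int) -> BytesIO:
    """Hypothetical helper: copy byte chunks into memory, refusing more than max_bytes.

    A max_bytes of 0 or less means "no limit", like max_size in the example above.
    """
    buffer = BytesIO()
    received = 0
    for chunk in chunks:
        # Count len(chunk): the last chunk is usually shorter than the nominal chunk size.
        received += len(chunk)
        if max_bytes > 0 and received > max_bytes:
            raise ValueError(f"response exceeds {max_bytes} bytes")
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the caller reads from the beginning
    return buffer
```

With requests you would feed it response.iter_content(chunk_size=4096) from a requests.get(url, stream=True) response, e.g. with max_bytes=4 * 1024 ** 2 for the 4 MB limit used in main().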


青春惊慌失措

2020-11-22 17:17
I hope I understood the question right, which is: how do you download a file from a server when the URL is stored as a string?

I download files and save them locally using the code below:
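```
import requests

url = 'https://www.python.org/static/img/python-logo.png'
fileName = r'D:\Python\dwnldPythonLogo.png'  # raw string, so the backslashes are not escape sequences
req = requests.get(url, stream=True)  # stream the body instead of loading it all into memory first
req.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
with open(fileName, 'wb') as file:  # the file is closed even if an error occurs
    for chunk in req.iter_content(100000):  # write in chunks of up to 100,000 bytes
        file.write(chunk)
```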
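One weakness of writing straight to the destination is that a dropped connection leaves a half-written file behind. Here is a minimal sketch of one common fix: write to a temporary file in the same directory and rename it into place only after every chunk has arrived. save_chunks_atomically is a hypothetical helper; it accepts any iterable of bytes, such as the chunks from req.iter_content(100000).

```python
import os
import tempfile
from typing import Iterable


def save_chunks_atomically(chunks: Iterable[bytes], dest: str) -> None:
    """Hypothetical helper: write chunks to a temp file next to dest, then rename it.

    If iterating the chunks fails midway, dest is never left half-written.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    # Note: mkstemp creates the file with owner-only permissions, which the renamed file keeps.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            for chunk in chunks:
                tmp_file.write(chunk)
        os.replace(tmp_path, dest)  # atomic on POSIX when both paths are on the same filesystem
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temp file, then re-raise
        raise
```

For example, pass requests.get(url, stream=True).iter_content(100000) together with your destination path.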
情书的邮戳

2020-11-22 17:24
You can use wget, a Python module named after the popular wget shell download tool: https://pypi.python.org/pypi/wget. This is probably the simplest method, since you don't have to open the destination file yourself. Here is an example:
```
import wget

url = 'https://i1.wp.com/python3.codes/wp-content/uploads/2015/06/Python3-powered.png?fit=650%2C350'
wget.download(url, '/Users/scott/Downloads/cat4.jpg')  # second argument: where to save the file
```
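If you don't want to hardcode the destination, you still need a local file name. As a standard-library sketch (this is not the wget module's own API; filename_from_url is a hypothetical helper), you can take the last segment of the URL path, which also drops the ?fit=650%2C350 query string from the URL above:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download") -> str:
    """Hypothetical helper: use the last URL path segment as a local file name."""
    path = urlsplit(url).path  # the query string and fragment are not part of .path
    name = posixpath.basename(unquote(path))  # decode %20 etc., then keep the last segment
    if name in ("", ".", ".."):
        return default  # e.g. "https://example.com/" has no usable file name
    return name
```

You could then call wget.download(url, filename_from_url(url)).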
不知归路

2020-11-22 17:27
Here we can use urllib's legacy interface in Python 3. As the urllib.request documentation puts it:

> The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.

Example (two lines of code):
```
import urllib.request

url = 'https://www.python.org/static/img/python-logo.png'
urllib.request.urlretrieve(url, "logo.png")
```
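Since the documentation warns that this legacy interface might become deprecated, here is a sketch of the same download using the non-legacy urllib.request.urlopen together with shutil.copyfileobj. download_file is a hypothetical helper name and the 30-second timeout is an arbitrary choice.

```python
import shutil
import urllib.request


def download_file(url: str, filename: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: save the resource at url to filename without urlretrieve()."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(filename, "wb") as out_file:
        # copyfileobj copies in fixed-size blocks, so a large file is never held in memory at once
        shutil.copyfileobj(response, out_file)
    return filename
```

download_file('https://www.python.org/static/img/python-logo.png', 'logo.png') should produce the same file as the two-line example above.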