How do I download a file over HTTP using Python?

前端未结


I have a small utility that downloads an MP3 file from a website on a schedule and then builds/updates a podcast XML file which I've added to iTunes.

The te

25 Answers

陌清茗

2020-11-21 07:40
In 2012, use the Python requests library:
```
>>> import requests
>>>
>>> url = "http://download.thinkbroadband.com/10MB.zip"
>>> r = requests.get(url)
>>> print(len(r.content))
10485760
```
You can run pip install requests to get it.

Requests has many advantages over the alternatives because the API is much simpler. This is especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and painful in this case.

2015-12-30

People have expressed admiration for the progress bar. It's cool, sure. There are several off-the-shelf solutions now, including tqdm:
```
from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    # Without chunk_size, iter_content() yields one byte at a time.
    for data in tqdm(response.iter_content(chunk_size=8192)):
        handle.write(data)
```
This is essentially the implementation @kvance described 30 months ago.
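If you would rather not install anything, the same idea (read the body in chunks and report after each one) can be sketched with the standard library alone. This is only an illustration: the function names, the `on_progress` callback and the 64 KiB chunk size are my own assumptions, not part of requests or tqdm.

```python
import urllib.request


def download_with_progress(url, dest_path, chunk_size=64 * 1024, on_progress=None):
    """Stream `url` to `dest_path`, reporting progress after each chunk.

    `on_progress` is a hypothetical callback taking (bytes_done, total_bytes);
    total_bytes is None when the server sends no Content-Length header.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as handle:
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        done = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read means the body is exhausted
                break
            handle.write(chunk)
            done += len(chunk)
            if on_progress is not None:
                on_progress(done, total)
    return done


def print_progress(done, total):
    """Rewrite a single console line with the current progress."""
    if total:
        print("\r{:6.1%} of {} bytes".format(done / total, total), end="", flush=True)
    else:
        print("\r{} bytes".format(done), end="", flush=True)


# Example call (needs network access):
# download_with_progress("http://download.thinkbroadband.com/10MB.zip",
#                        "10MB.zip", on_progress=print_progress)
```

Only one chunk is held in memory at a time, so this also works for files larger than RAM.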

无人及你

2020-11-21 07:40
The following are the most commonly used calls for downloading files in Python:
1. urllib.urlretrieve('url_to_file', file_name)
2. urllib2.urlopen('url_to_file')
3. requests.get(url)
4. wget.download('url', file_name)
The first two are the Python 2 names; in Python 3 they are urllib.request.urlretrieve and urllib.request.urlopen.
Note: urlopen and urlretrieve were found to perform relatively poorly when downloading large files (size > 500 MB). By default, requests.get holds the whole response in memory until the download is complete; passing stream=True avoids this.
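To illustrate the memory point, here is a minimal Python 3 sketch that writes the response to disk as it arrives instead of buffering it. It uses only the standard library; the function name `download_to_file` and the 30-second timeout are assumptions made for the example.

```python
import shutil
import urllib.request


def download_to_file(url, dest_path, timeout=30):
    """Copy the response body to disk a buffer at a time.

    shutil.copyfileobj reads and writes in fixed-size pieces, so the whole
    file never has to fit in memory.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    return dest_path


# Example call (needs network access):
# download_to_file("http://download.thinkbroadband.com/10MB.zip", "10MB.zip")
```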

梦毁少年i

2020-11-21 07:40

If speed matters to you, I made a small performance test for the modules urllib and wget; for wget I tried once with the status bar and once without. I used three different 500 MB files (different files, to rule out any caching under the hood). Tested on a Debian machine with Python 2.

First, these are the results (they are similar in different runs):

```
$ python wget_test.py
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49
==============
```

I ran the test using a "profile" decorator. This is the full code:

```
import time
from functools import wraps

import wget

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

def profile(func):
    """Print the wall-clock time taken by each call to func."""
    @wraps(func)
    def inner(*args):
        print("{} : starting".format(func.__name__))
        start = time.time()
        ret = func(*args)
        end = time.time()
        print("{} : {:.2f}".format(func.__name__, end - start))
        return ret
    return inner

url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'

def do_nothing(*args):
    # Progress callback that prints nothing, used to silence wget's bar.
    pass

@profile
def urlretrive_test(url):
    return urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print('==============')
time.sleep(1)

wget_no_bar_test(url2)
print('==============')
time.sleep(1)

wget_with_bar_test(url3)
print('==============')
time.sleep(1)
```

urllib seems to be the fastest, and wget's progress bar adds a large overhead (50.49 s with the bar versus 7.20 s without it).


感情败类

2020-11-21 07:41
Use the wget module:
```
import wget
wget.download('url')
```

一整个雨季

2020-11-21 07:41

You can use PycURL on Python 2 and 3.

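```
import pycurl

FILE_DEST = 'pycurl.html'
FILE_SRC = 'http://pycurl.io/'

# Open the destination in binary mode; libcurl writes raw bytes to it.
with open(FILE_DEST, 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, FILE_SRC)
    c.setopt(c.WRITEDATA, f)  # send the response body to the open file
    c.perform()               # run the transfer
    c.close()                 # release the curl handle
```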


小鲜肉

2020-11-21 07:45
Just for the sake of completeness, it is also possible to call any program for retrieving files using the subprocess module. Programs dedicated to retrieving files are more powerful than Python functions like urlretrieve. For example, wget can download directories recursively (-r), handle FTP, redirects, and HTTP proxies, and avoid re-downloading existing files (-nc), while aria2 can do multi-connection downloads, which can potentially speed up your downloads.
```
import subprocess
subprocess.check_output(['wget', '-O', 'example_output_file.html', 'https://example.com'])
```
In Jupyter Notebook, one can also call programs directly with the ! syntax:
```
!wget -O example_output_file.html https://example.com
```
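The wget options mentioned in this answer can be passed the same way. The sketch below is just one possible arrangement: the helper names `build_wget_command` and `run_wget` are made up for illustration, and it assumes wget is installed and on the PATH. The flags themselves (-P, -r, -nc) are standard wget options.

```python
import subprocess


def build_wget_command(url, directory=".", recursive=False, no_clobber=False):
    """Return the argument list for a wget call (nothing is executed here)."""
    command = ["wget", "-P", directory]  # -P: directory to save files under
    if recursive:
        command.append("-r")   # follow links and download recursively
    if no_clobber:
        command.append("-nc")  # skip files that already exist locally
    command.append(url)
    return command


def run_wget(url, **options):
    """Run wget and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_wget_command(url, **options), check=True)


# Example call (needs wget and network access):
# run_wget("https://example.com", directory="/tmp/mirror", no_clobber=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the URL contains characters such as & or ?.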