Downloading an image from a URL with Python urllib fails with HTTP Error 403: Forbidden

I want to download an image file from a URL using the Python module "urllib.request". This works for some websites (e.g. mangastream.com), but does not work for another (mangadoo

3 Answers

日久生厌

2020-11-27 20:57

I tried wget with the URL in a terminal and it works:

wget -O out_005.png  http://mangadoom.co/wp-content/manga/5170/886/005.png

So my workaround is to call wget from the script below, which works too.

import os
out_image = 'out_005.png'
url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
os.system("wget -O {0} {1}".format(out_image, url))
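If you keep the wget workaround, passing the arguments as a list avoids shell quoting problems with unusual URLs or file names. A minimal sketch, assuming wget is installed and on the PATH; the helper names `build_wget_command` and `fetch_with_wget` are made up for illustration:

```python
import subprocess


def build_wget_command(url, out_path):
    """Return the argument list equivalent to `wget -O out_path url`."""
    return ["wget", "-O", out_path, url]


def fetch_with_wget(url, out_path):
    """Run wget without a shell; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_wget_command(url, out_path), check=True)
```

Calling `fetch_with_wget(url, out_image)` with the values from the script above should behave like the `os.system` line, except that a failed download raises an exception instead of passing silently.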

不要未来只要你来

2020-11-27 21:04

You can build an opener. Here's the example:

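import urllib.request

# Build an opener that sends a browser-like User-Agent and install it globally,
# so that urlretrieve() uses it too.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

url = ''    # the image URL
local = ''  # the local file name to save to
urllib.request.urlretrieve(url, local)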

By the way, the following two snippets are equivalent:

(without an opener)

req = urllib.request.Request(url, data, hdr)
html = urllib.request.urlopen(req)

(with an opener built)

html = opener.open(url, data, timeout)

However, we cannot pass headers when we use:

urllib.request.urlretrieve()

So in this case, we have to build an opener.
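If you would rather not install a global opener, the Request/urlopen form shown above can also write the file itself instead of going through urlretrieve. A minimal sketch, assuming a short browser-like User-Agent value is enough for the server; `build_request` and `download` are hypothetical helper names, not part of urllib:

```python
import shutil
import urllib.request

# Assumed User-Agent value; any string the server accepts will do.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, local_path, user_agent=DEFAULT_USER_AGENT, timeout=30):
    """Stream the response body of `url` into the file `local_path`."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp, \
            open(local_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return local_path
```

Usage would be `download(url, local)` with the same two variables as above. The header applies only to this one request, so other urllib calls in the program keep their defaults.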

一向

2020-11-27 21:11
This website is blocking the user-agent used by urllib, so you need to change it in your request. Unfortunately I don't think urlretrieve supports this directly.
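
To see which user-agent urllib sends by default, you can inspect a freshly built opener. This is only a quick check, and the variable names are mine:

```python
import urllib.request

# A new opener starts with urllib's default headers.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The value has the form "Python-urllib/<version>"; the version part varies.
print(default_headers["User-agent"])
```

A server that filters on this value will likely keep answering 403 until the header is replaced with something it accepts.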

I recommend using the requests library instead; the code becomes (from here):
```
import requests
import shutil

r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png', stream=True)
if r.status_code == 200:
    with open("img.png", 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
```
Note that this website does not seem to block the requests user-agent. If it ever needs to be changed, that is easy:
```
r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png',
                 stream=True, headers={'User-agent': 'Mozilla/5.0'})
```
Also relevant: changing user-agent in urllib