download image from url using python urllib but receiving HTTP Error 403: Forbidden

Anonymous (unverified), posted on 2019-12-03 01:23:02

Question:

I want to download an image file from a URL using the Python module "urllib.request". This works for some websites (e.g. mangastream.com) but fails for another (mangadoom.co) with the error "HTTP Error 403: Forbidden". What could be the problem in the latter case, and how can I fix it?

I am using Python 3.4 on OS X.

    import urllib.request

    # does not work
    img_url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
    img_filename = 'my_img.png'
    urllib.request.urlretrieve(img_url, img_filename)

The error message ends with:

...  HTTPError: HTTP Error 403: Forbidden 

However, the same call works for another website:

    # works
    img_url = 'http://img.mangastream.com/cdn/manga/51/3140/006.png'
    img_filename = 'my_img.png'
    urllib.request.urlretrieve(img_url, img_filename)

I have tried the solutions from the posts below, but none of them work on mangadoom.co.

Downloading a picture via urllib and python

How do I copy a remote image in python?

The solution in this post does not fit either, because my case is downloading an image: urllib2.HTTPError: HTTP Error 403: Forbidden

Non-Python solutions are also welcome. Any suggestions would be much appreciated.

Answer 1:

This website is blocking the user-agent used by urllib, so you need to change it in your request. Unfortunately I don't think urlretrieve supports this directly.

I recommend using the requests library instead; the code becomes (from here):

    import requests
    import shutil

    r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png', stream=True)
    if r.status_code == 200:
        with open("img.png", 'wb') as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)

Note that this website does not seem to block the default requests user-agent. If it ever needs to be changed, that is easy:

    r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png',
                     stream=True, headers={'User-agent': 'Mozilla/5.0'})

Also relevant: changing user-agent in urllib
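For readers who prefer to stay within the standard library, the same idea can be sketched with urllib alone by attaching the header to a Request object and copying the response to disk. This is only a sketch: the helper names make_request and download_image are made up for illustration, and the 'Mozilla/5.0' value assumes that the site accepts any browser-like user-agent.

```python
import shutil
import urllib.request


def make_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that carries a custom User-Agent header.

    Hypothetical helper; the default user-agent value is an assumption.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_image(url, filename, user_agent="Mozilla/5.0"):
    """Stream the response body for `url` into `filename` and return the name."""
    request = make_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return filename


# Example call (needs network access, so it is left commented out):
# download_image('http://mangadoom.co/wp-content/manga/5170/886/005.png', 'my_img.png')
```

If the server still rejects the request, urlopen raises urllib.error.HTTPError, exactly as urlretrieve does in the question.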



Answer 2:

You can build an opener. Here's the example:

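    import urllib.request

    # Build an opener that sends a browser-like User-Agent with every request
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
    # Install it globally so that urlretrieve() uses it
    urllib.request.install_opener(opener)

    url = ''    # the image URL goes here
    local = ''  # the local file name goes here
    urllib.request.urlretrieve(url, local)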

By the way, the following two snippets do roughly the same thing:

(without an opener)

    req = urllib.request.Request(url, data, hdr)
    html = urllib.request.urlopen(req)

(with an opener built)

    html = opener.open(url, data, timeout)

However, we cannot pass headers when we call:

    urllib.request.urlretrieve()

So in this case, we have to build and install an opener first.
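As a variation, the opener can be used directly instead of being installed, which avoids changing the process-wide default that install_opener sets. The sketch below is illustrative only: build_browser_opener and download_with_opener are hypothetical helper names, and the short 'Mozilla/5.0' string assumes the server accepts any browser-like user-agent.

```python
import shutil
import urllib.request

# Assumption: a generic browser-like User-Agent is enough for the target site.
USER_AGENT = "Mozilla/5.0"


def build_browser_opener(user_agent=USER_AGENT):
    """Return an opener whose default headers contain only the given User-Agent."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def download_with_opener(url, filename, opener=None, timeout=30):
    """Fetch `url` through the opener and write the body to `filename`."""
    if opener is None:
        opener = build_browser_opener()
    with opener.open(url, timeout=timeout) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return filename


# Example call (needs network access, so it is left commented out):
# download_with_opener('http://mangadoom.co/wp-content/manga/5170/886/005.png', 'my_img.png')
```

Because nothing is installed globally, other code in the same process that calls urllib.request.urlopen keeps its default headers.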



Answer 3:

I tried wget with the URL in a terminal and it worked:

wget -O out_005.png  http://mangadoom.co/wp-content/manga/5170/886/005.png 

So my workaround is to call wget from the script below, which works too.

    import os

    out_image = 'out_005.png'
    url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
    os.system("wget -O {0} {1}".format(out_image, url))
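A possible refinement of the os.system call above is to pass the command to subprocess.run as a list of arguments. This is a swapped-in technique, not part of the original answer: it avoids shell interpretation of characters such as & or spaces in the URL or file name, and check=True raises an exception if wget exits with an error. The helper names are hypothetical, and the sketch assumes wget is installed and on the PATH.

```python
import subprocess


def build_wget_command(url, out_path):
    """Return the wget invocation as an argument list (no shell involved)."""
    return ["wget", "-O", out_path, url]


def download_with_wget(url, out_path):
    """Run wget and raise subprocess.CalledProcessError if it fails."""
    subprocess.run(build_wget_command(url, out_path), check=True)
    return out_path


# Example call (needs wget and network access, so it is left commented out):
# download_with_wget('http://mangadoom.co/wp-content/manga/5170/886/005.png', 'out_005.png')
```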


Answer 4:

Some sites require you to pass credentials via the URL.
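The answer does not say which scheme it means. One reading, assumed here, is HTTP Basic authentication with the credentials embedded as http://user:password@host/path. As far as I know, urllib.request does not send such embedded credentials on its own, so this sketch splits them out and registers them with an HTTPBasicAuthHandler. The helper names are hypothetical, ordinary host names are assumed (IPv6 literals are not handled), and nothing here indicates that mangadoom.co needs credentials.

```python
import urllib.request
from urllib.parse import urlsplit


def split_credentials(url):
    """Return (url_without_userinfo, username, password).

    username and password are None when the URL carries no credentials.
    """
    parts = urlsplit(url)
    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc = "{0}:{1}".format(netloc, parts.port)
    clean_url = parts._replace(netloc=netloc).geturl()
    return clean_url, parts.username, parts.password


def build_auth_opener(url):
    """Return (clean_url, opener); the opener answers Basic-auth challenges
    with the credentials found in `url`, if there were any."""
    clean_url, username, password = split_credentials(url)
    handlers = []
    if username is not None:
        manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        manager.add_password(None, clean_url, username, password or "")
        handlers.append(urllib.request.HTTPBasicAuthHandler(manager))
    return clean_url, urllib.request.build_opener(*handlers)
```

The returned opener can then be used as opener.open(clean_url) in place of urllib.request.urlopen.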


