urlretrieve not working for this site

Submitted by 蓝咒 on 2019-12-25 06:54:50

Question


I'm trying to download an image, but it does not seem to work. Is the request being blocked by DDoS protection?

Here is the code:

urllib.request.urlretrieve("http://archive.is/Xx9t3/scr.png", "test.png")

This should download that image as "test.png". I'm using Python 3, hence the urllib.request prefix before urlretrieve.

import urllib.request

I have that import as well.

Is there any way I can download the image? Thanks!


Answer 1:


For reasons that I cannot even imagine, the server requires a well-known user agent. So you must pretend to be, for example, Firefox, and it will then agree to send the image:

import urllib.request

# First build a request object that carries a browser-like User-Agent header
req = urllib.request.Request(
    "http://archive.is/Xx9t3/scr.png",
    headers={
        "User-Agent":
            "Mozilla/5.0 (Windows NT 5.1; rv:43.0) Gecko/20100101 Firefox/43.0"})

# Then open it and write the response body to disk
with urllib.request.urlopen(req) as resp, open("test.png", "wb") as fd:
    fd.write(resp.read())

Rather stupid, but when a server admin goes mad, just be as stupid as he is...
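
If you would rather keep calling urlretrieve itself, one option is to install a global opener that sends a browser-like header on every request. This is only a minimal sketch, assuming the server checks nothing beyond the User-Agent; the names BROWSER_UA and install_browser_opener are hypothetical and used here for illustration only:

```python
import urllib.request

# Browser-like User-Agent string (assumption: the server only inspects this header).
BROWSER_UA = "Mozilla/5.0 (Windows NT 5.1; rv:43.0) Gecko/20100101 Firefox/43.0"


def install_browser_opener(user_agent=BROWSER_UA):
    """Install a process-wide opener whose default User-Agent is `user_agent`.

    After this call, urllib.request.urlopen and urllib.request.urlretrieve
    both send the given header instead of the default "Python-urllib/x.y".
    """
    opener = urllib.request.build_opener()
    # addheaders replaces the opener's default header list.
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener
```

After calling install_browser_opener() once, the original call urllib.request.urlretrieve("http://archive.is/Xx9t3/scr.png", "test.png") can be used unchanged. Note that the opener is global, so it affects every later urllib request in the same process.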




Answer 2:


I'd advise you to use requests. The server appears to reject the way you are requesting the image, so try this instead:

import requests
import shutil

r = requests.get('http://archive.is/Xx9t3/scr.png', stream=True)
if r.status_code == 200:
    with open("test.png", 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)

This snippet was adapted from here

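In this snippet, stream=True defers downloading the response body so that r.raw can be copied to the file in chunks. It probably does not change how the server treats the request by itself; a more likely reason this works is that requests sends a different default User-Agent header than urllib, and some servers are restrictive about which clients may pull resources such as media.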
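
To check whether the server is really refusing the request, and whether the User-Agent makes a difference, it can help to look at the error urllib raises instead of guessing. The following is a rough diagnostic sketch using only the standard library; the function name probe and the 10-second timeout are assumptions made for illustration:

```python
import urllib.error
import urllib.request


def probe(url, user_agent=None):
    """Send one GET request and report what happened.

    Returns (True, number_of_bytes) on success, or (False, description)
    when the server answers with an HTTP error or cannot be reached.
    """
    # Only override the User-Agent when one is given; otherwise urllib
    # sends its default "Python-urllib/x.y" value.
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return True, len(resp.read())
    except urllib.error.HTTPError as err:
        # The server replied, but with an error status such as 403.
        return False, f"HTTP {err.code} {err.reason}"
    except urllib.error.URLError as err:
        # No usable reply at all (DNS failure, refused connection, ...).
        return False, f"connection problem: {err.reason}"
```

Calling probe("http://archive.is/Xx9t3/scr.png") once without a user agent and once with a browser-like string should show whether the header is what the server objects to.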



Source: https://stackoverflow.com/questions/41844026/urlretrieve-not-working-for-this-site
