urlretrieve not working for this site

Question

I'm trying to download an image, but it doesn't seem to work. Is it being blocked by DDoS protection?

Here is the code:

urllib.request.urlretrieve("http://archive.is/Xx9t3/scr.png", "test.png")

It should simply download that image as "test.png". I'm using Python 3, hence the urllib.request before urlretrieve.

import urllib.request

I have that import as well.

Is there any way I can download the image? Thanks!

Answer 1:

For reasons that I cannot even imagine, the server requires a well-known user agent. So you must pretend to be, for example, Firefox, and it will agree to send the image:

import urllib.request

# First build a request object that carries a browser-like User-Agent header
req = urllib.request.Request(
    "http://archive.is/Xx9t3/scr.png",
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 5.1; rv:43.0) Gecko/20100101 Firefox/43.0"
    },
)

# Then open it and write the response body to disk
with urllib.request.urlopen(req) as resp, open("test.png", "wb") as fd:
    fd.write(resp.read())

Rather stupid, but when a server admin goes mad, just be as stupid as he is...
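If you would rather keep calling urlretrieve, the same header can be attached through a globally installed opener. The following is a minimal sketch, not a definitive fix: install_browser_opener and download are hypothetical helper names, and it assumes the server only checks the User-Agent header.

```python
import urllib.request

# Assumption: the Firefox User-Agent string used earlier in this answer is accepted.
BROWSER_UA = "Mozilla/5.0 (Windows NT 5.1; rv:43.0) Gecko/20100101 Firefox/43.0"


def install_browser_opener(user_agent=BROWSER_UA):
    """Install a global opener that sends `user_agent` with every request."""
    opener = urllib.request.build_opener()
    # Replace the default ("User-agent", "Python-urllib/3.x") entry
    opener.addheaders = [("User-agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


def download(url, filename, user_agent=BROWSER_UA):
    """Hypothetical helper: fetch `url` into `filename` with urlretrieve."""
    install_browser_opener(user_agent)
    # urlretrieve goes through urlopen, so it uses the installed opener
    return urllib.request.urlretrieve(url, filename)
```

Calling download("http://archive.is/Xx9t3/scr.png", "test.png") would then behave like the original one-liner, only with the browser-like header. Note that install_opener affects every later urlopen call in the process.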

Answer 2:

I'd advise you to use requests. Basically, the server forbids the way you are trying to get the image. Try this:

import requests
import shutil

r = requests.get('http://archive.is/Xx9t3/scr.png', stream=True)
if r.status_code == 200:
    with open("test.png", 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)

This snippet was adapted from here

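What matters here is how the resource is retrieved. With requests, stream=True defers the download so the body can be copied to disk through r.raw, but it does not change the request the server sees. The request most likely succeeds because requests sends its own default User-Agent header instead of urllib's. Some servers are more restrictive about which clients may pull resources such as media.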
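To see what a server might be reacting to, you can inspect the header that urllib sends by default. This is only a small sketch: default_user_agent is a hypothetical helper name, and whether this particular server filters on that value is an assumption.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value that a fresh urllib opener sends."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples applied to every request
    headers = dict(opener.addheaders)
    return headers.get("User-agent")


print(default_user_agent())  # a string of the form "Python-urllib/<major>.<minor>"
```

A server that only admits well-known clients can reject that value while accepting a browser string or another library's default.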

Source: https://stackoverflow.com/questions/41844026/urlretrieve-not-working-for-this-site

Tags

python

urllib