I'm writing a small Python script to grab images via Google Images. I've managed to get things up to the point where I have the URLs of the images I want in a handy list.
It seems that Wikipedia only allows access from real browsers.
The problem can be solved by specifying the `User-Agent` string of a real browser, because Python's `urllib` sends something like `Python-urllib/3.2` by default.
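To check what `urllib` sends when you don't set the header yourself, you can inspect a fresh opener's `addheaders` list. This is a minimal sketch that relies on CPython's `OpenerDirector.addheaders` attribute; it makes no network request, and the version suffix depends on your interpreter.

```python
import urllib.request

# build_opener() returns an OpenerDirector preloaded with urllib's default
# headers; nothing is sent over the network here.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like {'User-agent': 'Python-urllib/3.x'}
print(default_headers)
```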
Here's an example that works (with the User-Agent string of the browser I use):
```python
import urllib.request

url = 'http://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Timba%2B1.jpg/220px-Timba%2B1.jpg'
# Pretend to be a real browser instead of sending the default "Python-urllib/3.x"
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 (KHTML, like Gecko) Ubuntu/12.04 Chromium/18.0.1025.168 Chrome/18.0.1025.168 Safari/535.19'
u = urllib.request.urlopen(urllib.request.Request(url, headers={'User-Agent': user_agent}))
```
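Applied to the list of image URLs from the question, the same idea could look like the sketch below. The helper names (`make_request`, `filename_from_url`, `download_all`) are hypothetical, the local file name is simply the percent-decoded last path segment of each URL (so two URLs ending in the same name would overwrite each other), and error handling is left out.

```python
import os
import shutil
import urllib.parse
import urllib.request

# The browser User-Agent string from the example above; substitute your own.
USER_AGENT = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 '
              '(KHTML, like Gecko) Ubuntu/12.04 Chromium/18.0.1025.168 '
              'Chrome/18.0.1025.168 Safari/535.19')


def make_request(url, user_agent=USER_AGENT):
    """Build a Request that overrides urllib's default User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def filename_from_url(url):
    """Use the last path segment, percent-decoded, as a local file name."""
    path = urllib.parse.urlsplit(url).path
    return urllib.parse.unquote(os.path.basename(path)) or 'download'


def download_all(urls, dest_dir='.'):
    """Fetch each URL with the browser User-Agent and save it under dest_dir."""
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, filename_from_url(url))
        with urllib.request.urlopen(make_request(url)) as resp, \
                open(target, 'wb') as out:
            shutil.copyfileobj(resp, out)  # stream the response body to disk
        saved.append(target)
    return saved
```

For example, `download_all(image_urls, 'images')` would save every URL in a (hypothetical) `image_urls` list into an existing `images` directory and return the paths it wrote.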