I am trying to download a PDF, however I get the following error: HTTP Error 403: Forbidden
I am aware that the server is blocking for whatever reason, but I cant se
You seem to have already realised this; the remote server is apparently checking the user agent header and rejecting requests from Python's urllib. But urllib.request.urlretrieve()
doesn't allow you to change the HTTP headers, however, you can use urllib.request.URLopener.retrieve():
import urllib.request
opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'whatever')
filename, headers = opener.retrieve(url, 'Test.pdf')
N.B. You are using Python 3 and these functions are now considered part of the "Legacy interface", and URLopener
has been deprecated. For that reason you should not use them in new code.
The above aside, you are going to a lot of trouble to simply access a URL. Your code imports requests, but you don't use it - you should though because it is much easier than urllib
. This works for me:
import requests
url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'
r = requests.get(url)
with open('0580_s03_qp_1.pdf', 'wb') as outfile:
outfile.write(r.content)