HTTP Error 403: Forbidden with urlretrieve

花落未央 2021-01-05 01:04

I am trying to download a PDF, but I get the following error: HTTP Error 403: Forbidden

I am aware that the server is blocking the request for whatever reason, but I can't se

1 Answer
  • 2021-01-05 01:39

    You seem to have realised this already: the remote server is apparently checking the User-Agent header and rejecting requests from Python's urllib. urllib.request.urlretrieve() has no parameter for setting HTTP headers, but you can use urllib.request.URLopener.retrieve() instead:

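    import urllib.request

    # url is the PDF address from the question
    opener = urllib.request.URLopener()
    # Send a custom User-Agent instead of urllib's default one
    opener.addheader('User-Agent', 'whatever')
    filename, headers = opener.retrieve(url, 'Test.pdf')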
    

    N.B. You are using Python 3, where these functions are considered part of the "Legacy interface" and URLopener is deprecated. For that reason you should not use them in new code.
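
    For a urllib route that avoids the legacy interface, the header can be attached to a urllib.request.Request and opened with urlopen(). This is a minimal sketch, assuming the server only inspects the User-Agent; the helper names build_request and download and the 'Mozilla/5.0' value are illustrative and not from the question:

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying a custom User-Agent (value is an assumption)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent="Mozilla/5.0"):
    """Stream the response body for url into the file dest and return dest."""
    req = build_request(url, user_agent)
    # urlopen raises urllib.error.HTTPError if the server still answers 403
    with urllib.request.urlopen(req, timeout=30) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


# Example call (performs a real network request):
# download(url, "Test.pdf")
```

    Whether a given User-Agent string is accepted depends on the server, so the value may need adjusting.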

    The above aside, you are going to a lot of trouble simply to access a URL. Your code imports requests but never uses it. You should use it, because it is much easier than urllib. This works for me:

    import requests

    url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'
    r = requests.get(url)
    # Raise on 403 or any other HTTP error instead of saving an error page
    r.raise_for_status()
    with open('0580_s03_qp_1.pdf', 'wb') as outfile:
        outfile.write(r.content)
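
    If a server also refuses the default requests User-Agent, the same header idea carries over. The following is only a sketch: fetch_pdf, its parameters and the 'Mozilla/5.0' string are hypothetical names chosen for illustration, and it streams the body in chunks rather than holding the whole PDF in memory:

```python
def fetch_pdf(url, dest, user_agent="Mozilla/5.0", session=None):
    """Download url to dest with a custom User-Agent (value is an assumption)."""
    if session is None:
        # Third-party package; imported lazily so a stand-in session can be passed
        import requests
        session = requests.Session()
    response = session.get(
        url,
        headers={"User-Agent": user_agent},
        stream=True,   # do not load the whole body into memory at once
        timeout=30,
    )
    response.raise_for_status()  # raises on 403 and other HTTP errors
    with open(dest, "wb") as outfile:
        for chunk in response.iter_content(chunk_size=8192):
            outfile.write(chunk)
    return dest


# Example call (performs a real network request):
# fetch_pdf(url, "0580_s03_qp_1.pdf")
```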
    