HTTP Error 403: Forbidden with urlretrieve

I am trying to download a PDF, but I get the following error: HTTP Error 403: Forbidden

I am aware that the server is blocking the request for whatever reason, but I can't se

1 answer

野性不改

2021-01-05 01:39
You seem to have realised this already: the remote server is apparently checking the User-Agent header and rejecting requests from Python's urllib. urllib.request.urlretrieve() doesn't let you change the HTTP headers, but urllib.request.URLopener.retrieve() does:
```
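import urllib.request

# URL of the PDF to download
url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'

# Legacy opener: addheader() adds a header that is sent with every request
opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'whatever')
filename, headers = opener.retrieve(url, 'Test.pdf')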
```
N.B. You are using Python 3, where these functions are considered part of the "Legacy interface" and URLopener is deprecated, so you should not use them in new code.
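For new code, the same header can be set without the legacy classes by passing a urllib.request.Request to urlopen(). A minimal sketch, assuming the server only checks the User-Agent; the helper names `build_request` and `download` and the default 'Mozilla/5.0' string are placeholders of mine, not anything from your code:

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a custom User-Agent header."""
    # The default user agent is an assumed placeholder value.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, filename, user_agent="Mozilla/5.0"):
    """Copy the response body to `filename` and return the filename."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as outfile:
        # Copy in chunks so the whole file is never held in memory.
        shutil.copyfileobj(response, outfile)
    return filename
```

Calling `download(url, 'Test.pdf')` would then stand in for the `opener.retrieve()` call above. Whether a particular user agent string gets past the server is something you would have to try.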

That aside, you are going to a lot of trouble simply to access a URL. Your code imports requests but never uses it; you should, because it is much easier than urllib. This works for me:
```
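import requests

url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'
r = requests.get(url)
r.raise_for_status()  # fail on 403 etc. instead of saving an error page as a PDF
# r.content holds the raw bytes of the response body
with open('0580_s03_qp_1.pdf', 'wb') as outfile:
    outfile.write(r.content)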
```
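If the server ever starts rejecting the default user agent that requests sends, the header can be overridden through the `headers` argument. A rough sketch under that assumption; `build_headers` and `fetch_pdf` are hypothetical helper names, and the 'Mozilla/5.0' string, 30-second timeout and 8192-byte chunk size are arbitrary choices:

```python
import requests


def build_headers(user_agent="Mozilla/5.0"):
    """Return the header dict to send; the default value is a placeholder."""
    return {"User-Agent": user_agent}


def fetch_pdf(url, filename, user_agent="Mozilla/5.0"):
    """Stream `url` to `filename`, raising requests.HTTPError on 4xx/5xx."""
    with requests.get(url, headers=build_headers(user_agent),
                      stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(filename, "wb") as outfile:
            # Write the body in chunks rather than loading it all at once.
            for chunk in r.iter_content(chunk_size=8192):
                outfile.write(chunk)
    return filename
```

For a small PDF the streaming is optional; `r.content` as above is fine, and the only part that matters for a 403 is the `headers` argument.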