Python 3.4 urllib.request error (http 403)

Anonymous (unverified), submitted 2019-12-03 02:56:01

Question:

I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:

    import urllib

    url = "https://ipdb.at/ip/66.196.116.112"
    html = urllib.urlopen(url).read()

and everything is fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:

    import urllib.request

    html = urllib.request.urlopen(url)  # same URL as before

The traceback:

      File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Python34\lib\urllib\request.py", line 461, in open
        response = meth(req, response)
      File "C:\Python34\lib\urllib\request.py", line 574, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python34\lib\urllib\request.py", line 499, in error
        return self._call_chain(*args)
      File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
        result = func(*args)
      File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

It works for other URLs that don't use https. For example,

    url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'

is OK.

Answer 1:

It seems like the site does not like the user agent of Python 3.x.

Specifying User-Agent will solve your problem:

    import urllib.request

    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib.request.urlopen(req).read()
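To see what the site may be objecting to, it can help to look at the User-Agent that urllib sends when none is specified. The following is only a sketch: `default_user_agent` and `make_request` are hypothetical helper names, not part of urllib, and nothing here touches the network.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that a freshly built opener sends by default."""
    opener = urllib.request.build_opener()
    # The opener keeps its default headers as a list of (name, value) pairs.
    return dict(opener.addheaders).get("User-agent")


def make_request(url, user_agent="Mozilla/5.0"):
    """Build (but do not send) a Request carrying a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


print(default_user_agent())  # a string starting with "Python-urllib/"

req = make_request("https://ipdb.at/ip/66.196.116.112")
# Request stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Some sites appear to reject the `Python-urllib/...` default outright, which would explain why a browser-like value gets through.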

NOTE: Python 2.x urllib also receives the 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it does not raise an exception; it returns the error response as if it were a normal page.

You can confirm that with the following code (Python 2.x, after `import urllib`):

    print(urllib.urlopen(url).getcode())  # => 403
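In Python 3 the same check has to go through the exception, because `urlopen` raises `urllib.error.HTTPError` for a 403. A minimal sketch, assuming a hypothetical helper named `status_code`; its `opener` parameter is an illustration-only hook so the function can be exercised without network access:

```python
import urllib.error
import urllib.request


def status_code(url, opener=urllib.request.urlopen):
    """Return the HTTP status for url, including 4xx/5xx error statuses."""
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code that triggered it.
        return err.code


# Usage (needs network access; the result depends on the server):
#     status_code("https://ipdb.at/ip/66.196.116.112")
```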


Answer 2:

Here are some notes I gathered on urllib while studying Python 3. I kept them in case they come in handy or help someone else out.

How to import urllib.request and urllib.parse:

    import urllib.request as urlRequest
    import urllib.parse as urlParse

How to make a GET request:

    url = "http://www.example.net"
    # open the url
    x = urlRequest.urlopen(url)
    # read the response body (bytes in Python 3)
    sourceCode = x.read()

How to make a POST request:

    url = "https://www.example.com"
    values = {"q": "python if"}
    # URL-encode the form values into a query string
    values = urlParse.urlencode(values)
    # convert the string to UTF-8 bytes, as required for request data
    values = values.encode("UTF-8")
    # create the request object; passing data makes it a POST
    targetUrl = urlRequest.Request(url, values)
    # open the url
    x = urlRequest.urlopen(targetUrl)
    # read the response body (bytes in Python 3)
    sourceCode = x.read()
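What turns a request into a POST is simply the presence of `data`. The sketch below makes that visible without sending anything; `build_form_request` is a hypothetical helper name used only for illustration.

```python
import urllib.parse
import urllib.request


def build_form_request(url, fields=None):
    """Build (but do not send) a Request; it is a POST only if fields are given."""
    data = None
    if fields is not None:
        # urlencode returns str; the request body must be bytes.
        data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=data)


get_req = build_form_request("https://www.example.com")
post_req = build_form_request("https://www.example.com", {"q": "python if"})

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # b'q=python+if'
```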

How to make a POST request (when the server answers 403 Forbidden):

    url = "https://www.example.com"
    values = {"q": "python urllib"}
    # pretend to be a Chrome 47 browser on a Windows 10 machine
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
    # URL-encode the form values into a query string
    values = urlParse.urlencode(values)
    # convert the string to UTF-8 bytes, as required for request data
    values = values.encode("UTF-8")
    # create the request object with data and custom headers
    targetUrl = urlRequest.Request(url=url, data=values, headers=headers)
    # open the url
    x = urlRequest.urlopen(targetUrl)
    # read the response body (bytes in Python 3)
    sourceCode = x.read()

How to make a GET request (when the server answers 403 Forbidden):

    url = "https://www.example.com"
    # pretend to be a Chrome 47 browser on a Windows 10 machine
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
    # create the request object with custom headers (no data, so it is a GET)
    req = urlRequest.Request(url, headers=headers)
    # open the url
    x = urlRequest.urlopen(req)
    # read the response body (bytes in Python 3)
    sourceCode = x.read()
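If many requests need the same User-Agent, one alternative to passing `headers` every time is to put it on an opener. This is a sketch of that variation, not something from the notes above; `make_opener` and `USER_AGENT` are hypothetical names, and the block only builds the opener without sending anything.

```python
import urllib.request

# Hypothetical browser-like value; any string the target site accepts would do.
USER_AGENT = "Mozilla/5.0"


def make_opener(user_agent=USER_AGENT):
    """Return an opener that adds the given User-Agent to every request."""
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener()
# opener.open(url) would now send the custom header;
# urllib.request.install_opener(opener) would make plain urlopen() use it too.
```

A header set explicitly on a `Request` still takes precedence over the opener's default.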

