
Question:

Please help me!

I am using Python 3.3 and this code:

    import urllib.request
    import sys

    Open_Page = urllib.request.urlopen(
            "http://wowcircle.com"
        ).read().decode().encode('utf-8')

And I get this:

    Traceback (most recent call last):
      File "C:\Users\1\Desktop\WCLauncer\reg.py", line 5, in <module>
        "http://forum.wowcircle.com"
      File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Python33\lib\urllib\request.py", line 475, in open
        response = meth(req, response)
      File "C:\Python33\lib\urllib\request.py", line 587, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python33\lib\urllib\request.py", line 507, in error
        result = self._call_chain(*args)
      File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
        result = func(*args)
      File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Python33\lib\urllib\request.py", line 475, in open
        response = meth(req, response)
      File "C:\Python33\lib\urllib\request.py", line 587, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python33\lib\urllib\request.py", line 507, in error
        result = self._call_chain(*args)
      File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
        result = func(*args)
      File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Python33\lib\urllib\request.py", line 475, in open
        response = meth(req, response)
      File "C:\Python33\lib\urllib\request.py", line 587, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python33\lib\urllib\request.py", line 507, in error
        result = self._call_chain(*args)
      File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
        result = func(*args)
      File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Python33\lib\urllib\request.py", line 475, in open
        response = meth(req, response)
      File "C:\Python33\lib\urllib\request.py", line 587, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python33\lib\urllib\request.py", line 513, in error
        return self._call_chain(*args)
      File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
        result = func(*args)
      File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

I understand that I have no access to the site wowcircle.com. But I only want to fetch the page source! I believe I can do that without access, but how?

Answer 1:

I advise you to set the request headers to match those of a real browser. Have a look at what your browser sends (for example with an HTTP headers plugin).

A function may look like this:

    import urllib

    def openAsOpera(url):
        # Python 2 API; the Python 3 name is urllib.request.URLopener (deprecated since 3.3)
        u = urllib.URLopener()
        u.addheaders = []  # drop the default Python-urllib User-Agent
        u.addheader('User-Agent', 'Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01')
        u.addheader('Accept-Language', 'de-DE,de;q=0.9,en;q=0.8')
        u.addheader('Accept', 'text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1')
        f = u.open(url)
        content = f.read()
        f.close()
        return content

This gets you around errors on some web pages that expect more headers from a client than the default opener sends.
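Since the question uses Python 3.3, the same idea can be sketched with `urllib.request.Request`, which accepts a `headers` dictionary. This is only a sketch and not part of the original answer: the helper names `build_browser_request` and `fetch_as_browser` are made up for illustration, and the header values are copied from the function above. Whether these headers alone are enough for a particular site is an assumption.

```python
import urllib.request

# Header values copied from the Opera example above.
BROWSER_HEADERS = {
    "User-Agent": "Opera/9.80 (Windows NT 6.1; WOW64; U; de) "
                  "Presto/2.10.289 Version/12.01",
    "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
    "Accept": "text/html, application/xml;q=0.9, application/xhtml+xml, "
              "*/*;q=0.1",
}


def build_browser_request(url, headers=BROWSER_HEADERS):
    """Hypothetical helper: a Request carrying browser-like headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_as_browser(url):
    """Hypothetical helper: open the URL and return the raw response bytes."""
    with urllib.request.urlopen(build_browser_request(url)) as response:
        return response.read()
```

Building the request is separate from sending it, so the headers can be inspected (for example with `req.header_items()`) before any network traffic happens.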

With this function (run under Python 2.7), I now get a different error:

    Traceback (most recent call last):
      File "<pyshell#0>", line 1, in <module>
        s = openAsOpera('http://wowcircle.com/')
      File "C:....pyw", line 522, in openAsOpera
        f = u.open(url)
      File "C:\Python27\lib\urllib.py", line 208, in open
        return getattr(self, name)(url)
      File "C:\Python27\lib\urllib.py", line 359, in open_http
        return self.http_error(url, fp, errcode, errmsg, headers)
      File "C:\Python27\lib\urllib.py", line 376, in http_error
        return self.http_error_default(url, fp, errcode, errmsg, headers)
      File "C:\Python27\lib\urllib.py", line 381, in http_error_default
        raise IOError, ('http error', errcode, errmsg, headers)
    IOError: ('http error', 302, 'Moved Temporarily', <httplib.HTTPMessage instance at 0x02C8F1C0>)

This means you are no longer rejected with a 403: the request now looks like it comes from a real browser, and the server answers with a 302 redirect instead.

    >>> try:
    ...     s = openAsOpera('http://wowcircle.com/?pmtry=1')
    ... except:
    ...     import sys; ty, err, tb = sys.exc_info()
    ...
    >>> err.args[3].headers
    ['Server: nginx\r\n',
     'Date: Sat, 05 Apr 2014 07:42:00 GMT\r\n',
     'Content-Type: text/html\r\n',
     'Content-Length: 154\r\n',
     'Connection: close\r\n',
     'Set-Cookie: PMBC=9979187990a58a5bfdaa6d1380ad6156; path=/\r\n',
     'Location: http://wowcircle.com/?pmtry=1\r\n']

One thing to notice there: the redirect goes to http://wowcircle.com/?pmtry=1 and then to http://wowcircle.com/?pmtry=2. The counter goes up, and the server seems to be waiting for the cookie from the Set-Cookie header (PMBC) to be sent back.
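To watch such a redirect chain from Python 3 rather than digging through exception arguments, one option is a redirect handler that records each hop before following it. The sketch below is an illustration only: `RecordingRedirectHandler` and `build_recording_opener` are hypothetical names, and the handler simply defers to the standard `HTTPRedirectHandler` behaviour after noting the status code and target URL.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: remembers every redirect it is asked to follow."""

    def __init__(self):
        self.seen = []  # list of (status code, target URL) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the hop, then let the standard handler build the new request.
        self.seen.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_recording_opener():
    """Return an opener plus the handler whose .seen list fills up on use."""
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    return opener, handler
```

After a call such as `opener.open(url)`, successful or not, `handler.seen` would list the redirect targets in order, which makes a counting parameter like `pmtry` easy to spot.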

So the result of my analysis is: do not forget to send the cookie back every time you access the site.
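In Python 3, sending the cookie back on every request can be sketched with `http.cookiejar.CookieJar` and `urllib.request.HTTPCookieProcessor`, which store cookies from `Set-Cookie` headers and attach them to later requests made through the same opener, including redirected ones. The names `build_cookie_opener` and `fetch_with_cookies` are made up for this example, the User-Agent string is the one from the function above, and whether this combination is sufficient for wowcircle.com has not been verified here.

```python
import http.cookiejar
import urllib.request

# User-Agent copied from the Opera example above.
USER_AGENT = ("Opera/9.80 (Windows NT 6.1; WOW64; U; de) "
              "Presto/2.10.289 Version/12.01")


def build_cookie_opener(user_agent=USER_AGENT):
    """Hypothetical helper: an opener that keeps and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Replace the default Python-urllib User-Agent with a browser-like one.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_with_cookies(url):
    """Hypothetical helper: fetch a page, following redirects with cookies."""
    opener, _jar = build_cookie_opener()
    with opener.open(url) as response:
        return response.read()
```

Reusing one opener for all requests to the site keeps the cookie jar alive between calls, so the cookie set on the first response is sent on every later request.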

Source: Python3: urllib.error.HTTPError: HTTP Error 403: Forbidden

Tags

python3

response

urllib