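Some web pages return an error when you try to crawl them:
urllib.error.HTTPError: HTTP Error 403: Forbidden
This happens because the site inspects the client that is connecting and rejects requests that do not look like they come from a browser. By default, urllib identifies itself as Python-urllib, so the request has to be disguised as a browser request by setting the User-Agent header.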
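To see what the server sees when no header is set, the short sketch below prints the User-Agent that urllib attaches by default. The variable names are only illustrative, and the exact version suffix depends on the Python version in use.

```python
import urllib.request

# A freshly built opener carries urllib's default headers.
opener = urllib.request.build_opener()

# addheaders is a list of (name, value) tuples; turn it into a dict for lookup.
default_headers = dict(opener.addheaders)
default_user_agent = default_headers["User-agent"]

# Prints something like "Python-urllib/3.x", which is easy for a site to block.
print(default_user_agent)
```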
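To find a real User-Agent value, open the page in a browser ---> F12 ---> Network ---> refresh the page.
Then select any request in the list; the User-Agent appears under its request headers: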
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
import urllib.request  # URL handling module


def openUrl(url):
    # Browser-like headers so the server does not reject the request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Host': 'jandan.net'
    }
    req = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(req)  # send the request
    html = response.read()                  # read the raw bytes
    html = html.decode("utf-8")             # decode to text
    print(html)                             # print the page


if __name__ == "__main__":
    url = "http://jandan.net/ooxx/"  # or 'http://www.douban.com/' (change 'Host' to match)
    openUrl(url)
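A slightly more reusable variant is sketched below, assuming you want the same approach for more than one site. The names BROWSER_UA, build_request and fetch are hypothetical helpers, not part of urllib. The sketch leaves out the hard-coded 'Host' header, since urllib fills it in from the URL when the request is sent, and it catches HTTPError so a 403 is reported instead of raising a traceback.

```python
import urllib.error
import urllib.request

# User-Agent string copied from the browser's developer tools.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Hypothetical helper: build a Request carrying a browser User-Agent."""
    # No 'Host' header is set here; urllib derives it from the URL on send.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Hypothetical helper: return the decoded page, or None on an HTTP error."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            # Assumes the page is UTF-8 encoded, as in the example above.
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # For example: "HTTP Error 403: Forbidden"
        print(f"HTTP Error {err.code}: {err.reason}")
        return None
```

Calling fetch("http://jandan.net/ooxx/") should then return the page text, or None if the server still refuses the request.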
Source: https://www.cnblogs.com/protogenoi/p/8881163.html