“urllib.error.HTTPError: HTTP Error 404: Not Found” Python

Submitted by £可爱£侵袭症+ on 2019-12-08 12:26:53

Question


I'm trying to open this webpage with the urllib.request.urlopen function: "https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//"

I can access this webpage with my regular browser, but urllib.request.urlopen returns HTTP error 404:

import urllib.request


page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
print(page)

I get the following error:

Traceback (most recent call last):
  File "/Users/markmouawad/Documents/consu_programa/scrapper.py", line 4, in <module>
    page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

I'm using Python 3.5.3.


Answer 1:


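This is one of the first things you run into when writing spiders or crawling bots.

A basic way for a server to detect bots is to check whether the request carries a browser-like User-Agent header. By default, urllib identifies itself as "Python-urllib/" followed by the Python version, and some sites answer such requests with an error even though the page exists.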
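To see what the server sees, you can inspect the default headers that urllib attaches to every request. This is a small sketch; the helper name default_user_agent is made up for illustration, and it only reads the default from a fresh opener without contacting any server.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value that urllib sends when none is set."""
    # A fresh opener carries its default headers as a list of (name, value) pairs.
    opener = urllib.request.build_opener()
    return dict(opener.addheaders).get("User-agent")


# Prints a string that starts with "Python-urllib/", which is easy for a site to filter on.
print(default_user_agent())
```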

Try this code snippet:

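import requests

# Same URL as in the question
URL = "https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//"

# Send a browser-like User-Agent instead of the library default
headers = {'User-Agent': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}

r = requests.get(URL, headers=headers)

print(r.status_code)  # should be 200
print(r.content)  # should hold page content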
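If you would rather stay with urllib, as in the question, the same idea can be sketched with urllib.request.Request, which accepts a headers dictionary. The helper names build_request and fetch and the 10-second timeout are assumptions for illustration, and whether this particular site accepts the request still depends on the server.

```python
import urllib.request

# URL from the question and the browser-like User-Agent string from the snippet above
URL = "https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//"
BROWSER_UA = ("Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) "
              "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405")


def build_request(url, user_agent=BROWSER_UA):
    """Build a Request that overrides urllib's default User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Open the URL with the custom header and return (status, body bytes)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.status, response.read()
```

Calling fetch(URL) then plays the role of requests.get in the snippet above. If the server still rejects the request, urlopen raises urllib.error.HTTPError just as in the original traceback.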


Source: https://stackoverflow.com/questions/48489443/urllib-error-httperror-http-error-404-not-found-python
