Python script to see if a web page exists without downloading the whole page?

Posted by 佐手、 on 2019-11-27 01:37:22

Question


I'm trying to write a script to test for the existence of a web page. It would be nice if it could check without downloading the whole page.

This is my jumping-off point. I've seen multiple examples use httplib in the same way; however, every site I check simply returns False.

import httplib
from httplib import HTTP
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    return h.getreply()[0] == httplib.OK

if __name__=="__main__":
    print checkUrl("http://www.stackoverflow.com") # True
    print checkUrl("http://stackoverflow.com/notarealpage.html") # False

Any ideas?

Edit

Someone suggested this, but their post was deleted. Does urllib2 avoid downloading the whole page?

import urllib2

def checkUrl(some_url):
    try:
        urllib2.urlopen(some_url)
        return True
    except urllib2.URLError:
        return False

Answer 1:


How about this:

import httplib
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = httplib.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path or '/')  # fall back to '/' when the URL has no path
    resp = conn.getresponse()
    return resp.status < 400

if __name__ == '__main__':
    print checkUrl('http://www.stackoverflow.com') # True
    print checkUrl('http://stackoverflow.com/notarealpage.html') # False

This will send an HTTP HEAD request and return True if the response status code is < 400.

  • Note that Stack Overflow's root path returns a redirect (301), not a 200 OK.
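
For readers on Python 3, where httplib became http.client and urlparse moved into urllib.parse, the same HEAD check might look like the sketch below. The name check_url, the timeout default, the https handling, and the '/' fallback for an empty path are assumptions added for illustration, not part of the answer above.

```python
# Python 3 sketch of the HEAD check (httplib -> http.client).
import http.client
from urllib.parse import urlparse

def check_url(url, timeout=5):
    """Return True if a HEAD request for url yields a status below 400."""
    p = urlparse(url)
    # Pick the connection class from the scheme so https URLs work too.
    if p.scheme == 'https':
        conn = http.client.HTTPSConnection(p.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(p.netloc, timeout=timeout)
    path = p.path or '/'  # an empty path is not a valid request target
    if p.query:
        path += '?' + p.query
    try:
        conn.request('HEAD', path)
        return conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed reply.
        return False
    finally:
        conn.close()
```

Because redirects (3xx) are below 400, this sketch treats a redirecting URL as existing without following it, just like the Python 2 version above.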



Answer 2:


Using requests, this is as simple as:

import requests

ret = requests.head('http://www.example.com')
print(ret.status_code)

This just loads the website's headers. To test whether the request was successful, check the response's status_code, or call the raise_for_status method, which raises an exception (requests.exceptions.HTTPError) if the server answered with a 4xx or 5xx status.
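
As a rough illustration of the raise_for_status route, the sketch below wraps the HEAD request in a function that returns a boolean. The function name head_ok and the timeout value are assumptions made for this example. It also passes allow_redirects=True, because requests.head does not follow redirects unless asked to.

```python
import requests

def head_ok(url, timeout=5):
    """Return True if a HEAD request to url ends in a non-error status."""
    try:
        # HEAD fetches headers only; follow redirects to the final target.
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        # Raises requests.exceptions.HTTPError for 4xx and 5xx responses.
        resp.raise_for_status()
        return True
    except requests.exceptions.RequestException:
        # Covers HTTPError as well as connection errors and timeouts.
        return False
```

Catching requests.exceptions.RequestException keeps the handler limited to errors raised by requests itself, so unrelated bugs are not silently turned into False.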




Answer 3:


How about this:

import requests

def url_check(url):
    """Boolean return - check to see if the site exists.

    This function takes a URL as input, requests only the site's
    headers (not the full HTML), and checks whether the response
    status code is less than 400. It returns True if so, otherwise False.
    """
    try:
        site_ping = requests.head(url)
        # To view the returned status code: print(site_ping.status_code)
        return site_ping.status_code < 400
    except Exception:
        # Any failure (bad URL, connection error, timeout) counts as "not found".
        return False



Answer 4:


You can try:

import urllib2

try:
    urllib2.urlopen(url='https://someURL')
except urllib2.URLError:  # also covers HTTPError such as a 404
    print("page not found")
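
On the question of whether the urllib route can avoid fetching the page body: in Python 3, where urllib2 became urllib.request, a Request object accepts a method argument, so a HEAD request can be sent explicitly. The following is a minimal sketch under that assumption; the function name page_exists and the timeout value are made up for illustration.

```python
import urllib.error
import urllib.request

def page_exists(url, timeout=5):
    """Return True if a HEAD request to url succeeds."""
    # method='HEAD' asks the server for headers only, not the body.
    req = urllib.request.Request(url, method='HEAD')
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except urllib.error.URLError:
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx
        # responses, and URLError for connection-level failures.
        return False
```

Unlike the status check in the http.client approach, urlopen raises on error statuses by itself, so there is no status code to compare here.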


Source: https://stackoverflow.com/questions/6471275/python-script-to-see-if-a-web-page-exists-without-downloading-the-whole-page
