Is there a faster way to check if an external web page exists?

感情败类 2021-02-07 12:23

I wrote this method to check if a page exists or not:

protected bool PageExists(string url)
{
    try
    {
        Uri u = new Uri(url);
        WebRequest w = WebR

5 answers
  • 2021-02-07 12:51

    One obvious speedup is to run several requests in parallel: most of the time is spent waiting on I/O, so using 10 threads that each check a page will complete the whole iteration roughly 10 times faster.
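
A minimal sketch of the parallel idea, assuming .NET Core 3.0 or later. It swaps dedicated threads for async HttpClient calls combined with Task.WhenAll, so no thread sits blocked while a request waits on the network. ParallelPageChecker, PageExistsAsync, CheckAllAsync and the default cap of 10 concurrent requests are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: checks many URLs concurrently using async I/O.
public sealed class ParallelPageChecker
{
    private readonly HttpClient _client;
    private readonly SemaphoreSlim _gate;

    public ParallelPageChecker(HttpClient client, int maxConcurrency = 10)
    {
        _client = client;
        _gate = new SemaphoreSlim(maxConcurrency);
    }

    // True when a HEAD request completes with a 2xx status code.
    public async Task<bool> PageExistsAsync(string url)
    {
        await _gate.WaitAsync();
        try
        {
            using var request = new HttpRequestMessage(HttpMethod.Head, url);
            using var response = await _client.SendAsync(request);
            return response.IsSuccessStatusCode;
        }
        catch (Exception ex) when (ex is HttpRequestException      // DNS or connection failure
                                || ex is TaskCanceledException     // HttpClient timeout
                                || ex is InvalidOperationException // relative URL, no BaseAddress
                                || ex is UriFormatException)       // malformed URL
        {
            return false;
        }
        finally
        {
            _gate.Release();
        }
    }

    // Starts every check at once; the semaphore keeps at most maxConcurrency in flight.
    public async Task<Dictionary<string, bool>> CheckAllAsync(IEnumerable<string> urls)
    {
        List<string> distinct = urls.Distinct().ToList();
        bool[] results = await Task.WhenAll(distinct.Select(PageExistsAsync));
        var map = new Dictionary<string, bool>();
        for (int i = 0; i < distinct.Count; i++)
            map[distinct[i]] = results[i]; // Task.WhenAll keeps the input order
        return map;
    }
}
```

Usage from an async method: create one checker with a shared HttpClient, for example new ParallelPageChecker(new HttpClient()), then await checker.CheckAllAsync(urls). The semaphore is the design choice that matters here: it caps how many requests are in flight, so a long URL list does not open hundreds of connections at once.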

  • 2021-02-07 12:52

    I simply took Fredrik Mörk's answer and placed it in a method:

    private bool checkURL(string url)
    {
        bool pageExists = false;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = WebRequestMethods.Http.Head;
            // Dispose the response so its connection is returned to the pool.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                pageExists = response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (Exception e)
        {
            // GetResponse throws a WebException for 4xx/5xx statuses (e.g. 404) and
            // for network errors. Do whatever you want when it's not working, e.g.:
            //Response.Write(e.ToString());
        }
        return pageExists;
    }
    
  • 2021-02-07 13:01
    1. You could do it asynchronously, since right now you wait for the result of each request before sending the next one. For a few pages, you could just queue your function on the ThreadPool and wait for all the requests to finish. For more requests, you could use the asynchronous methods instead, such as BeginGetResponse on the request and BeginRead on the response stream.
    2. Another thing that can help (it certainly helped me) is to clear the .Proxy property:
    w.Proxy = null;

    Without this, the first request at least is much slower on my machine (setting it to null skips automatic proxy detection).
    3. You don't have to download the whole page: you can download only the headers by setting .Method to "HEAD".
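
The three tips above can be sketched with HttpClient, assuming .NET Core 3.0 or later: the call is awaited instead of blocking, HttpClientHandler.UseProxy = false plays the role of w.Proxy = null, and the request uses the HEAD method so no body is transferred. HeadOnlyChecker, CreateHandler, CreateHeadRequest and the 10-second timeout are hypothetical choices for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper combining tips 1-3: async I/O, no proxy, HEAD only.
public static class HeadOnlyChecker
{
    // UseProxy = false bypasses proxy settings, like w.Proxy = null on a WebRequest.
    public static HttpClientHandler CreateHandler() =>
        new HttpClientHandler { UseProxy = false };

    // One shared client: a new HttpClient per request would not reuse connections.
    private static readonly HttpClient Client =
        new HttpClient(CreateHandler()) { Timeout = TimeSpan.FromSeconds(10) };

    // HEAD asks the server for the status line and headers only, no body.
    public static HttpRequestMessage CreateHeadRequest(string url) =>
        new HttpRequestMessage(HttpMethod.Head, url);

    public static async Task<bool> PageExistsAsync(string url)
    {
        using var request = CreateHeadRequest(url);
        try
        {
            // Unlike HttpWebRequest.GetResponse, SendAsync does not throw for 404;
            // the status code is simply reported on the response.
            using var response = await Client.SendAsync(request);
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException) { return false; }  // DNS or connection failure
        catch (TaskCanceledException) { return false; } // timeout elapsed
    }
}
```

For example, await HeadOnlyChecker.PageExistsAsync("http://www.example.com") yields true for a 2xx response. Disabling the proxy only makes sense when the machine can reach the target hosts directly; behind a mandatory proxy the requests would fail.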

  • 2021-02-07 13:03
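    using System.Net;
    using System.Net.Cache; // RequestCachePolicy, RequestCacheLevel

    static bool GetCheck(string address)
    {
        try
        {
            HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
            request.Method = "GET";
            request.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
            // Dispose the response so its connection is returned to the pool.
            using (var response = request.GetResponse())
            {
                return response.Headers.Count > 0;
            }
        }
        catch
        {
            // Reached for 4xx/5xx statuses, network errors and invalid addresses.
            return false;
        }
    }

    static bool HeadCheck(string address)
    {
        try
        {
            HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
            request.Method = "HEAD";
            request.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
            // Dispose the response so its connection is returned to the pool.
            using (var response = request.GetResponse())
            {
                return response.Headers.Count > 0;
            }
        }
        catch
        {
            // Reached for 4xx/5xx statuses, network errors and invalid addresses.
            return false;
        }
    }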
    

    Beware: certain pages (e.g. WCF .svc files) may not return anything for a HEAD request. I know because I'm working around this right now.
    EDIT: I know there are better ways to check the returned data than counting headers, but this is copied from code where that matters to us.
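
One way to work around the HEAD caveat above, as a sketch: send HEAD first and repeat the request as GET only when the server rejects HEAD. It assumes the server signals this with 405 Method Not Allowed or 501 Not Implemented; an endpoint that answers HEAD with an empty or misleading 2xx would need a different check. HeadWithGetFallback is a hypothetical name, and network exceptions are deliberately left to the caller.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: HEAD first, GET only if the server refuses HEAD.
// HttpRequestException and timeouts propagate to the caller.
public static class HeadWithGetFallback
{
    public static async Task<bool> PageExistsAsync(HttpClient client, string url)
    {
        using (var head = new HttpRequestMessage(HttpMethod.Head, url))
        using (var headResponse = await client.SendAsync(head))
        {
            // Any answer other than "HEAD not supported" is taken at face value.
            if (headResponse.StatusCode != HttpStatusCode.MethodNotAllowed &&
                headResponse.StatusCode != HttpStatusCode.NotImplemented)
            {
                return headResponse.IsSuccessStatusCode;
            }
        }

        // ResponseHeadersRead: SendAsync completes once the headers have arrived.
        using (var get = new HttpRequestMessage(HttpMethod.Get, url))
        using (var getResponse = await client.SendAsync(get, HttpCompletionOption.ResponseHeadersRead))
        {
            // The body is never read here; disposing the response releases it.
            return getResponse.IsSuccessStatusCode;
        }
    }
}
```

Passing HttpCompletionOption.ResponseHeadersRead keeps the fallback cheap: the GET is only used to obtain a status code, so there is no reason to wait for the whole body to download.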

  • 2021-02-07 13:08

    I think your approach is fine, but I would change it to download only the headers by adding w.Method = WebRequestMethods.Http.Head; before calling GetResponse.

    This could do it:

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.example.com");
    request.Method = WebRequestMethods.Http.Head;
    bool pageExists;
    // Dispose the response so its connection is returned to the pool.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        pageExists = response.StatusCode == HttpStatusCode.OK;
    

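    You will probably want to check for other status codes as well. Note that GetResponse throws a WebException for error statuses such as 404 instead of returning a response, so the call should be wrapped in a try/catch.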
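
Checking other status codes can be sketched as follows: read the status code even when GetResponse throws, then map it to a three-way result. PageStatus, StatusCheck and the grouping (2xx exists, 404/410 missing, everything else unknown) are hypothetical choices; a 403, for instance, may mean the page exists but is protected. Redirects usually do not show up here because HttpWebRequest follows them by default (AllowAutoRedirect is true). On .NET 6 and later, WebRequest.Create still compiles but produces an obsolescence warning (SYSLIB0014).

```csharp
using System;
using System.Net;

// Hypothetical three-way result: a failed check is not proof that a page is missing.
public enum PageStatus { Exists, Missing, Unknown }

public static class StatusCheck
{
    // 2xx: exists; 404 Not Found / 410 Gone: missing; anything else
    // (401, 403, 5xx, ...) is ambiguous, so it is reported as Unknown.
    public static PageStatus Classify(HttpStatusCode code)
    {
        int n = (int)code;
        if (n >= 200 && n <= 299) return PageStatus.Exists;
        if (code == HttpStatusCode.NotFound || code == HttpStatusCode.Gone) return PageStatus.Missing;
        return PageStatus.Unknown;
    }

    public static PageStatus Check(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Head;
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
                return Classify(response.StatusCode);
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse error)
        {
            // GetResponse throws for 4xx/5xx, but the server's response is still attached.
            using (error)
                return Classify(error.StatusCode);
        }
        catch (WebException)
        {
            return PageStatus.Unknown; // DNS failure, timeout, connection refused, ...
        }
    }
}
```

Keeping Classify separate from the network call makes the status policy easy to adjust (for example, treating 403 as Exists) without touching the request code.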
