How to Timeout a request using Html Agility Pack

Submitted by 吃可爱长大的小学妹 on 2019-12-19 07:22:15

Question


I'm making a request to a remote web server that is currently offline (on purpose).

I'd like to figure out the best way to time out the request. Basically, if the request runs longer than X milliseconds, I want to abandon it and return a null response.

Currently, the web request just sits there waiting for a response.

How would I best approach this problem?

Here's my current code:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;
        if (HomePageUrl.RemoteFileExists())
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");

            if (node != null)
            { 
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
            //todo: look into whether this else statement is necessary
            else 
            {
                about = null;
            }
        }

        return this.Jsonp(about);
    }

Answer 1:


Retrieve the web page with this method, which sets a timeout on the request:

    using System.IO;
    using System.Net;

    private static string retrieveData(string url)
    {
        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 10000;          // GetResponse() gives up after 10,000 ms (10 seconds)
        request.ReadWriteTimeout = 10000; // each read from the response stream gives up after 10,000 ms

        try
        {
            // execute the request; a timeout throws a WebException
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream resStream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(resStream))
            {
                // read the whole body; StreamReader decodes UTF-8 unless a byte-order mark says otherwise
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // timed out or unreachable: return null instead of waiting indefinitely
            return null;
        }
        catch (IOException)
        {
            // a read that exceeds ReadWriteTimeout can surface as an IOException
            return null;
        }
    }

Then use the Html Agility Pack to parse the HTML and retrieve a tag, like this:

    using HtmlAgilityPack;

    public static string htmlRetrieveInfo()
    {
        string htmlSource = retrieveData("http://example.com/test.html");
        if (htmlSource == null)
        {
            return null; // the request timed out or failed
        }

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlSource);

        // SelectSingleNode returns null when there is no <body> element
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");
        return node != null ? node.InnerHtml : null;
    }



Answer 2:


Html Agility Pack is open source, so you can modify the source yourself. First, add this code to the HtmlWeb class:

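    // request timeout in milliseconds (default 20 seconds)
    private int _timeout = 20000;

    public int Timeout
    {
        get { return _timeout; }
        set
        {
            // validate the incoming value, not the current field
            if (value < 1)
                throw new ArgumentException("Timeout must be greater than zero.");
            _timeout = value;
        }
    }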

Then find this method:

    private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds)

and modify it:

    req = WebRequest.Create(uri) as HttpWebRequest;
    req.Method = method;
    req.UserAgent = UserAgent;
    req.Timeout = Timeout; // added: apply the configurable timeout (milliseconds)

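Alternatively, without changing the library source, set the timeout in HtmlWeb's PreRequest callback, which receives the HttpWebRequest before it is sent:

    HtmlWeb htmlWeb = new HtmlWeb();
    htmlWeb.PreRequest = request =>
    {
        request.Timeout = 15000; // milliseconds
        return true;             // return true so the request proceeds
    };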



Answer 3:


I had to make a small adjustment to the code I originally posted:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;
        // ************* CHANGE HERE - added "timeout in milliseconds" to RemoteFileExists extension method.
        if (HomePageUrl.RemoteFileExists(1000))
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");

            if (node != null)
            { 
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
            //todo: look into whether this else statement is necessary
            else 
            {
                about = null;
            }
        }

        return this.Jsonp(about);
    }

Then I modified my RemoteFileExists extension method to accept a timeout:

    public static bool RemoteFileExists(this string url, int timeout)
    {
        try
        {
            //Creating the HttpWebRequest
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;

            // ************ ADDED HERE
            // time out the request after x milliseconds
            request.Timeout = timeout;
            // ************

            //Setting the Request method HEAD; you can also use GET.
            request.Method = "HEAD";
            //Getting the Web Response; disposing it releases the connection.
            using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
            {
                //Returns TRUE if the Status code == 200
                return (response.StatusCode == HttpStatusCode.OK);
            }
        }
        catch
        {
            //Any exception, including a timeout (WebException), returns false.
            return false;
        }
    }

With this approach, if the timeout expires before RemoteFileExists receives the response headers, the method returns false.




Answer 4:


You could use a standard HttpWebRequest to fetch the remote resource, setting its Timeout property. If the request succeeds, feed the resulting HTML to the Html Agility Pack for parsing.
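
A minimal sketch of this suggestion, assuming a synchronous caller like the question's About action. The class name RemotePage and the method FetchHtmlOrNull are hypothetical. The sketch sets both Timeout (which bounds GetResponse()) and ReadWriteTimeout (which bounds reads from the response stream), and it returns null on any failure, as the question asks.

```csharp
using System;
using System.IO;
using System.Net;

public static class RemotePage
{
    // Hypothetical helper: returns the page body, or null if the request fails
    // or does not finish within timeoutMs milliseconds.
    public static string FetchHtmlOrNull(string url, int timeoutMs)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = timeoutMs;          // limit for GetResponse(), i.e. until the headers arrive
        request.ReadWriteTimeout = timeoutMs; // limit for each read of the response body

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Timeouts (Status == WebExceptionStatus.Timeout), unreachable hosts and
            // non-success status codes all end up here; report them as "no page".
            return null;
        }
        catch (IOException)
        {
            // A body read that times out may surface as an IOException instead.
            return null;
        }
    }
}
```

In the question's About action you might call `string html = RemotePage.FetchHtmlOrNull(HomePageUrl, 1000);` and, when the result is not null, pass it to `LoadHtml` on a new `HtmlDocument` before running the same `SelectSingleNode` query. Unlike a separate HEAD check, this puts the timeout on the request that actually downloads the page. On .NET 6 and later, `WebRequest.Create` is marked obsolete in favor of `HttpClient`, whose `Timeout` property plays the same role.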



Source: https://stackoverflow.com/questions/6574109/how-to-timeout-a-request-using-html-agility-pack
