How to time out a request using Html Agility Pack

情书的邮戳 2021-01-12 23:56

I'm making a request to a remote web server that is currently offline (on purpose).

I'd like to figure out the best way to time out the request. Basically if the

4 answers
  • 2021-01-13 00:33

    You could use a standard HttpWebRequest to fetch the remote resource and set its Timeout property (in milliseconds). If the request succeeds, feed the resulting HTML to Html Agility Pack for parsing.
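
    A minimal sketch of that idea, assuming the HtmlAgilityPack package is referenced. The class name `TimedFetch` and the method name `LoadWithTimeout` are hypothetical, chosen only for illustration, and returning null on failure is one possible convention rather than anything the library prescribes:

    ```csharp
    using System.IO;
    using System.Net;
    using HtmlAgilityPack;

    static class TimedFetch
    {
        // Hypothetical helper: returns the parsed document,
        // or null when the request fails or times out.
        public static HtmlDocument LoadWithTimeout(string url, int timeoutMs)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs;          // limit for GetResponse()
            request.ReadWriteTimeout = timeoutMs; // limit for reads on the response stream

            try
            {
                using (var response = (HttpWebResponse)request.GetResponse())
                using (Stream stream = response.GetResponseStream())
                {
                    var doc = new HtmlDocument();
                    doc.Load(stream); // parse directly from the response stream
                    return doc;
                }
            }
            catch (WebException)
            {
                // Covers timeouts (Status == WebExceptionStatus.Timeout)
                // as well as other request failures.
                return null;
            }
        }
    }
    ```

    A caller would then write something like `var doc = TimedFetch.LoadWithTimeout(url, 5000);` and treat a null result as "server unavailable". Exceptions other than WebException are deliberately left to propagate in this sketch.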

  • 2021-01-13 00:40

    Html Agility Pack is open source, so you can modify the source yourself. First, add this code to the HtmlWeb class:

    private int _timeout = 20000;

    // Request timeout in milliseconds
    public int Timeout
    {
        get { return _timeout; }
        set
        {
            // validate the incoming value, not the current field
            if (value < 1)
                throw new ArgumentException("Timeout must be greater than zero.");
            _timeout = value;
        }
    }
    

    Then find this method

    private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds)
    

    and modify it:

    req = WebRequest.Create(uri) as HttpWebRequest;
    req.Method = method;
    req.UserAgent = UserAgent;
    req.Timeout = Timeout; //add this
    

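    Or, without touching the source, set the timeout through the PreRequest handler, something like this:

    htmlWeb.PreRequest = request =>
    {
        request.Timeout = 15000; // milliseconds
        return true;             // let the request proceed
    };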
    
  • 2021-01-13 00:54

    Retrieve the web page at your URL with this method:

    private static string retrieveData(string url)
    {
        // used to build entire input
        StringBuilder sb = new StringBuilder();

        // used on each read operation
        byte[] buf = new byte[8192];

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 10; // in milliseconds, so this is 10 ms; use 10000 for 10 seconds

        // execute the request; GetResponse throws a WebException if the timeout elapses
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        // we will read data via the response stream
        using (Stream resStream = response.GetResponseStream())
        {
            int count;
            do
            {
                // fill the buffer with data
                count = resStream.Read(buf, 0, buf.Length);

                // make sure we read some data
                if (count != 0)
                {
                    // translate from bytes to ASCII text and keep building the string
                    sb.Append(Encoding.ASCII.GetString(buf, 0, count));
                }
            }
            while (count > 0); // any more data to read?
        }

        return sb.ToString();
    }
    

    Then use Html Agility Pack to retrieve the HTML element like this:

    public static string htmlRetrieveInfo()
    {
        string htmlSource = retrieveData("http://example.com/test.html");
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlSource);

        // SelectSingleNode returns null when nothing matches
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");
        return node != null ? node.InnerHtml : null;
    }
    
  • 2021-01-13 00:55

    I had to make a small adjustment to my originally posted code:

        public JsonpResult About(string HomePageUrl)
        {
            Models.Pocos.About about = null;
            // ************* CHANGE HERE - added "timeout in milliseconds" to RemoteFileExists extension method.
            if (HomePageUrl.RemoteFileExists(1000))
            {
                // Using the Html Agility Pack, we want to extract only the
                // appropriate data from the remote page.
                HtmlWeb hw = new HtmlWeb();
                HtmlDocument doc = hw.Load(HomePageUrl);
                HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");
    
                if (node != null)
                { 
                    about = new Models.Pocos.About { html = node.InnerHtml };
                }
                    //todo: look into whether this else statement is necessary
                else 
                {
                    about = null;
                }
            }
    
            return this.Jsonp(about);
        }
    

    Then I modified my RemoteFileExists extension method to take a timeout:

        public static bool RemoteFileExists(this string url, int timeout)
        {
            try
            {
                // Create the HttpWebRequest.
                HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;

                // ************ ADDED HERE
                // time out the request after x milliseconds
                request.Timeout = timeout;
                // ************

                // Use the HEAD method; GET would also work.
                request.Method = "HEAD";

                // Get the web response and dispose of it when done.
                using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
                {
                    // Return true if the status code is 200.
                    return (response.StatusCode == HttpStatusCode.OK);
                }
            }
            catch
            {
                // Any exception, including a timeout, returns false.
                return false;
            }
        }
    

    With this approach, if the timeout fires before RemoteFileExists receives the response headers, the method returns false.
