Get web page using HtmlAgilityPack.NETCore

Posted by 本秂侑毒 on 2020-01-03 09:08:26

Question


I used HtmlAgilityPack to work with HTML pages. Previously I did this:

HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(url);
var nodes = document.DocumentNode.SelectNodes("necessary node");

But now I need to use HtmlAgilityPack.NETCore, where HtmlWeb is absent. What should I use instead of HtmlWeb to get the same result?


Answer 1:


Use HttpClient, the newer way to interact with remote resources over HTTP.

As for your solution, you should probably use the async methods with await so that your thread is not blocked, instead of calling .Result. Also note that HttpClient, available since .NET 4.5, is meant to be shared and used from multiple threads, so you should not recreate it for each request:

// instance or static variable
HttpClient client = new HttpClient();

// get answer in non-blocking way
using (var response = await client.GetAsync(url))
{
    using (var content = response.Content)
    {
        // read answer in non-blocking way
        var result = await content.ReadAsStringAsync();
        var document = new HtmlDocument();
        document.LoadHtml(result);
        var nodes = document.DocumentNode.SelectNodes("Your nodes");
        //Some work with page....
    }
}

A great article about async/await: "Async/Await - Best Practices in Asynchronous Programming" by Stephen Cleary (March 2013).




Answer 2:


I had the same problem in Visual Studio Code with netcoreapp1.0. I ended up using HtmlAgilityPack version 1.5.0-beta5 instead.

Remember to add:

using HtmlAgilityPack;
using System.Net.Http;
using System.IO;

I did it like this:

HttpClient hc = new HttpClient();
HttpResponseMessage result = await hc.GetAsync("http://somewebsite.com");
Stream stream = await result.Content.ReadAsStreamAsync();
HtmlDocument doc = new HtmlDocument();
doc.Load(stream);
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='whateverclassyouarelookingfor']");



Answer 3:


I wrote this and it's working. Is this a good way to solve my problem?

using (HttpClient client = new HttpClient())
{
    using (HttpResponseMessage response = client.GetAsync(url).Result)
    {
        using (HttpContent content = response.Content)
        {
            string result = content.ReadAsStringAsync().Result;
            HtmlDocument document = new HtmlDocument();
            document.LoadHtml(result);
            var nodes = document.DocumentNode.SelectNodes("Your nodes");
            //Some work with page....
        }
    }
}



Answer 4:


You can use HttpClient to get the content of the page.
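
A minimal sketch of what that could look like, assuming a single HttpClient shared across the application and the HtmlAgilityPack LoadHtml method. The names PageLoader, Parse and LoadAsync are hypothetical and only serve the illustration; they are not part of either library.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper; the class and method names are illustrative only.
public static class PageLoader
{
    // One shared client for the whole application instead of one per request.
    private static readonly HttpClient Client = new HttpClient();

    // Parses an HTML string that has already been downloaded.
    public static HtmlDocument Parse(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document;
    }

    // Downloads the page without blocking the calling thread, then parses it.
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        string html = await Client.GetStringAsync(url);
        return Parse(html);
    }
}
```

From an async method you would call it as var document = await PageLoader.LoadAsync(url); and then query document.DocumentNode as before. Keep in mind that SelectNodes can return null when nothing matches, so check the result before iterating over it.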



Source: https://stackoverflow.com/questions/43364856/get-web-page-using-htmlagilitypack-netcore
