How can I retrieve all the text nodes of an HTMLDocument in the fastest way in C#?

Happy的楠姐 2021-01-23 11:03

I need to perform some logic on all the text nodes of an HTMLDocument. This is how I currently do it:

HTMLDocument pageContent = (HTMLDocument)_webBrowser2.Docu         


        
2 Answers
  •  北荒 (OP)
    2021-01-23 11:19

    You could access all the text nodes in one shot with the XPath query //text() in the HTML Agility Pack (HAP).

    I think this should work as shown, but I have not tested it.

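    using HtmlAgilityPack;
    
    HtmlDocument htmlDoc = new HtmlDocument();
    
    // filePath is a path to a file containing the html
    htmlDoc.Load(filePath);
    
    // "//text()" selects every text node in the document,
    // including whitespace-only ones
    HtmlNodeCollection coll = htmlDoc.DocumentNode.SelectNodes("//text()");
    
    // SelectNodes returns null (not an empty collection) when nothing matches
    if (coll != null)
    {
      foreach (HtmlNode node in coll)
      {
        // do the work for a text node here, e.g. read node.InnerText
      }
    }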
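The following minimal sketch shows what the //text() query actually returns without installing anything. It swaps HAP for .NET's built-in System.Xml parser, so it assumes the markup is well-formed XHTML. XmlDocument.LoadXml throws on typical real-world HTML, which is why the answer reaches for the HTML Agility Pack. The TextNodeDemo class and CollectText method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper (not part of HAP or the original answer):
// collects the trimmed, non-blank text nodes of a well-formed XHTML string.
public static class TextNodeDemo
{
    public static List<string> CollectText(string xhtml)
    {
        var doc = new XmlDocument();
        // Assumption: the markup parses as XML; LoadXml throws XmlException otherwise.
        doc.LoadXml(xhtml);

        var result = new List<string>();

        // Same XPath as the HAP answer: every text node anywhere in the tree.
        var nodes = doc.SelectNodes("//text()");
        if (nodes == null)
        {
            return result; // defensive: the API is annotated as possibly returning null
        }

        foreach (XmlNode node in nodes)
        {
            var value = node.Value;
            // Skip whitespace-only nodes; keep the rest without surrounding spaces.
            if (!string.IsNullOrWhiteSpace(value))
            {
                result.Add(value.Trim());
            }
        }
        return result;
    }
}
```

For example, CollectText("<html><body><p>Hello <b>world</b></p> <p>!</p></body></html>") returns "Hello", "world" and "!" in document order.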
    
