Get all the div ids on an HTML page using Html Agility Pack

Submitted by 十年热恋 on 2019-12-22 10:35:13

Question


How do I get all the div ids on an HTML page using Html Agility Pack? I am trying to get all the ids and put them into a collection.

<p>
    <div class='myclass1'>
        <div id='f'>
        </div>  
        <div id="myclass2">
            <div id="my"><div id="h"></div><div id="b"></div></div>
        </div>
    </div>
</p>

Code:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); 
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);    
HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("div"); 

How do I get a collection of all the div ids?


Answer 1:


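If you just want the IDs, you can select on the id attribute rather than selecting every div element. For instance:

List<string> ids = new List<string>();
// Html Agility Pack's SelectNodes yields HtmlNode objects (not XmlNode) and is called on DocumentNode.
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//div/@id"))
{
    // As far as I know, HAP returns the element that owns the matched attribute,
    // so read the id from the node rather than using InnerText.
    ids.Add(node.GetAttributeValue("id", string.Empty));
}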

This will skip the div elements that don't have an ID, such as the <div class='myclass1'> element in your example.

"//div/@id" is an XPath string. XPath is very handy to learn if you deal much with XML or, in this case, with HTML via the Html Agility Pack library. It is an industry standard that lets you select matching nodes in an XML document.

  • // means you want it to select the following node as a child of the current node, or in any of its descendants. Since the current node is the root node of the document, this will find matching nodes anywhere in the document.
  • div is an element name we want to match. So, in this case, we are telling it to find all div elements anywhere in the document.
  • / indicates that you want a child node. In this case the id attribute is a child of the div element, so first we say we want the div element, then we need the forward slash to say we want one of the div element's child nodes.
  • @id means we want to find all the id attributes. The @ symbol indicates that it is an attribute name instead of an element name.
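Putting the pieces above together, here is a minimal sketch of a complete program, not a definitive implementation. It assumes the HtmlAgilityPack package is referenced. The class name DivIdExample, the method GetDivIds, and the inline markup are hypothetical names made up for illustration. It uses the closely related predicate form "//div[@id]", which selects the div elements that have an id attribute, and it guards against SelectNodes returning null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DivIdExample
{
    // Hypothetical helper: returns the id of every div that has one.
    public static List<string> GetDivIds(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        var ids = new List<string>();

        // "//div[@id]" matches div elements anywhere in the document
        // that carry an id attribute.
        HtmlNodeCollection divs = htmlDoc.DocumentNode.SelectNodes("//div[@id]");

        // With default options, SelectNodes returns null (not an empty
        // collection) when there is no match.
        if (divs == null)
        {
            return ids;
        }

        foreach (HtmlNode div in divs)
        {
            ids.Add(div.GetAttributeValue("id", string.Empty));
        }

        return ids;
    }

    public static void Main()
    {
        // Inline markup modelled on the sample in the question (assumption:
        // the real page would be loaded from a file or URL instead).
        string html =
            "<div class='myclass1'>" +
            "<div id='f'></div>" +
            "<div id='myclass2'><div id='my'><div id='h'></div><div id='b'></div></div></div>" +
            "</div>";

        foreach (string id in GetDivIds(html))
        {
            Console.WriteLine(id);
        }
    }
}
```

The null check matters in practice: iterating the result of SelectNodes directly throws a NullReferenceException on a page that has no matching div.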



Answer 2:


You can get the collection of div elements by passing an XPath expression to SelectNodes, like this:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.Load(filePath);

// SelectNodes is called on DocumentNode; it returns null if no div is found.
foreach (HtmlNode div in htmlDoc.DocumentNode.SelectNodes("//div"))
{
    // ... code here
}
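To go from the div elements to a collection of their ids, one option is LINQ over the node tree instead of XPath. The following is a rough sketch under a few assumptions: the HtmlAgilityPack package is referenced, and DivIdLinqExample, GetDivIds, and the inline markup are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DivIdLinqExample
{
    // Hypothetical helper: collects the ids of all div elements that have one.
    public static List<string> GetDivIds(HtmlDocument htmlDoc)
    {
        return htmlDoc.DocumentNode
            .Descendants("div")                           // every div below the root
            .Where(div => div.Attributes.Contains("id"))  // skip divs without an id
            .Select(div => div.GetAttributeValue("id", string.Empty))
            .ToList();
    }

    public static void Main()
    {
        var htmlDoc = new HtmlDocument();

        // Inline markup for illustration only; htmlDoc.Load(filePath)
        // would be used for a file on disk, as in the answer above.
        htmlDoc.LoadHtml("<div class='myclass1'><div id='f'></div><div id='my'></div></div>");

        foreach (string id in GetDivIds(htmlDoc))
        {
            Console.WriteLine(id);
        }
    }
}
```

Descendants returns an empty sequence when there are no div elements, so this variant needs no null check.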


Source: https://stackoverflow.com/questions/11526554/get-all-the-divs-ids-on-a-html-page-using-html-agility-pack
