Question
How do I get all the div IDs on an HTML page using Html Agility Pack? I am trying to get all the IDs and put them into a collection.
<p>
<div class='myclass1'>
<div id='f'>
</div>
<div id="myclass2">
<div id="my"><div id="h"></div><div id="b"></div></div>
</div>
</div>
</p>
Code:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);
HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("div");
How do I get a collection of all the div IDs?
Answer 1:
If you just want the IDs, you can get a collection of the id attribute nodes instead of a collection of the div element nodes. For instance:
List<string> ids = new List<string>();
foreach (XmlNode node in doc.SelectNodes("//div/@id"))
{
    ids.Add(node.InnerText);
}
This will skip the div elements that don't have an ID, such as the <div class='myclass1'> element in your example.
"//div/@id" is an XPath string. XPath is a technology which is very handy to learn if you deal much with XML or, in this case, HTML via the Agility Pack library. It is an industry standard which allows you to select matching nodes in an XML document.
// means you want to select the following node as a child of the current node, or in any of its descendants. Since the current node is the root node of the document, this will find matching nodes anywhere in the document.
div is the element name we want to match. So, in this case, we are telling it to find all div elements anywhere in the document.
/ indicates that you want a child node. In this case the id attribute is a child of the div element, so first we say we want the div element, then we need the forward slash to say we want one of the div element's child nodes.
@id means we want to find all the id attributes. The @ symbol indicates that it is an attribute name instead of an element name.
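To see how the pieces of the expression change the result, here is a minimal sketch that runs a few XPath strings over the sample markup from the question. It assumes the markup is well-formed enough to load as XML with System.Xml (as the XmlNode loop above does); the class XPathDemo and the method Select are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDemo
{
    // Hypothetical helper: runs an XPath query and returns the attribute value
    // for attribute nodes, or the node name for any other kind of node.
    public static List<string> Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var values = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            values.Add(node.NodeType == XmlNodeType.Attribute ? node.Value : node.Name);
        }
        return values;
    }

    public static void Main()
    {
        // The sample markup from the question, collapsed onto one line.
        const string xml =
            "<p><div class='myclass1'><div id='f'></div><div id='myclass2'>" +
            "<div id='my'><div id='h'></div><div id='b'></div></div></div></div></p>";

        // Every id attribute on any div, in document order.
        Console.WriteLine(string.Join(",", Select(xml, "//div/@id"))); // f,myclass2,my,h,b

        // Every div element, including the one without an id.
        Console.WriteLine(Select(xml, "//div").Count); // 6

        // A single / only looks at direct children: the only div directly
        // under <p> has no id, so nothing matches.
        Console.WriteLine(Select(xml, "/p/div/@id").Count); // 0
    }
}
```

The last query shows why the leading // matters: without it, only the outermost div is considered.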
Answer 2:
You can get the collection of divs by passing an XPath expression, like this:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);
foreach (HtmlNode div in htmlDoc.DocumentNode.SelectNodes("//div"))
{
    // ... code here
}
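To go from the div nodes to a collection of their IDs, one possible sketch with Html Agility Pack is shown here. It selects only the divs that carry an id attribute and reads the value from each element, which avoids depending on how the library treats attribute nodes in XPath results. The class DivIdCollector and the method CollectDivIds are hypothetical names, and the sketch loads from a string with LoadHtml rather than from filePath.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DivIdCollector
{
    // Hypothetical helper: returns the id of every <div> that has one.
    public static List<string> CollectDivIds(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionFixNestedTags = true;
        htmlDoc.LoadHtml(html);

        var ids = new List<string>();

        // [@id] keeps only the div elements that have an id attribute.
        HtmlNodeCollection divs = htmlDoc.DocumentNode.SelectNodes("//div[@id]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (divs == null)
        {
            return ids;
        }

        foreach (HtmlNode div in divs)
        {
            ids.Add(div.GetAttributeValue("id", string.Empty));
        }
        return ids;
    }
}
```

The null check matters in practice: looping directly over the result of SelectNodes throws when the page contains no matching div.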
Source: https://stackoverflow.com/questions/11526554/get-all-the-divs-ids-on-a-html-page-using-html-agility-pack