I am trying to use HtmlAgilityPack to parse HTML, but am having problems.
Sample HTML Doc:
-
By default, the <OPTION> tag is treated by Html Agility Pack as "Empty", which means it does not need a closing </OPTION>. In this case, the closing tag is discarded. You can change this behavior using the HtmlNode.ElementsFlags collection.
Here is some code that should do what you want:
HtmlDocument doc = new HtmlDocument();
// ElementsFlags is static, so change it before loading the HTML.
HtmlNode.ElementsFlags.Remove("option");
doc.LoadHtml(yourHtml);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='onoffaci']//option"))
{
    Console.WriteLine("Value=" + node.Attributes["value"].Value);
    Console.WriteLine("InnerText=" + node.InnerText);
    Console.WriteLine();
}
-
You should use:
selectNode.SelectNodes("option");
instead of:
selectNode.SelectNodes("//option");
Otherwise your XPath expression starts from the root of the HTML document, not from selectNode.
-
Your XPath expression:
//option
is an absolute path: it traverses the whole tree, starting from the root.
You need a relative XPath expression:
descendant::option
Or the shorthand:
.//option
Note that this is the only case where starting a path with . (the shorthand for self::node()) is useful.
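To see the difference side by side, here is a minimal sketch. The markup and the ids 'first' and 'second' are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. With two select elements in the document, the absolute query called on selectNode should also pick up the options of the other select, while the relative one should stay inside selectNode.

```csharp
using System;
using HtmlAgilityPack;

class RelativeXPathDemo
{
    static void Main()
    {
        // Hypothetical markup: two <select> elements, so the two queries can differ.
        const string html =
            "<html><body>" +
            "<select id='first'><option value='1'>One</option><option value='2'>Two</option></select>" +
            "<select id='second'><option value='3'>Three</option></select>" +
            "</body></html>";

        // Same setting as in the answer above; Remove is harmless if the key is absent.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode selectNode = doc.DocumentNode.SelectSingleNode("//select[@id='first']");

        // Absolute path: evaluated from the document root, even though it is called on selectNode.
        HtmlNodeCollection fromRoot = selectNode.SelectNodes("//option");

        // Relative path: only the descendants of selectNode.
        HtmlNodeCollection fromSelect = selectNode.SelectNodes(".//option");

        Console.WriteLine("//option matched " + fromRoot.Count + " node(s)");
        Console.WriteLine(".//option matched " + fromSelect.Count + " node(s)");
    }
}
```

Keep in mind that SelectNodes returns null rather than an empty collection when nothing matches, so real code should check for null before reading Count or looping.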