I need to perform some logic on all the text nodes of an HTMLDocument. This is how I currently do this:
HTMLDocument pageContent = (HTMLDocument)_webBrowser2.Docu
You could access all the text nodes in one shot with an XPath query using the HTML Agility Pack.
I think this should work as shown, but I have not tested it.
using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
// filePath is a path to a file containing the html
htmlDoc.Load(filePath);
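// "//text()" selects every text node in the document, in document order.
// Note: SelectNodes can return null (not an empty collection) when nothing matches.
HtmlNodeCollection coll = htmlDoc.DocumentNode.SelectNodes("//text()");
if (coll != null)
{
    foreach (HtmlNode node in coll)
    {
        // do the work for a text node here, e.g. read node.InnerText
    }
}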
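If you want to try the `//text()` query without a browser control or the HTML Agility Pack, the same XPath expression also works with the built-in `System.Xml.XmlDocument`. This is a minimal sketch under that swapped-in assumption: `XmlDocument` only accepts well-formed markup (XHTML-like input), not arbitrary real-world HTML, and the class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical demo class: uses System.Xml instead of the HTML Agility Pack,
// so the input must be well-formed markup.
public static class TextNodeXPathDemo
{
    // Returns the value of every text node selected by "//text()", in document order.
    public static List<string> CollectTextNodes(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//text()"))
        {
            // Each selected node is a text node; Value holds its character data.
            result.Add(node.Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string text in CollectTextNodes("<p>Hi <b>there</b></p>"))
        {
            // Brackets make leading/trailing spaces in each text node visible.
            Console.WriteLine("[" + text + "]");
        }
    }
}
```

The brackets in the output show that the space after "Hi" belongs to the first text node, which is worth keeping in mind when you process text node by node.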
Alternatively, staying with the mshtml DOM, it might be best to walk the childNodes (direct children) of each node in a recursive function, starting at the top-level <html> element, something like:
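// requires a reference to Microsoft.mshtml and "using mshtml;"
// documentElement is the root <html> element of the page
IHTMLDOMNode htmlNode = (IHTMLDOMNode)pageContent.documentElement;
ProcessChildNodes(htmlNode);

private void ProcessChildNodes(IHTMLDOMNode node)
{
    // childNodes is typed as object in the interop assembly, so cast it
    foreach (IHTMLDOMNode childNode in (IHTMLDOMChildrenCollection)node.childNodes)
    {
        if (childNode.nodeType == 3) // 3 == TEXT_NODE
        {
            // ...
        }
        else
        {
            // text nodes have no children, so only recurse into other nodes
            ProcessChildNodes(childNode);
        }
    }
}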
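To check the recursive logic outside of a WebBrowser control, here is a minimal sketch of the same walk over `System.Xml` nodes. This is an assumption-laden stand-in, not the mshtml code: `XmlNodeType.Text` plays the role of `nodeType == 3`, the input must be well-formed markup, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical demo class mirroring ProcessChildNodes with System.Xml types.
public static class TextNodeWalkDemo
{
    // Visits the direct children of `node`, collecting text nodes and
    // recursing into every other child.
    public static void ProcessChildNodes(XmlNode node, List<string> texts)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text) // DOM nodeType 3
            {
                texts.Add(child.Value);
            }
            else
            {
                // Text nodes have no children, so only recurse into other nodes.
                ProcessChildNodes(child, texts);
            }
        }
    }

    public static List<string> CollectTextNodes(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        var texts = new List<string>();
        // Start at the root element, like starting at <html> in the mshtml version.
        ProcessChildNodes(doc.DocumentElement, texts);
        return texts;
    }

    public static void Main()
    {
        string markup = "<html><body><p>Hi <b>there</b></p><p>again</p></body></html>";
        foreach (string text in CollectTextNodes(markup))
        {
            Console.WriteLine("[" + text + "]");
        }
    }
}
```

Recursing only into non-text nodes keeps the walk from touching text nodes' (empty) child lists; the order of results is document order, the same order the XPath `//text()` query produces.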