How can I retrieve all the text nodes of an HTMLDocument in the fastest way in C#?

Happy的楠姐 2021-01-23 11:03

I need to perform some logic on all the text nodes of an HTMLDocument. This is how I currently do it:

HTMLDocument pageContent = (HTMLDocument)_webBrowser2.Docu         


        
2 Answers
  •  攒了一身酷
    2021-01-23 11:34

    It might be best to iterate over each node's childNodes (its direct children) in a recursive function, starting at the top-level element, something like:

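    // Assumes pageContent is an mshtml.HTMLDocument; start at the root <html> element.
    IHTMLDOMNode htmlNode = (IHTMLDOMNode)((IHTMLDocument3)pageContent).documentElement;
    ProcessChildNodes(htmlNode);
    
    private void ProcessChildNodes(IHTMLDOMNode node)
    {
        // The interop assembly exposes childNodes as object, so cast it before enumerating.
        foreach (IHTMLDOMNode childNode in (IHTMLDOMChildrenCollection)node.childNodes)
        {
            if (childNode.nodeType == 3) // 3 = text node
            {
                // ...
            }
            else
            {
                // Text nodes have no children, so recurse only into other nodes.
                ProcessChildNodes(childNode);
            }
        }
    }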
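
To try the traversal without a COM reference to MSHTML, the same recursion can be sketched against a small in-memory tree. This is only an illustration: `Node`, `TextNodeWalker`, and `CollectText` are hypothetical names standing in for `IHTMLDOMNode` and the `// ...` body above, and collecting the node values into a list is an assumed example of the "logic" to run on each text node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IHTMLDOMNode so the sketch runs without MSHTML.
public sealed class Node
{
    public int NodeType { get; }   // 1 = element, 3 = text (DOM convention)
    public string Value { get; }   // text for text nodes, null otherwise
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(int nodeType, string value = null, params Node[] children)
    {
        NodeType = nodeType;
        Value = value;
        ChildNodes.AddRange(children);
    }
}

public static class TextNodeWalker
{
    private const int TextNodeType = 3;

    // Depth-first walk that appends the value of every text node to 'results'.
    public static void CollectText(Node node, List<string> results)
    {
        foreach (Node child in node.ChildNodes)
        {
            if (child.NodeType == TextNodeType)
                results.Add(child.Value);
            else
                CollectText(child, results);   // recurse into non-text nodes only
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Models <html><body>Hello <b>world</b></body></html>
        var html = new Node(1, null,
            new Node(1, null,
                new Node(3, "Hello "),
                new Node(1, null, new Node(3, "world"))));

        var texts = new List<string>();
        TextNodeWalker.CollectText(html, texts);
        Console.WriteLine(string.Join("|", texts));   // prints "Hello |world"
    }
}
```

Passing the result list down as a parameter keeps the walk to a single allocation; with the real MSHTML types the shape stays the same, only the node type and property names change.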
    
