Parsing HTML: Reading Option Tag Content with HtmlAgilityPack

旧巷少年郎 2020-12-06 22:52

I am trying to use HtmlAgilityPack to parse HTML, but am having problems.

Sample HTML Doc:


  

        
3 Answers
  • 2020-12-06 23:10

    By default, Html Agility Pack treats the <OPTION> tag as "Empty" (HtmlElementFlag.Empty), which means it does not expect a closing </OPTION>. The closing tag is therefore discarded, and the option's text ends up outside the <option> node instead of inside it. You can change this behavior through the static HtmlNode.ElementsFlags dictionary.

    Here is code that should do what you want:

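    using System;
    using HtmlAgilityPack;
    
    // ElementsFlags is a static dictionary: removing "option" before LoadHtml
    // makes <option>text</option> parse as an element that contains its text.
    HtmlNode.ElementsFlags.Remove("option");
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(yourHtml);
    
    // By default, SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection options = doc.DocumentNode.SelectNodes("//select[@id='onoffaci']//option");
    if (options != null)
    {
        foreach (HtmlNode node in options)
        {
            Console.WriteLine("Value=" + node.GetAttributeValue("value", ""));
            Console.WriteLine("InnerText=" + node.InnerText);
            Console.WriteLine();
        }
    }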
    
  • 2020-12-06 23:27

    You should use:

    selectNode.SelectNodes("option");
    

    instead of:

    selectNode.SelectNodes("//option");
    

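    otherwise your XPath expression starts from the root of the HTML document rather than from selectNode. Note that "option" selects only the direct <option> children of selectNode.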
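    A minimal sketch of the difference, assuming a hypothetical form with two <select> lists so that the absolute path visibly picks up options from the other list. It uses .NET's built-in System.Xml (XmlDocument.SelectNodes) on well-formed markup instead of Html Agility Pack, so it runs without a NuGet package; the relative-versus-absolute XPath 1.0 rules are the same. The class and method names are illustrative only.

    ```csharp
    using System.Xml;

    public static class RelativeXPathDemo
    {
        // Hypothetical markup: the second <select> exists only to show
        // that an absolute path escapes the context node.
        private const string Markup =
            "<form>" +
            "<select id='onoffaci'><option value='1'>On</option><option value='0'>Off</option></select>" +
            "<select id='other'><option value='x'>Other</option></select>" +
            "</form>";

        // Returns how many nodes the given XPath selects when it is
        // evaluated with the first <select> as the context node.
        public static int CountFromSelect(string xpath)
        {
            var doc = new XmlDocument();
            doc.LoadXml(Markup);
            XmlNode selectNode = doc.SelectSingleNode("//select[@id='onoffaci']");
            return selectNode.SelectNodes(xpath).Count;
        }
    }
    ```

    Evaluated from selectNode, CountFromSelect("option") counts the 2 options of that list, while CountFromSelect("//option") counts all 3 options in the document.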

  • 2020-12-06 23:35

    Your XPath expression:

    //option
    

    It is an absolute path: it traverses the whole tree starting from the root, whatever the context node is.

    You need a relative XPath expression:

    descendant::option
    

    Or the abbreviated form, which selects the same nodes here:

    .//option
    

    Note: this is the only case where starting a path with . (the abbreviation for self::node()) is useful.
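    A minimal sketch contrasting the paths, assuming hypothetical markup where two of the options sit inside an <optgroup>. It uses .NET's System.Xml on well-formed markup rather than Html Agility Pack, so it runs without extra packages; with Html Agility Pack itself the option text would also depend on the ElementsFlags setting from the first answer. The names are illustrative only.

    ```csharp
    using System.Collections.Generic;
    using System.Xml;

    public static class DescendantXPathDemo
    {
        // Hypothetical markup: A is a direct child of the first <select>,
        // B and C are nested in an <optgroup>, D belongs to another <select>.
        private const string Markup =
            "<form>" +
            "<select id='s1'>" +
            "<option>A</option>" +
            "<optgroup label='g'><option>B</option><option>C</option></optgroup>" +
            "</select>" +
            "<select id='s2'><option>D</option></select>" +
            "</form>";

        // Evaluates xpath with the first <select> as the context node and
        // returns the text of each selected node, joined with commas.
        public static string TextsFromFirstSelect(string xpath)
        {
            var doc = new XmlDocument();
            doc.LoadXml(Markup);
            XmlNode selectNode = doc.SelectSingleNode("//select[@id='s1']");
            var texts = new List<string>();
            foreach (XmlNode node in selectNode.SelectNodes(xpath))
                texts.Add(node.InnerText);
            return string.Join(",", texts);
        }
    }
    ```

    With the first <select> as context, descendant::option and .//option both return A,B,C (including the options inside <optgroup>), the child-only path option returns just A, and //option returns A,B,C,D from the whole document.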
