Parsing HTML with the HTML Agility Pack and LINQ

别跟我提以往 2021-02-02 18:21

I have the following HTML:

    (..)
    <tr>
      <td class="name">Test1</td>
      <td class="data">Data</td>
    </tr>

        
  Answer by 抹茶落季, 2021-02-02 18:49

    Here's one approach: first parse all the data into a data structure, then read from it. It's a little messy and certainly needs more validation, but here goes:

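    using System.Linq;
    using HtmlAgilityPack;

    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load("http://jsbin.com/ezuge4");
    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection nodes = doc.DocumentNode
                                  .SelectNodes("//table[@id='MyTable']//tr");
    var data = (nodes ?? Enumerable.Empty<HtmlNode>())
        // one dictionary per <tr>: each <td>'s class -> its trimmed text
        .Select(node => node.Descendants("td")
            .Where(td => td.Attributes["class"] != null)
            .ToDictionary(td => td.Attributes["class"].Value,
                          td => td.InnerText.Trim()))
        // skip rows (e.g. a header row) that have no "name" cell
        .Where(dict => dict.ContainsKey("name"))
        // key each row's dictionary by the text of its "name" cell
        .ToDictionary(dict => dict["name"]);
    string test1Data = data["Test1"]["data"];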
    

    Here I turn every <tr> into a dictionary, where the class of each <td> is the key and its text is the value. Next, I turn the list of dictionaries into a dictionary of dictionaries (tip: abstract that away), keyed by the name of each <tr>.
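    To see the two-stage transformation on its own, here is a minimal sketch that runs without the HTML Agility Pack. It assumes each <tr> has already been reduced to an array of (class, text) pairs, one per <td>; the RowIndex class, its Build method, and the sample rows are hypothetical, for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds the "dictionary of dictionaries" from rows that
// have already been reduced to (class, text) pairs, one pair per <td>.
static class RowIndex
{
    public static Dictionary<string, Dictionary<string, string>> Build(
        IEnumerable<(string Class, string Text)[]> rows)
    {
        return rows
            // Stage 1: each row becomes a dictionary of class -> trimmed text.
            .Select(cells => cells.ToDictionary(c => c.Class, c => c.Text.Trim()))
            // Rows without a "name" cell (e.g. a header row) are skipped.
            .Where(row => row.ContainsKey("name"))
            // Stage 2: the sequence of row dictionaries is keyed by each row's name.
            .ToDictionary(row => row["name"]);
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical sample rows standing in for the parsed <tr> elements.
        var rows = new List<(string Class, string Text)[]>
        {
            new[] { ("header", "Name") },
            new[] { ("name", " Test1 "), ("data", " Data ") },
            new[] { ("name", "Test2"), ("data", "Other") },
        };

        var index = RowIndex.Build(rows);
        Console.WriteLine(index["Test1"]["data"]); // prints "Data"
        Console.WriteLine(index.Count);            // prints 2 (header row skipped)
    }
}
```

    Note that ToDictionary throws an ArgumentException on a duplicate key, so two cells with the same class in one row, or two rows with the same name, will still fail; that is part of the extra validation the approach needs.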
