Parsing HTML with Html Agility Pack

徘徊边缘 submitted on 2019-12-12 07:28:51

Question


I want to collect all the tags inside this div, but I don't know the best way to do it with XPath:

<div class="biz_info">
    <h3><a href="/profil/78122/s%C3%B8rby-rehab/">Sørby Rehab</a></h3>
    <table class="string_14">
        <tbody>
            <tr>
               <td>Postadr.:</td> 
               <td class="tab_space">Rognerudveien 8 B, 0681 Oslo</td> 
            </tr>

            <tr>
                <td>Telefon:</td> 
                <td class="tab_space">928 70 700</td>
            </tr>

            <tr>
                <td>Nettside:</td> 
                <td class="tab_space"><a href="http://www.sorby-rehab.no" target="_blank">www.sorby-rehab.no</a></td>
            </tr>
        </tbody>
    </table>
</div>

My code currently looks like this, but it is very bad:

HtmlDocument doc = new HtmlDocument();
doc.Load(new StringReader(result));
HtmlNode root = doc.DocumentNode;

List<string> anchorTags = new List<string>();

foreach (HtmlNode link in root.SelectNodes("//@class=biz_info"))
{
    string att = link.OuterHtml;
    anchorTags.Add(att);
}

Can someone experienced with XPath help me?


Answer 1:


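// Needs: using System.Linq; using HtmlAgilityPack;
HtmlDocument html = new HtmlDocument();
html.Load(new StringReader(result));
// Every <a> element nested anywhere inside <div class="biz_info">
var anchorTags = html.DocumentNode.SelectNodes("//div[@class='biz_info']//a")
                     .Select(a => a.OuterHtml)
                     .ToList();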

That gives you a list of the anchor tags' HTML. If you only need the URLs:

var urls = html.DocumentNode.SelectNodes("//div[@class='biz_info']//a[@href!='']")
               .Select(a => a.Attributes["href"].Value)
               .ToList();
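
One caveat: SelectNodes returns null, not an empty collection, when the XPath matches nothing, so the chained calls above throw a NullReferenceException on a page that has no such div. Here is a minimal sketch of a guarded version. It assumes the HtmlAgilityPack NuGet package is referenced; the names BizInfoLinks and ExtractUrls and the sample markup are made up for illustration and are not part of the library.

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class BizInfoLinks
{
    // Hypothetical helper: returns the href of every <a> inside <div class="biz_info">.
    public static List<string> ExtractUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null when there is no match, so guard before using LINQ.
        var anchors = doc.DocumentNode.SelectNodes("//div[@class='biz_info']//a[@href]");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(href => href.Length > 0) // skip anchors with an empty href
            .ToList();
    }

    public static void Main()
    {
        // Shortened sample based on the markup in the question.
        string sample = @"<div class=""biz_info"">
  <h3><a href=""/profil/78122/s%C3%B8rby-rehab/"">Sørby Rehab</a></h3>
  <a href=""http://www.sorby-rehab.no"" target=""_blank"">www.sorby-rehab.no</a>
</div>";

        foreach (var url in ExtractUrls(sample))
            Console.WriteLine(url);
    }
}
```

Note that @class='biz_info' only matches when the class attribute is exactly that string. If the div could carry additional classes, a contains(@class, 'biz_info') test may be a better fit, at the cost of also matching class names that merely contain that substring.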


Source: https://stackoverflow.com/questions/15501810/parsing-html-with-html-agility-pack
