Extracting a table row with a particular attribute, using HTML Agility Pack

Submitted by 落花浮王杯 on 2019-12-13 06:41:33

Question


Consider this piece of code:

<tr>
    <td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td>
    <td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td>

I want to use HTML Agility Pack to extract the link in the first <td> line (the one pointing to pricechart.php).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace WebScraper
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml("http://theurl.com");
            try
            {
                var links = doc.DocumentNode.SelectNodes("//td[@class=\"tim_new\"]");

            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                Console.WriteLine(ex.StackTrace);
                Console.ReadKey();
            }

        }
    }
}

When I add a foreach (var link in links) loop inside the try block, a runtime error is thrown.


Answer 1:


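The call doc.LoadHtml("http://theurl.com"); will not work. LoadHtml expects a string containing HTML markup, not a URL, so the document ends up holding only the literal text of the address. No td or a elements exist in it, SelectNodes returns null rather than an empty collection, and iterating over null with foreach is most likely what throws at runtime (a NullReferenceException). You must first fetch the HTML document, for example with HtmlWeb.Load or an HTTP client, before trying to parse it.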
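As a rough illustration of the difference, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced. `LoadHtml` parses markup that is already in a string, while `HtmlWeb.Load` fetches a URL and parses the response in one step. The class and method names (`LoadExample`, `FirstLinkText`) and the inline markup are made up for this example; the URL is taken from the command line so that nothing is fetched unless you pass one.

```csharp
using System;
using HtmlAgilityPack;

static class LoadExample
{
    // Hypothetical helper: returns the text of the first <a> element, or null if there is none.
    public static string FirstLinkText(HtmlDocument doc)
    {
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//a");
        return anchor == null ? null : anchor.InnerText;
    }

    static void Main(string[] args)
    {
        // Option 1: LoadHtml parses markup you already hold as a string.
        var fromString = new HtmlDocument();
        fromString.LoadHtml("<table><tr><td class='tim_new'><a href='/x' class='tim_new'>3M India</a></td></tr></table>");
        Console.WriteLine(FirstLinkText(fromString));

        // Passing a URL to LoadHtml just parses the address as text: no <a> is found.
        var wrong = new HtmlDocument();
        wrong.LoadHtml("http://theurl.com");
        Console.WriteLine(FirstLinkText(wrong) == null ? "no links found" : "links found");

        // Option 2: HtmlWeb.Load downloads the page and parses it.
        // Only runs when a URL is supplied as the first argument.
        if (args.Length > 0)
        {
            var web = new HtmlWeb();
            HtmlDocument fromUrl = web.Load(args[0]);
            Console.WriteLine(FirstLinkText(fromUrl) ?? "no links found");
        }
    }
}
```

If you fetch the page yourself with an HTTP client instead, pass the response body string to `LoadHtml`.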

Once you have the document loaded, for this specific example you can use this:

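// Needs using System.Linq; and using System.Collections.Generic;
// Select every <a> whose class is exactly "tim_new" and read its href.
IEnumerable<string> links = doc.DocumentNode
    .SelectNodes("//a[@class='tim_new']")
    .Select(n => n.Attributes["href"].Value);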
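Because `SelectNodes` returns null when nothing matches, a slightly more defensive variant can be useful. The following is only a sketch, assuming the HtmlAgilityPack NuGet package; `LinkExtractor`, `ExtractHrefs`, and `Demo` are hypothetical names, and the sample markup is the fragment from the question rewritten with single-quoted attributes so it fits in a C# string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class LinkExtractor
{
    // Returns the href of every <a class="tim_new">, or an empty list when nothing matches.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null (not an empty collection) when there is no match.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@class='tim_new']");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty)) // empty string if href is missing
            .Where(href => href.Length > 0)
            .ToList();
    }
}

class Demo
{
    static void Main()
    {
        string html =
            "<tr>" +
            "<td valign=top class='tim_new'><a href='/stocks/company_info/pricechart.php?sc_did=MI42' class='tim_new'>3M India</a></td>" +
            "<td class='tim_new' valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td>";

        // Only the first anchor has class "tim_new"; the second has class "tim".
        foreach (string href in LinkExtractor.ExtractHrefs(html))
            Console.WriteLine(href);
    }
}
```

Returning an empty list instead of null means the caller's foreach simply does nothing when the page has no matching links.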


Source: https://stackoverflow.com/questions/2982862/extracting-a-table-row-with-a-particular-attribute-using-htmlagility-pack
