Using HtmlAgilityPack to extract text that is not between tags and comes after a specific node

Question

HTML code:

 <b> CAR </b>
    <br></br>
  Car is something you can drive.
    <br></br>
    <br></br>

C# code:

        HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");

        if (doc != null)
        {
            HtmlNode link = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'CAR')]");

            webBrowser1.DocumentText = link.InnerText;
            webBrowser1.AllowNavigation = true;

            webBrowser1.ScriptErrorsSuppressed = true;
            webBrowser1.Visible = true;
        }

What I manage to get: CAR

I need to get:
CAR
Car is something you can drive.

Any suggestions? I have tried selecting the next nodes, but both attempts gave me a NullReferenceException: "//b[contains(text(), 'CAR')/br]" and "//b[contains(text(), 'CAR')/br/br]"

Thanks in advance. PS: I would like to avoid Regex.

Answer 1:

XPath is case-sensitive (for more on this, see: Is it possible to ignore case using xpath and c#?), and the second phrase, the one containing 'Car', is not a child of the b element; it is a text node that comes after it. You could make it work like this:

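HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
// translate() lowercases each text node, so the match on 'car' ignores case.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'car')]");
if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode node in nodes)
        Console.WriteLine(node.InnerText);
}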

In a console application, it will output this:

 CAR

  Car is something you can drive.
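
Because the description is a sibling of the b element rather than a child, another option is to select the b node as in the question and then walk its following siblings. The sketch below is only an illustration of that idea, not part of the original answer: the class name FollowingTextDemo and the helper HeadingAndFollowingText are hypothetical, the HTML is the snippet from the question loaded from a string instead of a URL, and it assumes the heading text contains no single quote (it is spliced into the XPath string as-is).

```csharp
using System;
using HtmlAgilityPack;

static class FollowingTextDemo
{
    // Hypothetical helper: returns the trimmed text of the first <b> whose text
    // contains `heading`, followed by the first non-blank text node among the
    // siblings that come after that <b>.
    public static string[] HeadingAndFollowingText(string html, string heading)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same XPath as in the question; SelectSingleNode returns null on no match.
        HtmlNode bold = doc.DocumentNode.SelectSingleNode(
            "//b[contains(text(), '" + heading + "')]");
        if (bold == null)
            return new string[0];

        // Walk the later siblings, skipping <br> elements and whitespace-only text.
        for (HtmlNode n = bold.NextSibling; n != null; n = n.NextSibling)
        {
            if (n.NodeType == HtmlNodeType.Text && n.InnerText.Trim().Length > 0)
                return new[] { bold.InnerText.Trim(), n.InnerText.Trim() };
        }

        // No text followed the heading: return the heading alone.
        return new[] { bold.InnerText.Trim() };
    }

    static void Main()
    {
        string html =
            " <b> CAR </b>\n    <br></br>\n  Car is something you can drive.\n" +
            "    <br></br>\n    <br></br>";

        foreach (string line in HeadingAndFollowingText(html, "CAR"))
            Console.WriteLine(line);
    }
}
```

Unlike the //text() query above, this returns only the text that directly follows the chosen heading, so other text nodes on the page that happen to contain "car" are not picked up. The null check on SelectSingleNode also avoids a NullReferenceException when the heading is not found.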

Source: https://stackoverflow.com/questions/16477119/using-htmlagilitypack-extract-text-which-is-not-between-tags-and-comes-after-sp

Tags

html

xpath

web-scraping

html-agility-pack
