Using HtmlAgilityPack to extract text that is not between tags and comes after a specific node

Submitted by 試著忘記壹切 on 2019-12-25 01:44:39

Question


HTML code:

 <b> CAR </b>
    <br></br>
  Car is something you can drive.
    <br></br>
    <br></br>

C# code:

        HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");

        if (doc != null)
        {
            HtmlNode link = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'CAR')]");

            webBrowser1.DocumentText = link.InnerText;
            webBrowser1.AllowNavigation = true;

            webBrowser1.ScriptErrorsSuppressed = true;
            webBrowser1.Visible = true;
        }

What I manage to get:
CAR

I need to get:
CAR
Car is something you can drive.

Any suggestions? I have tried selecting the next nodes, but both attempts threw a NullReferenceException: "//b[contains(text(), 'CAR')/br]" and "//b[contains(text(), 'CAR')/br/br]"

Thanks in advance. P.S. I would like to avoid regex.


Answer 1:


XPath is case-sensitive (for more on this, see the question "Is it possible to ignore case using xpath and c#?"), and the second phrase, which contains 'Car', is not a child of the B element. It is a separate text node that follows it. You could make it work like this:

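HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
// translate() lower-cases each text node so the match ignores case.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'car')]");
if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode node in nodes)
    {
        Console.WriteLine(node.InnerText);
    }
}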

In a console application, it will output this:

 CAR

  Car is something you can drive.
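
If only the text that directly follows the B element is wanted, rather than every text node containing "car", one possible approach is to locate the B element and then walk its siblings until a non-blank text node turns up. The following is a minimal sketch under assumptions, not part of the original answer: the class SiblingText and the helper NextSiblingText are hypothetical names introduced for illustration, and the HTML is loaded from a string instead of the URL in the question.

```csharp
using System;
using HtmlAgilityPack;

public static class SiblingText
{
    // Hypothetical helper: returns the trimmed text of the first non-blank
    // text node that follows `start` among its siblings, or null if none exists.
    public static string NextSiblingText(HtmlNode start)
    {
        for (HtmlNode n = start.NextSibling; n != null; n = n.NextSibling)
        {
            if (n.NodeType == HtmlNodeType.Text && !string.IsNullOrWhiteSpace(n.InnerText))
            {
                return n.InnerText.Trim();
            }
        }
        return null;
    }

    public static void Main()
    {
        // Same markup as in the question, inlined so the sketch is self-contained.
        string html = " <b> CAR </b>\n    <br></br>\n  Car is something you can drive.\n    <br></br>\n    <br></br>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode heading = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'CAR')]");

        // SelectSingleNode returns null when nothing matches, so guard before use.
        if (heading != null)
        {
            Console.WriteLine(heading.InnerText.Trim());
            Console.WriteLine(NextSiblingText(heading) ?? "(no text found)");
        }
    }
}
```

Walking NextSibling keeps the lookup tied to the specific B element and avoids regex. The null checks matter because SelectSingleNode returns null when the expression matches nothing, which is the likely cause of the NullReferenceException mentioned in the question.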


Source: https://stackoverflow.com/questions/16477119/using-htmlagilitypack-extract-text-which-is-not-between-tags-and-comes-after-sp
