Question
HTML code:
<b> CAR </b>
<br></br>
Car is something you can drive.
<br></br>
<br></br>
C# code:
HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
if (doc != null)
{
    HtmlNode link = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'CAR')]");
    webBrowser1.DocumentText = link.InnerText;
    webBrowser1.AllowNavigation = true;
    webBrowser1.ScriptErrorsSuppressed = true;
    webBrowser1.Visible = true;
}
What I manage to get: CAR
I need to get:
CAR
Car is something you can drive.
Any suggestions? I have tried selecting the following nodes, but these expressions gave me NullReferenceExceptions: "//b[contains(text(), 'CAR')/br]" and "//b[contains(text(), 'CAR')/br/br]"
Thanks in advance. PS: I would like to avoid Regex.
Answer 1:
XPath is case-sensitive (see "Is it possible to ignore case using xpath and c#?" for more on this), and the second phrase that contains 'Car' is not a child of the B element. You could make it work like this:
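HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
// translate() lower-cases each text node, so the match ignores case.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'car')]");
// SelectNodes returns null, not an empty collection, when nothing matches.
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
        Console.WriteLine(node.InnerText);
}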
In a console application, it will output this:
CAR
Car is something you can drive.
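The XPath above returns every text node on the page that mentions "car". If you instead want the text that comes right after one particular <b> element, the XPath following-sibling axis expresses that directly. The sketch below is an illustration, not part of the original answer: it uses .NET's built-in System.Xml instead of HtmlAgilityPack, and it wraps the question's fragment in a hypothetical <root> element so that it parses as well-formed XML. TermExtractor and Extract are illustrative names. Real pages are rarely well-formed XML, so on a live page you would pass the same XPath strings to HtmlAgilityPack's SelectSingleNode instead (not tested here).

```csharp
using System;
using System.Xml;

static class TermExtractor
{
    // Returns the trimmed text of the first <b> element containing "car" (any case)
    // and the text node that follows it as a sibling; nulls if no such <b> exists.
    public static (string Term, string Definition) Extract(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // translate() maps ASCII upper case to lower case, making contains() case-insensitive.
        XmlNode bold = doc.SelectSingleNode(
            "//b[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'car')]");
        if (bold == null)
            return (null, null);

        // The definition is not inside <b>: it is the first text node among <b>'s
        // following siblings. text() skips the <br> elements in between.
        XmlNode next = bold.SelectSingleNode("following-sibling::text()[1]");
        return (bold.InnerText.Trim(), next?.Value.Trim());
    }
}

class Program
{
    static void Main()
    {
        // The question's fragment, wrapped in a hypothetical <root> element because
        // System.Xml needs a single root element to parse it.
        string fragment =
            "<root><b> CAR </b><br></br>Car is something you can drive.<br></br><br></br></root>";

        var (term, definition) = TermExtractor.Extract(fragment);
        Console.WriteLine(term);
        Console.WriteLine(definition);
    }
}
```

Run as a console application, this prints "CAR" followed by "Car is something you can drive.". Unlike the //text() query, it only returns the text attached to the matched <b> element, so other occurrences of "car" elsewhere on the page are ignored.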
Source: https://stackoverflow.com/questions/16477119/using-htmlagilitypack-extract-text-which-is-not-between-tags-and-comes-after-sp