HTMLAgilityPack and separating on <br/>

若如初见. Submitted on 2019-12-24 06:38:09

Question


I have some HTML in which the values are separated by <br/> tags, e.g.:

Jack Janson
<br/>
309 123 456
<br/>
My Special Street 43

What is the easiest way to retrieve this information as three columns?

I am not an XPath expert, so another approach would be to separate the string on the line break, and just work with the array. Is there a smarter way to do it?

Update: Forgot to format the code.


Answer 1:


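In pure XPath over XML, you would use the sibling axes, with an expression such as //br/preceding-sibling::text()[1] or //br/following-sibling::text()[1] to pick the text node just before or just after each br (see any reference on XPath axes for details).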

However, the XPath-over-HTML implementation in Html Agility Pack does not support selecting pure text nodes or attribute nodes in XPath selection expressions (//br/text() or //br/@blah do not work, for example). Note that they do work inside filters, so //br[text()='blah'] and //br[@att='blah'] are fine.

So, back to the question, you need to combine XPATH and code, something like this:

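// Requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);

// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection breaks = doc.DocumentNode.SelectNodes("//br");
if (breaks != null)
{
    foreach (HtmlNode br in breaks)
    {
        // The sibling just before each <br/> is the text node holding one field.
        Console.WriteLine(br.PreviousSibling?.InnerText.Trim());
    }
}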

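That will output the text that precedes each <br/>. The text after the last <br/> (My Special Street 43) is not printed, because no <br/> follows it; read NextSibling on the last <br/> node to get it: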

Jack Janson
309 123 456
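
To get all three values in one pass, including the one after the last <br/>, one option is to skip XPath and walk the child nodes directly, keeping only the text nodes. The following is a minimal sketch, not the answer author's code. BrFields and Extract are hypothetical names, and it assumes the values are direct children of the parsed fragment; on a real page you would start from whichever container node holds them instead of DocumentNode.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class BrFields
{
    // Hypothetical helper: returns the trimmed, non-empty text nodes that sit
    // between <br/> tags at the top level of the given HTML fragment.
    public static string[] Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)              // skip the <br/> elements
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())   // decode entities, trim newlines
            .Where(s => s.Length > 0)                                 // drop whitespace-only nodes
            .ToArray();
    }
}
```

Called on the fragment from the question, BrFields.Extract should return an array with the name, the number, and the street, which can then be treated as the three columns.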


Source: https://stackoverflow.com/questions/6102761/htmlagilitypack-and-separating-on-br
