Select nodeValue but exclude child elements

旧时模样 submitted on 2020-01-01 05:34:19

Question


Let's say I have this code:

<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>

How do I select the nodeValue of p but exclude a and its content?

My current code:

$result = $xpath->query("//p[@dataname='description'][not(self::a)]");

I read the value with $result->item(0)->nodeValue;


Answer 1:


Simply appending /text() to your query should do the trick. It selects only the text nodes that are direct children of p, so the content of a is left out:

$result = $xpath->query("//p[@dataname='description'][not(self::a)]/text()");
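
For context, here is a minimal, self-contained sketch of that query run against the markup from the question. The variable names $html, $doc and $description are assumptions made for illustration, and the redundant [not(self::a)] predicate is dropped. Because /text() can return more than one text node (one before and one after the a element), the sketch joins them instead of reading only item(0).

```php
<?php
// Sample markup taken from the question.
$html = '<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// text() matches only the text nodes that are direct children of <p>,
// so the text inside <a> is skipped.
$nodes = $xpath->query("//p[@dataname='description']/text()");

// Several text nodes may come back (before and after <a>), so join them.
$description = '';
foreach ($nodes as $node) {
    $description .= $node->nodeValue;
}
$description = trim($description);

echo $description, PHP_EOL;
```

If only the leading text is wanted, reading $nodes->item(0)->nodeValue as in the question works too, provided the p element starts with text.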



Answer 2:


I'm not sure whether PHP's XPath supports this, but this expression does the trick for me in Scrapy (a Python-based scraping framework):

$xpath->query("//p[@dataname='description']/text()[following-sibling::a]")
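
PHP's DOMXPath implements XPath 1.0, which includes the following-sibling axis, so the expression should work there as well. The sketch below contrasts it with a plain /text() query. The sample markup with trailing text after the link is a hypothetical variation on the question's snippet, and $all, $before and $leading are illustrative names.

```php
<?php
// Hypothetical markup: text both before and after the <a> element.
$html = '<p dataname="description">Intro text. <a href="#">More</a> trailing text.</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Every text node that is a direct child of <p>.
$all = $xpath->query("//p[@dataname='description']/text()");

// Only the text nodes that have an <a> sibling somewhere after them.
$before = $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");

$leading = trim($before->item(0)->nodeValue);

echo $all->length, ' text nodes in total, ', $before->length, ' before the link', PHP_EOL;
echo $leading, PHP_EOL;
```

The predicate is useful when text after the link should be ignored. When all direct text is wanted, the plain /text() query is the simpler choice.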

If this doesn't work, try Kristoffer's solution, or you could use a regex instead. For example:

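$inner = implode('', array_map([$result->item(0)->ownerDocument, 'saveHTML'], iterator_to_array($result->item(0)->childNodes)));
$output = preg_replace("~<.*?>.*?<.*?>~msi", '', $inner);

This removes every HTML element together with its content and keeps only the text that is not wrapped in a tag. The regex has to run on the inner HTML of p, because nodeValue already has all tags stripped. It is also not reliable for nested tags.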



Source: https://stackoverflow.com/questions/9192105/select-nodevalue-but-exclude-child-elements
