Question
I use XPath to remove untidy HTML tags:
$nodeList = $xpath->query("//*[normalize-space(.)='' and not(self::br)]");
foreach ($nodeList as $node) {
    $node->parentNode->removeChild($node);
}
This removes unwanted input such as the following:
<p><em><br /></em></p>
<p><span style="text-decoration: underline;"><em><br /></em></span></p>
But it also removes paragraphs containing an img tag, like the one below, which I want to keep:
<p><img title="picture summit" src="images/32913430_127001_e.jpg" alt="picture summit" width="590" height="366" /></p>
How can I keep the img tag in the input while still removing the rest with XPath?
Answer 1:
Use:
//p[not(descendant::*[self::img or self::br]) and normalize-space()='']
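As a rough sketch of how this expression behaves, the PHP below runs it against two paragraphs from the question plus one empty paragraph added purely for illustration. Note that, as written, the expression protects paragraphs containing either img or br, so the br-only paragraph from the question is left in place. The second expression, which protects only img, is a variant assumed here and is not part of the original answer. The helper name countParagraphsAfterRemoval is hypothetical.

```php
<?php
// Two paragraphs based on the question, plus one truly empty paragraph (added for illustration).
$html = '<p><em><br /></em></p>'
      . '<p><em></em></p>'
      . '<p><img src="images/32913430_127001_e.jpg" alt="picture summit" /></p>';

// Hypothetical helper: removes every node matched by $expr, returns how many <p> remain.
function countParagraphsAfterRemoval(string $html, string $expr): int
{
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Copy the matches into an array before removing anything.
    foreach (iterator_to_array($xpath->query($expr)) as $node) {
        $node->parentNode->removeChild($node);
    }
    return $xpath->query('//p')->length;
}

// Expression from the answer: protects both <img> and <br>.
$keepImgAndBr = "//p[not(descendant::*[self::img or self::br]) and normalize-space()='']";
// Variant (an assumption, not from the answer): protects only <img>.
$keepImgOnly = "//p[not(descendant::img) and normalize-space()='']";

echo countParagraphsAfterRemoval($html, $keepImgAndBr), "\n"; // 2 (only the empty <p> is removed)
echo countParagraphsAfterRemoval($html, $keepImgOnly), "\n";  // 1 (only the <img> paragraph remains)
```

If the br-only paragraphs from the question should also go, the img-only variant is probably closer to what was asked.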
Answer 2:
Maybe you could use an XPath 1.0 expression like the one below to remove unwanted paragraphs:
//p[count(text())=0 and count(img)=0]
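One caveat worth checking before relying on this: text() and img here test only the direct children of each p. The sketch below lists which paragraphs the expression selects. The id attributes and the third paragraph are illustrative additions, not part of the question's input.

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML(
    '<p id="br-only"><em><br /></em></p>'            // shape from the question
  . '<p id="image"><img src="a.jpg" alt="" /></p>'   // direct <img> child
  . '<p id="nested-text"><em>some text</em></p>'     // text is inside <em>, not directly in <p>
);
$xpath = new DOMXPath($doc);

// Collect the ids of the paragraphs that the expression would remove.
$matched = [];
foreach ($xpath->query('//p[count(text())=0 and count(img)=0]') as $p) {
    $matched[] = $p->getAttribute('id');
}

print_r($matched); // contains "br-only" and "nested-text"
```

The image paragraph is kept, but a paragraph whose text lives entirely inside a child element is selected as well, and an img wrapped in a span would not be protected. Testing descendant::img and normalize-space() instead avoids both cases.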
Source: https://stackoverflow.com/questions/7860747/how-to-keep-pimg-p-with-xpath