I have PHP code which removes all nodes that have at least one attribute. Here is my code:
Before removing each element, move its child nodes out and insert them right after it in the parent, so that they take its place once it is removed.
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
<div style="color: red">and <p>also</p> this</div>
<div style="color: red">and this <div style="color: red">too</div></div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//*[@*]") as $node) {
    $parent = $node->parentNode;
    // Move the children out, last one first, so they end up in their original order
    while ($node->hasChildNodes()) {
        $parent->insertBefore($node->lastChild, $node->nextSibling);
    }
    $parent->removeChild($node);
}
echo $dom->saveHTML();
<div>
<p>These line shall stay</p>
Remove this one
<p>But keep this</p>
and this
and <p>also</p> this
and this too
</div>
https://3v4l.org/9qHRM
(I added some nested elements to demonstrate the safety of this approach.)
Couple of asides:
You don't need $dom->removeChild($dom->doctype) if you load with the additional LIBXML_HTML_NODEFDTD flag.
The XPath //*[@*] matches every element that has at least one attribute.
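To illustrate the first aside, here is a minimal sketch comparing the two ways of loading. The input string and the variable names are made up for the demonstration.

```php
<?php
// Hypothetical input, only for demonstrating the flag.
$html = '<div><p class="myclass">Remove this one</p></div>';

// Loaded without LIBXML_HTML_NODEFDTD: a default doctype node is added.
$withDoctype = new DOMDocument();
$withDoctype->loadHTML($html, LIBXML_HTML_NOIMPLIED);
$hasDoctype = $withDoctype->doctype !== null;

// Loaded with LIBXML_HTML_NODEFDTD: no doctype is added, so there is nothing to remove.
$noDoctype = new DOMDocument();
$noDoctype->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$lacksDoctype = $noDoctype->doctype === null;

var_dump($hasDoctype, $lacksDoctype);
```

With the flag set, the removeChild() call on the doctype becomes unnecessary.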
You could use replaceChild() with the text content of that node:
foreach ($lines_to_be_removed as $line) {
    $line->parentNode->replaceChild($dom->createTextNode($line->textContent), $line);
}
// <div>
// <p>These line shall stay</p>
// Remove this one
// <p>But keep this</p>
// and this
// </div>
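However, this may prove problematic when matched elements contain other elements, which the // in your XPath selector allows: textContent flattens all descendant markup into plain text, so nested elements you wanted to keep are lost.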
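A small sketch of that pitfall, assuming a hypothetical input where an attributed div wraps a p that should survive. It applies the same replaceChild() loop and shows that the inner p is gone afterwards.

```php
<?php
// Hypothetical input: the styled <div> wraps a <p> that we would like to keep.
$html = '<div><div style="color: red">and <p>also</p> this</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//*[@*]') as $line) {
    // textContent concatenates the text of all descendants, dropping their tags
    $line->parentNode->replaceChild($dom->createTextNode($line->textContent), $line);
}

$result = $dom->saveHTML();
echo $result;
```

The text "and also this" is kept, but the p element around "also" is not. Moving the child nodes instead of copying textContent avoids this.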
A more manual approach is to copy the child contents of the target nodes into their parent nodes:
$data = '
<div>
<div>1A</div>
<div class="foo">1B
<div>2C</div>
<div class="foo">2D</div>
<div>2E</div>
<div class="foo">2F
<div>3G</div>
<div class="foo">3H</div>
<div>3I</div>
<div class="foo">3J</div>
</div>
</div>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
SomeFunctionName( $dom->documentElement );
$html = $dom->saveHTML();
function SomeFunctionName( $parent )
{
    $nodesToDelete = array();
    if( $parent->hasChildNodes() )
    {
        foreach( $parent->childNodes as $node )
        {
            // Recurse first so nested attributed nodes are already unwrapped
            SomeFunctionName( $node );
            if( $node->hasAttributes() and count( $node->attributes ) > 0 )
            {
                // Copy each child in front of the node that is about to be removed
                foreach( $node->childNodes as $childNode )
                {
                    $node->parentNode->insertBefore( clone $childNode, $node );
                }
                $nodesToDelete[] = $node;
            }
        }
    }
    // Remove the attributed nodes only after the iteration has finished
    foreach( $nodesToDelete as $delete)
    {
        $delete->parentNode->removeChild( $delete );
    }
}
// <div>
// <div>1A</div>
// 1B
// <div>2C</div>
// 2D
// <div>2E</div>
// 2F
// <div>3G</div>
// 3H
// <div>3I</div>
// 3J
// </div>
If you want to nest the child elements in a new "div" container instead, swap out this portion of the code:
foreach( $parent->childNodes as $node )
{
    SomeFunctionName( $node );
    if( $node->hasAttributes() and count( $node->attributes ) > 0 )
    {
        // Build an attribute-free replacement container for the children
        $newNode = $node->ownerDocument->createElement('div');
        foreach( $node->childNodes as $childNode )
        {
            $newNode->appendChild( clone $childNode );
        }
        $node->parentNode->insertBefore( $newNode, $node );
        $nodesToDelete[] = $node;
    }
}
// <div>
// <div>1A</div>
// <div>1B
// <div>2C</div>
// <div>2D</div>
// <div>2E</div>
// <div>2F
// <div>3G</div>
// <div>3H</div>
// <div>3I</div>
// <div>3J</div>
// </div>
// </div>
// </div>
This will remove all tags that have a class or style attribute, so it's not bulletproof:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
$xpath = new DOMXPath($dom);
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
foreach ($lines_to_be_removed as $line) {
    $line->parentNode->removeChild($line);
}
// just to check
echo $dom->saveHTML();
?>
Note this line:
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
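To see why this query is not bulletproof, here is a rough sketch that counts what it matches against the more general //*[@*]. The markup (including the id attribute) is a made-up example, not taken from the question.

```php
<?php
// Hypothetical markup: one element per attribute kind, plus one without attributes.
$html = '<div><p class="a">one</p><p id="b">two</p><p style="color: red">three</p><p>four</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Only elements carrying a class or a style attribute
$narrow = $xpath->query('//*[count(@class)>0 or count(@style)>0]')->length;

// Every element carrying any attribute at all
$broad = $xpath->query('//*[@*]')->length;

echo "class/style only: $narrow, any attribute: $broad\n";
```

The element with only an id attribute is missed by the class/style query but caught by //*[@*].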