DomDocument removeChild in foreach reindexing the dom

Question

I am trying to delete the p tags that have a data-spotid attribute:

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p');

foreach ($pTag as $value) {
    /** @var DOMElement $value */
    $id = $value->getAttribute('data-spotid');
    if ($id) {
        $value->parentNode->removeChild($value);
    }
}

But when I remove a child, the DOM gets reindexed. Suppose I have 8 items and I delete the 1st: the list is reindexed, so the 2nd element becomes the 1st. The loop does not delete it; instead it moves on to the 2nd position, which now holds what was originally the 3rd element.
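A minimal sketch of why elements get skipped, assuming the standard PHP DOM extension and some made-up sample markup in place of the real $description: getElementsByTagName() returns a live DOMNodeList, so removing one node immediately changes the list's length and shifts the later nodes down by one index.

```php
<?php
// Hypothetical sample markup standing in for the question's $description
$description = '<div><p data-spotid="1">a</p><p data-spotid="2">b</p><p>c</p></div>';

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p');

echo $pTag->length, "\n"; // 3 paragraphs before any removal

// Remove the first paragraph directly from the tree
$first = $pTag->item(0);
$first->parentNode->removeChild($first);

// The same DOMNodeList object now reports one node fewer, and the paragraph
// that used to be at index 1 ("b") has moved down to index 0. A foreach that
// advances to index 1 next would therefore land on "c" and never see "b".
echo $pTag->length, "\n";
echo $pTag->item(0)->textContent, "\n";
```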

Answer 1:

This is mentioned in a couple of comments on the DOMNode::removeChild documentation. The cause is that getElementsByTagName() returns a live DOMNodeList: removing a node shifts every later node down by one index, but the foreach iterator still advances to the next index, so it skips the element that moved into the removed one's place.

The recommended fix is to loop through the node list first and push the nodes you want to delete into a separate array, then loop through that "to-be-deleted" array and remove each node from its parent. Example:

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p');

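// First pass: collect the matching nodes without modifying the tree
$spotid_children = array();

foreach ($pTag as $value) {
    /** @var DOMElement $value */
    // hasAttribute() also catches values such as "0" or "" that are falsy
    if ($value->hasAttribute('data-spotid')) {
        $spotid_children[] = $value;
    }
}

// Second pass: a plain PHP array does not change when nodes leave the tree
foreach ($spotid_children as $spotid_child) {
    $spotid_child->parentNode->removeChild($spotid_child);
}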

Answer 2:

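You can also use an index-based for loop and step the index back after each removal, so the element that shifts into the removed one's slot is not skipped: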

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p');
$count = $pTag->length;
for ($i = 0; $i < $count; $i++) {
    /** @var DOMElement $value */
    $value = $pTag->item($i);
    if ($value->hasAttribute('data-spotid')) {
        // The live list shrinks by one: revisit this index and lower the bound
        $i--;
        $count--;
        $value->parentNode->removeChild($value);
    }
}
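A related index-based variant, which is not part of the answer above, is to walk the live list from the last index down to 0. Removing the node at index $i only shifts nodes that come after it, and those have already been visited, so no counter adjustment is needed. This sketch assumes the standard PHP DOM extension and uses made-up sample markup with consecutive matches, which is exactly where a forward foreach goes wrong.

```php
<?php
// Hypothetical sample markup: "a" and "b" are adjacent matches, "d" is the last node
$description = '<div><p data-spotid="1">a</p><p data-spotid="2">b</p>'
    . '<p>c</p><p data-spotid="3">d</p></div>';

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p');

// Iterate backwards over the live list; earlier indexes are unaffected by removals
for ($i = $pTag->length - 1; $i >= 0; $i--) {
    /** @var DOMElement $value */
    $value = $pTag->item($i);
    if ($value->hasAttribute('data-spotid')) {
        $value->parentNode->removeChild($value);
    }
}

// Only the paragraph without data-spotid ("c") should be left
echo $dom->getElementsByTagName('p')->length, "\n";
```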

Answer 3:

Like I commented, the easy solution is to convert the node list into a plain array first, e.g.:

$elements = iterator_to_array($elements);
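Applied to the question's case, the conversion could look like this minimal sketch (standard PHP DOM extension assumed, sample markup made up). iterator_to_array() works here because DOMNodeList is Traversable; the resulting array is a snapshot, so it still holds every paragraph even after some are removed from the tree.

```php
<?php
// Hypothetical sample markup standing in for the question's $description
$description = '<div><p data-spotid="1">a</p><p data-spotid="2">b</p><p>c</p></div>';

$dom = new DOMDocument();
@$dom->loadHTML($description);

// Copy the live DOMNodeList into a plain PHP array before modifying the tree
$pTags = iterator_to_array($dom->getElementsByTagName('p'));

foreach ($pTags as $value) {
    /** @var DOMElement $value */
    if ($value->hasAttribute('data-spotid')) {
        $value->parentNode->removeChild($value);
    }
}

echo $dom->getElementsByTagName('p')->length, "\n";
```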

But if performance matters, a better way is to select only the nodes you need in the first place. As a neat side effect, the removal problem also goes away, because the node list returned by DOMXPath::query() is a static snapshot rather than a live list.

E.g.:

<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML(<<<__XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>1</element>
    <element attr="a">2</element>
    <element>3</element>
    <element>4</element>
    <element attr="a">5</element>
    <element attr="a">6</element>
    <element>7</element>
    <element>8</element>
</root>
__XML
);

$xpath = new DOMXPath($doc);
$elements = $xpath->query('//element[@attr]');

foreach ($elements as $element) {
    $element->parentNode->removeChild($element);
}

echo $doc->saveXML();

Demo: https://3v4l.org/CM9Fv
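The same XPath idea applied to the question's HTML could look like the following sketch; the sample markup is made up, and only the standard PHP DOM extension is assumed. The expression //p[@data-spotid] matches p elements that carry the attribute, whatever its value.

```php
<?php
// Hypothetical sample markup standing in for the question's $description
$description = '<div><p data-spotid="1">a</p><p data-spotid="2">b</p><p>c</p></div>';

$dom = new DOMDocument();
@$dom->loadHTML($description);

// Select only the matching paragraphs; the query result does not shift
// when nodes are removed from the document inside the loop
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//p[@data-spotid]') as $p) {
    $p->parentNode->removeChild($p);
}

echo $dom->saveHTML();
```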

Answer 4:

Assuming that $dom holds the DOM containing the paragraphs you need to filter out, let's try some good old JavaScript:

$ptag = $dom.all.tags("p");
$ptag = [].slice.call($ptag);
$i = 0;
while ($ptag[$i]) {
    // Destroy the paragraph if it has a data-spotid attribute, then always advance
    if ($ptag[$i].hasAttribute('data-spotid')) $ptag[$i].outerHTML = "";
    $i++;
}

NOTE: I'm using outerHTML to destroy unwanted elements, so I don't have to go through each element's parent to reach a node we already have. Firefox supports outerHTML from version 11 onward (see the MDN reference).

I'm also using the short all.tags() syntax for brevity. It is non-standard and Firefox may not support it, so you may want to fall back to a getElementsByTagName() call there.

Source: https://stackoverflow.com/questions/36910558/domdocument-removechild-in-foreach-reindexing-the-dom

Tags

javascript

php

html

dom

domdocument