DOMDocument removeChild in foreach reindexing the DOM

梦毁少年i 2021-01-18 11:25

I am trying to delete p tags that have a data-spotid attribute:

        $dom = new DOMDocument();
        @$dom->loadHTML($description);         
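The symptom in the title can be reproduced with a small, self-contained sketch. The loop below is a hypothetical reconstruction (the question's own loop is not shown), and the sample HTML is made up for illustration:

```php
<?php
// Hypothetical sample input: three of the four <p> tags carry data-spotid.
$description = '<p data-spotid="1">a</p><p data-spotid="2">b</p>'
    . '<p>c</p><p data-spotid="3">d</p>';

$dom = new DOMDocument();
@$dom->loadHTML($description);

// Hypothetical naive loop: remove nodes while iterating the live list.
foreach ($dom->getElementsByTagName('p') as $p) {
    if ($p->hasAttribute('data-spotid')) {
        $p->parentNode->removeChild($p);
    }
}

// Count the tagged paragraphs that survived the loop.
$xpath = new DOMXPath($dom);
$left = $xpath->query('//p[@data-spotid]')->length;
echo "tagged <p> left: $left\n";
```

With this input the loop removes "a" and "d" but skips "b": once "a" is removed, "b" moves into index 0 while the iterator has already advanced to index 1.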


        
4 Answers
  • 2021-01-18 11:49

    You can use a plain for loop over the live list and adjust the index whenever a node is removed:

            $dom = new DOMDocument();
            @$dom->loadHTML($description);
            $pTag = $dom->getElementsByTagName('p'); // live list: shrinks on removal
            $count = $pTag->length;
            for ($i = 0; $i < $count; $i++) {
                /** @var DOMElement $value */
                $value = $pTag->item($i);
                if ($value->hasAttribute('data-spotid')) {
                    $value->parentNode->removeChild($value);
                    // The next <p> has shifted into index $i: step back and shrink the bound.
                    $i--;
                    $count--;
                }
            }
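A common variant of the same index-based idea (not part of the answer above) is to walk the live list backwards, so nothing needs adjusting after a removal: removing item $i only shifts items that have already been visited. A minimal sketch, with made-up sample HTML standing in for $description:

```php
<?php
// Hypothetical sample input; in the question the HTML comes from $description.
$description = '<p data-spotid="1">a</p><p data-spotid="2">b</p>'
    . '<p>c</p><p data-spotid="3">d</p>';

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p'); // live list

// Iterate from the last index down to 0; a removal at $i never
// moves an element that is still waiting to be checked.
for ($i = $pTag->length - 1; $i >= 0; $i--) {
    $p = $pTag->item($i);
    if ($p->hasAttribute('data-spotid')) {
        $p->parentNode->removeChild($p);
    }
}

// The live list now contains only the untagged paragraph.
echo $pTag->length, "\n";
```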
    
  • 2021-01-18 11:55

    As I mentioned in a comment, the easy solution is to copy the node list into a plain array with iterator_to_array() before looping, e.g.:

    $elements = iterator_to_array($elements);
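Applied to the question, the array-copy approach might look like the sketch below; the sample HTML is a hypothetical stand-in for $description:

```php
<?php
// Hypothetical sample input.
$description = '<p data-spotid="1">a</p><p data-spotid="2">b</p>'
    . '<p>c</p><p data-spotid="3">d</p>';

$dom = new DOMDocument();
@$dom->loadHTML($description);

// Copy the live DOMNodeList into a plain PHP array first; the array
// does not change when nodes are removed from the document.
$elements = iterator_to_array($dom->getElementsByTagName('p'));

foreach ($elements as $element) {
    if ($element->hasAttribute('data-spotid')) {
        $element->parentNode->removeChild($element);
    }
}

echo $dom->getElementsByTagName('p')->length, "\n";
```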
    

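    But if performance matters, a better way is to select only the nodes you need in the first place, for example with XPath. As a neat side effect, the removal problem also goes away: DOMXPath::query() returns a fixed set of matching nodes rather than a live list.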

    E.g.:

    <?php
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->loadXML(<<<__XML
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <element>1</element>
        <element attr="a">2</element>
        <element>3</element>
        <element>4</element>
        <element attr="a">5</element>
        <element attr="a">6</element>
        <element>7</element>
        <element>8</element>
    </root>
    __XML
    );
    
    $xpath = new DOMXPath($doc);
    $elements = $xpath->query('//element[@attr]');
    
    foreach ($elements as $element) {
        $element->parentNode->removeChild($element);
    }
    
    echo $doc->saveXML();
    

    Demo: https://3v4l.org/CM9Fv
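The same XPath idea, adapted to the question's HTML, is sketched below. It assumes that "has a data-spotid attribute" means the attribute is present with any value, which is what the //p[@data-spotid] predicate matches; the sample markup is made up:

```php
<?php
// Hypothetical sample input.
$description = '<p data-spotid="1">a</p><p data-spotid="2">b</p>'
    . '<p>c</p><p data-spotid="3">d</p>';

$dom = new DOMDocument();
@$dom->loadHTML($description);

// Select only <p> elements that carry a data-spotid attribute.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//p[@data-spotid]');

// The query result is not live, so removing nodes does not disturb the loop.
foreach ($nodes as $node) {
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();
```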

  • 2021-01-18 11:56

    This is mentioned in a couple of user comments on the DOMNode::removeChild documentation. The list returned by getElementsByTagName() is live: when you remove the current node, every later node shifts down one position, so the foreach iterator skips the node that directly follows each removed one.

    The recommended fix is to loop through the list first and collect the nodes you want to delete in a separate array, then loop through that "to-be-deleted" array and remove each node from its parent. Example:

    $dom = new DOMDocument();
    @$dom->loadHTML($description);
    $pTag = $dom->getElementsByTagName('p');
    
    $spotid_children = array();
    
    // First pass: only collect matching nodes; don't modify the live list yet.
    foreach ($pTag as $value) {
        /** @var DOMElement $value */
        if ($value->hasAttribute('data-spotid')) {
            $spotid_children[] = $value;
        }
    }
    
    // Second pass: remove the collected nodes from their parents.
    foreach ($spotid_children as $spotid_child) {
        $spotid_child->parentNode->removeChild($spotid_child);
    }
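If the collect-then-remove pattern is needed in several places, it can be wrapped in a small function. The name removeTagsWithAttribute and its signature are hypothetical, chosen for this sketch (it needs PHP 7.0+ for the scalar type declarations):

```php
<?php
/**
 * Hypothetical helper: removes every <$tag> element that has the
 * attribute $attr (any value) and returns how many were removed.
 */
function removeTagsWithAttribute(DOMDocument $dom, string $tag, string $attr): int
{
    $toRemove = [];
    // Pass 1: only read from the live list, never modify it.
    foreach ($dom->getElementsByTagName($tag) as $element) {
        if ($element->hasAttribute($attr)) {
            $toRemove[] = $element;
        }
    }
    // Pass 2: remove from the document; $toRemove is a plain array.
    foreach ($toRemove as $element) {
        $element->parentNode->removeChild($element);
    }
    return count($toRemove);
}

$dom = new DOMDocument();
@$dom->loadHTML('<p data-spotid="1">a</p><p data-spotid="">b</p><p>c</p>');
$removed = removeTagsWithAttribute($dom, 'p', 'data-spotid');
echo "removed $removed\n";
```

Because it tests hasAttribute(), the empty data-spotid="" on the second paragraph also counts as a match, so the call removes two elements; a truthiness check on getAttribute() would have kept that one.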
    
  • 2021-01-18 12:02

    (Assuming $dom refers to the DOM that contains the paragraphs you need to filter out.) Let's try some good old client-side JavaScript:

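    var $ptag = $dom.all.tags("p");       // collection of <p> elements
    $ptag = [].slice.call($ptag);         // copy it into a static array
    for (var $i = 0; $i < $ptag.length; $i++) {
        // Blank out every element that carries data-spotid; always advance $i
        if ($ptag[$i].hasAttribute('data-spotid')) {
            $ptag[$i].outerHTML = "";
        }
    }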
    

    NOTE: I'm using outerHTML to remove unwanted elements, which avoids going through the parent node to relocate an element we already have. Firefox supports outerHTML from version 11 onward (see the MDN reference).

    I'm also using the short all.tags() syntax for brevity; Firefox might not support it, so you may want to fall back to a getElementsByTagName() call there.
