How can I remove DOM element tags but leave their contents?

深忆病人 2021-01-19 14:20

I have PHP code which removes all nodes that have at least one attribute. Here is my code:


    

These

3 Answers
  • 2021-01-19 14:53

    Before removing each element, move its child nodes into the parent, directly after the element itself.

    Example:

    $data = <<<DATA
    <div>
        <p>These line shall stay</p>
        <p class="myclass">Remove this one</p>
        <p>But keep this</p>
        <div style="color: red">and this</div>
        <div style="color: red">and <p>also</p> this</div>
        <div style="color: red">and this <div style="color: red">too</div></div>
    </div>
    DATA;
    
    $dom = new DOMDocument();
    $dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);
    
    foreach ($xpath->query("//*[@*]") as $node) {
        $parent = $node->parentNode;
        while ($node->hasChildNodes()) {
            $parent->insertBefore($node->lastChild, $node->nextSibling);
        }
        $parent->removeChild($node);
    }
    
    echo $dom->saveHTML();
    

    Outputs:

    <div>
        <p>These line shall stay</p>
        Remove this one
        <p>But keep this</p>
        and this
        and <p>also</p> this
        and this too
    </div>
    

    https://3v4l.org/9qHRM

    (I added some nested elements to demonstrate the safety of this approach.)


    Couple of asides:

    • You don't need $dom->removeChild($dom->doctype) if you load with the additional LIBXML_HTML_NODEFDTD flag.
    • Your xpath expression can be simplified to //*[@*]
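
    The unwrap step can also be packaged as a small helper. This is only a sketch under assumptions: the function name unwrapNode and the sample markup are made up for illustration, and it moves the children in front of the node (first child first) rather than behind it, which gives the same order.

    ```php
    <?php
    // Hypothetical helper: replaces $node with its own children.
    function unwrapNode(DOMNode $node): void
    {
        $parent = $node->parentNode;
        if ($parent === null) {
            return; // nothing to unwrap into
        }
        // Moving firstChild in front of $node keeps the children in document order.
        while ($node->firstChild !== null) {
            $parent->insertBefore($node->firstChild, $node);
        }
        $parent->removeChild($node);
    }

    $dom = new DOMDocument();
    $dom->loadHTML(
        '<div><p class="a">one <b id="x">two</b></p><p>three</p></div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    $xpath = new DOMXPath($dom);

    // Every element with at least one attribute, nested ones included.
    foreach ($xpath->query('//*[@*]') as $node) {
        unwrapNode($node);
    }

    $html = trim($dom->saveHTML());
    echo $html, PHP_EOL;
    ```

    Because the children are moved rather than cloned, a nested match such as the b element is still the same node object when the loop reaches it, so it is unwrapped in its new position.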
  • 2021-01-19 15:02

    You could use replaceChild() with the text content of that node:

    foreach ($lines_to_be_removed as $line) {
      $line->parentNode->replaceChild($dom->createTextNode($line->textContent),$line);
    }
    
    // <div>
    //   <p>These line shall stay</p>
    //   Remove this one
    //   <p>But keep this</p>
    //   and this
    // </div>
    

    However, textContent discards any nested elements, so this can be problematic when the // in your xpath selector matches elements that contain other elements.
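
    To illustrate what textContent does to nested markup, here is a minimal sketch. The input snippet is made up for illustration and is not the asker's data.

    ```php
    <?php
    $dom = new DOMDocument();
    $dom->loadHTML(
        '<div><div style="color: red">and <p>also</p> this</div></div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//*[@*]') as $node) {
        // textContent concatenates the text of all descendants; their tags are dropped.
        $text = $dom->createTextNode($node->textContent);
        $node->parentNode->replaceChild($text, $node);
    }

    $html = trim($dom->saveHTML());
    echo $html, PHP_EOL; // <div>and also this</div>
    ```

    The inner p element is gone even though it had no attributes, which is usually not what you want here.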


    A more manual approach is to walk the tree recursively and copy the children of each target node into its parent:

    $data = '
    <div>
      <div>1A</div>
      <div class="foo">1B
        <div>2C</div>
        <div class="foo">2D</div>
        <div>2E</div>
        <div class="foo">2F
          <div>3G</div>
          <div class="foo">3H</div>
          <div>3I</div>
          <div class="foo">3J</div>
        </div>
      </div>
    </div>';
    
    $dom = new DOMDocument();
    $dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
    $dom->removeChild($dom->doctype);
    
    SomeFunctionName( $dom->documentElement );
    
    $html = $dom->saveHTML();
    
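    // Recursively replaces every element that has attributes with copies of its children.
    function SomeFunctionName( $parent )
    {
      $nodesToDelete = array();
      if( $parent->hasChildNodes() )
      {
        foreach( $parent->childNodes as $node )
        {
          // Process descendants first so nested matches are already unwrapped.
          SomeFunctionName( $node );
          if( $node->hasAttributes() and count( $node->attributes ) > 0 )
          {
            // Copy each child in front of the node to preserve document order.
            foreach( $node->childNodes as $childNode )
            {
              $node->parentNode->insertBefore( clone $childNode, $node );
            }
            $nodesToDelete[] = $node;
          }
        }
      }
      // Remove the originals only after the traversal of this level is finished.
      foreach( $nodesToDelete as $delete)
      {
        $delete->parentNode->removeChild( $delete );
      }
    }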
    
    // <div>
    //   <div>1A</div>
    //   1B
    //     <div>2C</div>
    //     2D
    //     <div>2E</div>
    //     2F
    //       <div>3G</div>
    //       3H
    //       <div>3I</div>
    //       3J
    // </div>
    

    If you want to nest the child elements in a new "div" container instead, swap out this portion of the code:

        foreach( $parent->childNodes as $node )
        {
          SomeFunctionName( $node );
          if( $node->hasAttributes() and count( $node->attributes ) > 0 )
          {
            $newNode = $node->ownerDocument->createElement('div');
            foreach( $node->childNodes as $childNode )
            {
              $newNode->appendChild( clone $childNode );
            }
            $node->parentNode->insertBefore( $newNode, $node );
            $nodesToDelete[] = $node;
          }
        }
    
    // <div>
    //   <div>1A</div>
    //   <div>1B
    //     <div>2C</div>
    //     <div>2D</div>
    //     <div>2E</div>
    //     <div>2F
    //       <div>3G</div>
    //       <div>3H</div>
    //       <div>3I</div>
    //       <div>3J</div>
    //     </div>
    //   </div>
    // </div>
    
  • 2021-01-19 15:05

    This removes every element that has a class or style attribute, so it is not bulletproof:

    <?php
    
    $data = <<<DATA
    <div>
        <p>These line shall stay</p>
        <p class="myclass">Remove this one</p>
        <p>But keep this</p>
        <div style="color: red">and this</div>
    </div>
    DATA;
    
    $dom = new DOMDocument();
    $dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
    $dom->removeChild($dom->doctype);
    
    $xpath = new DOMXPath($dom);
    
    $lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
    
    foreach ($lines_to_be_removed as $line) {
        $line->parentNode->removeChild($line);
    }
    
    // just to check
    echo $dom->saveHTML();
    ?>
    

    Note this line:

     $lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
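
    For comparison, here is a small sketch of how the predicate decides which elements match. The sample markup, including the id attribute, is made up for illustration; the shorter [@class or @style] form is an equivalent way to write the same test in XPath 1.0.

    ```php
    <?php
    $dom = new DOMDocument();
    $dom->loadHTML(
        '<div><p class="a">1</p><p id="b">2</p><p style="x">3</p><p>4</p></div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    $xpath = new DOMXPath($dom);

    // Elements carrying a class or a style attribute.
    $classOrStyle = $xpath->query('//*[count(@class)>0 or count(@style)>0]')->length;
    // Same test: an attribute node-set is true when it is non-empty.
    $shortForm = $xpath->query('//*[@class or @style]')->length;
    // Elements carrying any attribute at all, so the id-only paragraph counts too.
    $anyAttribute = $xpath->query('//*[@*]')->length;

    echo "$classOrStyle $shortForm $anyAttribute", PHP_EOL; // 2 2 3
    ```

    The element with only an id attribute is missed by the class/style predicate, which is why that expression is not a general answer to "at least one attribute".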
    