PHP DOMDocument skips even elements

人盡茶涼 submitted on 2019-12-24 09:40:16


Hello, I'm using this method to replace all iframe and img tags with span tags:

    $string = clean($string);
    $dom = new \DOMDocument;
    $dom->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $iframes = $dom->getElementsByTagName('iframe');

    foreach($iframes as $iframe) {
        $src = $iframe->getAttribute('src');
        $span = $dom->createElement('span');
        $span->setAttribute('title', $src);
        $span->setAttribute('class', 'lazy-youtube');
        $iframe->parentNode->replaceChild($span, $iframe);
    }

    $images = $dom->getElementsByTagName('img');

    foreach($images as $image) {
        $src = $image->getAttribute('src');
        $span = $dom->createElement('span');
        $span->setAttribute('title', $src);
        $span->setAttribute('class', 'lazy-image');
        $image->parentNode->replaceChild($span, $image);
    }

    $html = $dom->saveHTML();

    return clean($html);

But the problem is that it skips elements. The result always looks like this:

// Iframe

// Img

HTML for the iframes:

<div class="content">
   <iframe frameborder="0" height="315" src="" width="560"></iframe>
   <iframe frameborder="0" height="315" src="" width="560"></iframe>
   <iframe frameborder="0" height="315" src="" width="560"></iframe>
   <iframe frameborder="0" height="315" src="" width="560"></iframe>
   <iframe frameborder="0" height="315" src="" width="560"></iframe>
</div>

All elements of the same type have the same attributes; only the src is different. Does anyone know how I can fix this so that all elements are replaced?


Explanation of the problem: It's probably skipping every other element because the list of elements is live. Once you replace an iframe, for example, it drops out of the list and all the other iframes shift to occupy the removed one's spot, while the loop still moves on to the next position.

One way to fix it:

// code
$iframes = $dom->getElementsByTagName('iframe');
while ($iframes->length > 0) { // while there are still iframes left to change
    foreach ($iframes as $iframe) {
        // your regular code to replace the iframe with a span
        // break; // this makes it easier to understand, but is not really necessary
    }
    $iframes = $dom->getElementsByTagName('iframe'); // fetch the (remaining) skipped iframes until there are none left
}
// code

Don't forget to do the same with the images.
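
To avoid writing the loop twice, the same idea can be wrapped in a small helper. This is only a sketch: the function name replaceTagWithSpan and the sample markup are made up for illustration, while the span attributes follow the code in the question.

```php
<?php
// Hypothetical helper (not part of the question's code): replaces every
// <$tag> element with a <span> carrying the old src as its title.
function replaceTagWithSpan(DOMDocument $dom, string $tag, string $class): void
{
    $nodes = $dom->getElementsByTagName($tag);
    while ($nodes->length > 0) { // repeat until no element of this tag is left
        foreach ($nodes as $node) {
            $span = $dom->createElement('span');
            $span->setAttribute('title', $node->getAttribute('src'));
            $span->setAttribute('class', $class);
            $node->parentNode->replaceChild($span, $node);
        }
        // Pick up the elements the previous pass skipped.
        $nodes = $dom->getElementsByTagName($tag);
    }
}

// Sample markup with made-up src values.
$dom = new DOMDocument;
$dom->loadHTML(
    '<div><iframe src="a"></iframe><iframe src="b"></iframe><iframe src="c"></iframe>'
    . '<img src="1.png"><img src="2.png"></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

replaceTagWithSpan($dom, 'iframe', 'lazy-youtube');
replaceTagWithSpan($dom, 'img', 'lazy-image');

$iframesLeft = $dom->getElementsByTagName('iframe')->length;
$imagesLeft  = $dom->getElementsByTagName('img')->length;
$spanCount   = $dom->getElementsByTagName('span')->length;
```

Each pass of the outer loop replaces at least the first remaining element, so the loop always terminates.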

Here is a better way to understand the problem:

 1 - List of iframes
 iframe1  iframe2  iframe3  iframe4 iframe5 [...]
    /\ - current item in loop

 2 - Replacing iframe1 removes it from the list (the list only contains iframes), so the list is now:
 iframe2  iframe3  iframe4  iframe5 [...]

 3 - Loop continues and it goes to the next item
 iframe2  iframe3  iframe4  iframe5 [...]
             /\ - current item in loop

See how, that way, it would skip every other element?
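
If you want to see it happen, here is a small reproduction. It is a sketch with made-up markup (five iframes whose src values are just the letters a to e), not the asker's real page. It records which iframes the foreach actually visits and counts how many are left afterwards.

```php
<?php
// Five sibling iframes, similar to the markup in the question (src values are made up).
$html = '<div class="content">'
      . '<iframe src="a"></iframe><iframe src="b"></iframe><iframe src="c"></iframe>'
      . '<iframe src="d"></iframe><iframe src="e"></iframe>'
      . '</div>';

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$iframes = $dom->getElementsByTagName('iframe'); // live list
$visited = [];

foreach ($iframes as $iframe) {
    $visited[] = $iframe->getAttribute('src'); // remember which ones the loop reached
    $span = $dom->createElement('span');
    $iframe->parentNode->replaceChild($span, $iframe); // shrinks the live list
}

// Count the iframes the loop never reached.
$remaining = $dom->getElementsByTagName('iframe')->length;

echo 'visited: ', implode(', ', $visited), PHP_EOL;
echo 'remaining iframes: ', $remaining, PHP_EOL;
```

The loop should reach only every other iframe, leaving the ones in between untouched, which matches the diagram above.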


This happens because foreach does not iterate over a copy: the DOMNodeList returned by getElementsByTagName is live, so it changes as soon as you replace a child. A reliable way to process the list is to keep taking its first item until the list is empty:

$elements = $dom->getElementsByTagName("iframe");
while ($elements->length > 0) {
    $oldNode = $elements->item(0); // always take the first remaining element
    $newNode = $dom->createElement("image");
    $oldNode->parentNode->replaceChild($newNode, $oldNode); // new node first, then the node it replaces
}
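
Another option, not mentioned above, is to copy the live list into a plain array first and loop over that snapshot. The sketch below assumes sample markup with four images and reuses the span attributes from the question. iterator_to_array is a standard PHP function that works on any Traversable, including DOMNodeList.

```php
<?php
// Sample markup with made-up src values.
$dom = new DOMDocument;
$dom->loadHTML(
    '<div><img src="1.png"><img src="2.png"><img src="3.png"><img src="4.png"></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Snapshot: a plain array that no longer changes when the document does.
$images = iterator_to_array($dom->getElementsByTagName('img'));

foreach ($images as $image) {
    $span = $dom->createElement('span');
    $span->setAttribute('title', $image->getAttribute('src'));
    $span->setAttribute('class', 'lazy-image');
    $image->parentNode->replaceChild($span, $image);
}

$remainingImages = $dom->getElementsByTagName('img')->length;
$spanCount = $dom->getElementsByTagName('span')->length;
```

Because the array is fixed before any replacement happens, a normal foreach visits every element exactly once.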

Using the same logic, if you need to move the child elements from the old node to the new one, you can do this:

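while ($oldNode->childNodes->length > 0)
    $newNode->appendChild($oldNode->childNodes->item(0)); // appendChild moves the node, so childNodes shrinks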
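
Put together, moving the children and then swapping the nodes could look like the sketch below. The markup (a p element with mixed content) and the choice of span as the new element are assumptions made only for this example.

```php
<?php
// Sample markup, made up for this example.
$dom = new DOMDocument;
$dom->loadHTML(
    '<div><p><b>bold</b> and <i>italic</i></p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

$oldNode = $dom->getElementsByTagName('p')->item(0);
$newNode = $dom->createElement('span');

// childNodes is live as well: appending a child to $newNode removes it
// from $oldNode, so the first item is always the next one to move.
while ($oldNode->childNodes->length > 0) {
    $newNode->appendChild($oldNode->childNodes->item(0));
}

// Swap the now-empty old node for the new one.
$oldNode->parentNode->replaceChild($newNode, $oldNode);

$movedCount = $newNode->childNodes->length;
$movedText  = $newNode->textContent;
$oldLeft    = $dom->getElementsByTagName('p')->length;
```

Text nodes are moved along with elements, so the content keeps its original order.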

