DOMDocument remove script tags from HTML source

余生颓废 提交于 2019-12-05 00:20:51

Your error is actually trivial. A DOMNode object (and all its descendants - DOMElement, DOMNodeList and a few others!) is automatically updated when its parent element changes, most notably when its number of children change. This is written on a couple of lines in the PHP doc, but is mostly swept under the carpet.

If you loop using ($k instanceof DOMNode)->length, and subsequently remove elements from the nodes, you'll notice that the length property actually changes! I had to write my own library to counteract this and a few other quirks.

The solution:

if($dom->loadHTML($result))
{
    while (($r = $dom->getElementsByTagName("script")) && $r->length) {
            $r->item(0)->parentNode->removeChild($r->item(0));
    }
echo $dom->saveHTML();

I'm not actually looping - just popping the first element one at a time. The result: http://sebrenauld.co.uk/domremovescript.php

To avoid that you get the surprises of a live node list -- that gets shorter as you delete nodes -- you could work with a copy into an array using iterator_to_array:

foreach(iterator_to_array($dom->getElementsByTagName($tag)) as $node) {
    $node->parentNode->removeChild($node);
};  
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!