I need to clean up some HTML by removing the <style> and <link> tags inside the <body> tag.
I'm already using PHP Tidy to do some cleanup, but I could not find a way to remove those tags with it.
Do you have a solution? Or maybe another PHP class for cleaning markup?
I don't know how to do that with Tidy, but you can use DOM:
    $dom = new DOMDocument;                      // init new DOMDocument
    $dom->loadHTML($html);                       // load HTML into it
    $xpath = new DOMXPath($dom);                 // create a new XPath
    $nodes = $xpath->query('//body//style');     // find all style elements anywhere inside body
    foreach ($nodes as $node) {                  // iterate over found elements
        $node->parentNode->removeChild($node);   // remove the complete style node
    }
    echo $dom->saveHTML();                       // output cleaned HTML

For the <link> elements, adjust the XPath to //body//link.
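Both tag types can probably be handled in a single pass by joining two XPath expressions with the union operator. The following is only a sketch: the function name removeStyleAndLinkFromBody and the sample markup are made up for illustration, and the libxml error handling is an added assumption about how you may want to deal with invalid real-world HTML.

```php
<?php
// Hypothetical helper; the name is illustrative and not part of any library.
function removeStyleAndLinkFromBody(string $html): string
{
    $dom = new DOMDocument();

    // Assumption: input may be invalid HTML, so buffer libxml warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // "|" is the XPath union operator: select <style> and <link>
    // elements at any depth below <body>.
    $nodes = $xpath->query('//body//style | //body//link');

    // Defensive copy of the node list before removing nodes from the tree.
    foreach (iterator_to_array($nodes) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $dom->saveHTML();
}

// Illustrative input: one <link> in <head> (kept), one <style> and one <link> in <body> (removed).
$html = '<html><head><link rel="stylesheet" href="site.css"></head>'
      . '<body><style>p { color: red; }</style>'
      . '<div><link rel="stylesheet" href="extra.css"><p>Hello</p></div>'
      . '</body></html>';

echo removeStyleAndLinkFromBody($html);
```

Because the query is anchored at //body, the stylesheet link in <head> is left alone, while tags nested inside other body elements are still matched.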
An alternative to Tidy would be HTML Purifier (http://htmlpurifier.org/):
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
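A minimal usage sketch, assuming HTML Purifier was installed with Composer (package ezyang/htmlpurifier) so that vendor/autoload.php exists next to the script. The sample input is made up. As far as I know, <style> and <link> are not in the default whitelist, so the default configuration should drop them, but check the result against your own configuration. Note that purify() is meant for HTML fragments such as the contents of <body>, not for complete documents.

```php
<?php
// Assumption: HTML Purifier is available through Composer's autoloader.
require_once __DIR__ . '/vendor/autoload.php';

// Start from the library's default configuration (default whitelist).
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);

// Illustrative body fragment containing the unwanted tags.
$dirty = '<style>p { color: red; }</style>'
       . '<link rel="stylesheet" href="extra.css">'
       . '<p>Hello</p>';

// purify() returns the filtered markup as a string.
$clean = $purifier->purify($dirty);

echo $clean;
```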
I made this a separate answer, since it is completely unrelated to the DOM solution.
Source: https://stackoverflow.com/questions/3053349/php-tidy-remove-link-and-style-tags-inside-body