Perhaps the source code of SmartDOMDocument will help. It uses a regex to strip out the unnecessary wrapper strings:
http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/
$content = preg_replace(array("/^\<\!DOCTYPE.*?<html><body>/si",
                              "!</body></html>$!si"),
                        "",
                        $this->saveHTML());
return $content;
saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off). Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
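If you don't want to pull in the whole class, the same idea can be sketched as a standalone function. This is only a rough sketch: the name saveHtmlWithoutWrapper() is made up for illustration, and I've added `\s*` to the second pattern so the trailing newline that saveHTML() emits is removed too. It assumes the fragment has no <head> content, so the wrapper really is the adjacent `<html><body>` pair.

```php
<?php
// Hypothetical helper (not part of SmartDOMDocument): strip the DOCTYPE and
// the <html><body> wrapper that DOMDocument adds around a loaded fragment.
function saveHtmlWithoutWrapper(DOMDocument $doc): string
{
    $patterns = array(
        '/^<!DOCTYPE.*?<html><body>/si', // DOCTYPE through the opening wrapper
        '!</body></html>\s*$!si',        // closing wrapper plus trailing whitespace
    );
    return preg_replace($patterns, '', $doc->saveHTML());
}

$doc = new DOMDocument();
$doc->loadHTML('<p><a href="test.php">Test</a></p>');
echo saveHtmlWithoutWrapper($doc);
```

On newer setups (PHP 5.4+ built against a recent libxml) loadHTML() also accepts the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options as its second argument, which may avoid the wrapper in the first place. Results with several top-level elements can be surprising, so test it against your own fragments.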
Also, other questions have asked similar things:
How to saveHTML of DOMDocument without HTML wrapper?
Try using DOMDocument->saveXML()?
<?php
$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$domnodelist = $doc->getElementsByTagName('p');
$domnode = $domnodelist->item(0);
echo $doc->saveXML($domnode);
?>
It outputs <p><a href="test.php">Test</a></p>
Thanks, but I won't necessarily know the type of the first tag in the body; it needs to be generic.
// '*' would match the implied <html> element first, so start from <body> instead
$domnode = $doc->getElementsByTagName('body')->item(0)->firstChild;
echo $doc->saveXML($domnode);
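If the fragment can contain several top-level nodes of unknown type, one option is to walk every child of the implied <body> and serialize each in turn. Keep in mind that after loadHTML() the first element in document order is the implied <html>, not your own tag. A minimal sketch, assuming PHP 7+ for the type hints; bodyInnerHtml() is a hypothetical name, and silencing libxml warnings is my own choice, not something from the answers above. It uses saveHTML($node) (available since PHP 5.3.6) rather than saveXML($node), so the output is serialized as HTML rather than XML.

```php
<?php
// Hypothetical helper: return the markup of everything inside the implied <body>.
function bodyInnerHtml(string $html): string
{
    $doc = new DOMDocument();

    // Assumption: keep parser warnings for sloppy markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was wrapped in <body>, so there is nothing to return
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // Serialize each top-level node (element, text, comment) on its own.
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo bodyInnerHtml('<p><a href="test.php">Test</a></p><div>Second</div>');
```

This way you never need to know the tag name of the first element, and text nodes between elements are kept as well.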