I'm trying to parse a document, get all the image tags and change their source to something different.
$domDocument = new DOMDocument();
$domDo
You can use SmartDOMDocument (http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/):
DOMDocument has an extremely badly designed "feature": if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, at the time there were no flags to turn this behavior off).
Thus, when you call $doc->saveHTML(), your newly saved content now has <html>, <body> and a DOCTYPE in it. That is not very handy when working with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
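To see the wrapping behavior described above with plain DOMDocument (no SmartDOMDocument involved), here is a minimal sketch; the fragment is a made-up example, and no libxml flags are passed to loadHTML():

```php
<?php
// A small HTML fragment with no <html>, <body> or DOCTYPE of its own.
$fragment = '<p>Hello <img src="a.jpg"></p>';

$doc = new DOMDocument();
// Default parse: libxml adds the implied <html>/<body> and a default DOCTYPE.
$doc->loadHTML($fragment);

// saveHTML() without an argument serializes the whole document,
// so the wrappers and the DOCTYPE show up in the output.
$html = $doc->saveHTML();
echo $html;
```

The output starts with a DOCTYPE line, and the paragraph comes back nested inside <html><body>...</body></html>.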
You just need to pass two flags to the loadHTML() method, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD (available since PHP 5.4.0 with libxml 2.7.8 or later), i.e.:
$domDocument->loadHTML($text, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
See the IDEONE demo:
$text = '<p>Hi, this is a test, here is an image<img src="http://example.com/beer.jpg" width="60" height="95" /> Because I like Beer!</p>';
$domDocument = new DOMDocument;
// NOIMPLIED: don't add <html>/<body>; NODEFDTD: don't add a default DOCTYPE
$domDocument->loadHTML($text, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$imageNodeList = $domDocument->getElementsByTagName('img');
foreach ($imageNodeList as $image) {
    $image->setAttribute('src', 'lalala');
}
$text = $domDocument->saveHTML();
echo $text;
Output:
<p>Hi, this is a test, here is an image<img src="lalala" width="60" height="95"> Because I like Beer!</p>
DOMDocument unfortunately won't let you do this directly. Try stripping the extra markup from the output instead:
// Remove the <html>/<body> wrappers, then the leading DOCTYPE, then the leftover newlines
$text = trim(preg_replace(
    '/^<!DOCTYPE.+?>/',
    '',
    str_replace(array('<html>', '</html>', '<body>', '</body>'), '', $domDocument->saveHTML())
));
If you are up for a hack, this is how I managed to get around this annoyance: load the string as XML and save it as HTML. :)
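A minimal sketch of that hack, reusing the fragment from the question. It assumes the input is well-formed XML (a single root element, <img /> self-closed, no HTML-only entities such as &nbsp;); otherwise loadXML() will fail:

```php
<?php
$text = '<p>Hi, this is a test, here is an image<img src="http://example.com/beer.jpg" width="60" height="95" /> Because I like Beer!</p>';

$domDocument = new DOMDocument();
// The XML parser adds no implied <html>/<body> and no default DOCTYPE.
$domDocument->loadXML($text);

foreach ($domDocument->getElementsByTagName('img') as $image) {
    $image->setAttribute('src', 'lalala');
}

// Serialize with HTML rules (e.g. <img ...> without the "/"), then trim
// any trailing newline the serializer appends.
$text = trim($domDocument->saveHTML());
echo $text;
```

The trade-off is strictness: the XML parser rejects sloppy HTML that loadHTML() would happily accept.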
If you're going to save as HTML, you have to expect a valid HTML document to be created!
There is another option: DOMDocument::saveXML has an optional parameter allowing you to access the XML content of a particular element:
$el = $domDocument->getElementsByTagName('p')->item(0);
$text = $domDocument->saveXML($el);
This presumes that your content has only one p element.
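To drop the one-p-element assumption, a variation on the same idea serializes every child of the implied <body> one by one. This sketch uses saveHTML() with a node argument (supported since PHP 5.3.6) rather than saveXML(); the two-paragraph fragment is a made-up example:

```php
<?php
// Two top-level elements, so serializing only the first <p> would lose content.
$text = '<p>First <img src="a.jpg"></p><p>Second <img src="b.jpg"></p>';

$domDocument = new DOMDocument();
// Default parse on purpose: we rely on the implied <body> existing.
$domDocument->loadHTML($text);

foreach ($domDocument->getElementsByTagName('img') as $image) {
    $image->setAttribute('src', 'lalala');
}

// Serialize each child of <body> separately, so the DOCTYPE and the
// <html>/<body> wrappers are never written out.
$body = $domDocument->getElementsByTagName('body')->item(0);
$text = '';
foreach ($body->childNodes as $child) {
    $text .= $domDocument->saveHTML($child);
}
echo $text;
```

Depending on the libxml version, a newline may appear between the serialized block elements.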