Regex was my original idea as a solution, although it soon became apparent a DOM parser would be more appropriate... I'd like to convert spaces to `&nbsp;` inside `<pre>` tags.
I see the shortcoming of my previous answer. Here is a workaround that preserves tags inside the `<pre>` tag:
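```php
<?php
// $test holds the input HTML (its definition is not shown here)
$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
// select every text node that is a descendant of a <pre> element
$pre = $xpath->query('//pre//text()');
// manipulate nodes of type XML_TEXT_NODE
foreach ($pre as $e) {
    // when you attempt to write &nbsp; into a DOM node, the &
    // will be converted to &amp; on output :( so use a placeholder
    $e->nodeValue = str_replace(' ', '__REPLACEMELATER__', $e->nodeValue);
}
$temp = $dom->saveHTML();
// strip the doctype, html and body wrapper that loadHTML() added
$temp = str_replace('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">', '', $temp);
$temp = str_replace('<html>', '', $temp);
$temp = str_replace('<body>', '', $temp);
$temp = str_replace('</body>', '', $temp);
$temp = str_replace('</html>', '', $temp);
// now that the markup is serialized, swap the placeholder for the entity
$temp = str_replace('__REPLACEMELATER__', '&nbsp;', $temp);
echo $temp;
?>
```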
paragraph 1 remains untouched
preformatted 1
preformatted 2
preformatted 3 span text preformatted 3
preformatted 4 span bold test text preformatted 3
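To try the placeholder technique end to end, here is a self-contained sketch. The function name `preSpacesToNbsp` and the sample markup are hypothetical, and it swaps the doctype `str_replace()` for a `preg_replace()` so it does not depend on the exact DTD string libxml emits. It assumes the DOM extension is loaded and PHP 7+ for the type hints.

```php
<?php
// Hypothetical helper wrapping the placeholder approach from the answer above.
function preSpacesToNbsp(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // Only text nodes below <pre> are changed; nested <span>/<b> elements survive.
    foreach ($xpath->query('//pre//text()') as $text) {
        $text->nodeValue = str_replace(' ', '__REPLACEMELATER__', $text->nodeValue);
    }
    $out = $dom->saveHTML();
    // Remove the wrapper that loadHTML() added around the fragment.
    $out = preg_replace('/^<!DOCTYPE[^>]*>\s*/i', '', $out);
    $out = str_replace(['<html>', '</html>', '<body>', '</body>'], '', $out);
    // The entity is inserted after serialization, so its "&" is not escaped.
    return trim(str_replace('__REPLACEMELATER__', '&nbsp;', $out));
}

// Hypothetical sample input: the <p> keeps its spaces, the <pre> does not.
$sample = '<p>paragraph 1 remains untouched</p>'
        . '<pre>preformatted 3 <span>span text</span> preformatted 3</pre>';
echo preSpacesToNbsp($sample), "\n";
```

The placeholder must be a string that cannot occur in the real text; if that is not guaranteed, pick something less likely than `__REPLACEMELATER__`.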
In PHP >= 5.3.6, `DOMDocument::saveHTML()` accepts the node to output. Otherwise you can use `str_replace()` or `preg_replace()` to eliminate the doctype, html and body tags.
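A minimal sketch of the PHP >= 5.3.6 route: serialize each child of `<body>` with `saveHTML($child)` and concatenate the results, so the doctype, `<html>` and `<body>` wrappers never appear in the output. `innerBodyHtml` is a hypothetical name, and the whitespace between serialized nodes may differ across libxml versions.

```php
<?php
// Hypothetical helper: returns the HTML of everything inside <body>.
// Requires PHP >= 5.3.6, where saveHTML() accepts a node argument.
function innerBodyHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // Serialize each top-level node on its own, without the wrapper.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>one two</p><pre>three four</pre>');
echo innerBodyHtml($dom), "\n";
```

Run after the text-node loop, this could take the place of the `str_replace()` wrapper cleanup; the final `__REPLACEMELATER__` swap is still needed, since writing `&nbsp;` into a text node would still be escaped.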
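This trick seems to work and saves a line of code, but I am not sure it is guaranteed to work:

```php
// write U+00A0 (NO-BREAK SPACE) directly, as its UTF-8 byte sequence C2 A0
$e->nodeValue = str_replace(' ', "\xC2\xA0", $e->nodeValue);
// the DOM library will attempt to convert U+00A0 to &nbsp; on output
// nodeValue expects UTF-8 encoded data, but the lone byte 0xA0 is not valid
// UTF-8, hence the replacement must be UTF-8 encoded; running the whole string
// through utf8_encode() instead would also re-encode any non-ASCII text in it
```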
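A sketch of the character-based trick as a complete function, assuming the input declares no charset: in that case the whole-document `saveHTML()` is expected to write U+00A0 back out as `&nbsp;`, whereas serializing a single node may emit the raw character instead. `preSpacesToNbspChar` and the sample input are hypothetical, and the doctype/html/body wrapper is deliberately left in place here.

```php
<?php
// Hypothetical helper: replaces spaces inside <pre> with the U+00A0
// character itself (UTF-8 bytes C2 A0) instead of a placeholder string.
function preSpacesToNbspChar(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//pre//text()') as $text) {
        $text->nodeValue = str_replace(' ', "\xC2\xA0", $text->nodeValue);
    }
    // Whole-document serialization; the wrapper tags are not stripped here.
    return $dom->saveHTML();
}

echo preSpacesToNbspChar('<p>a b</p><pre>a b</pre>');
```

If the output must contain the literal `&nbsp;` regardless of serializer behaviour, the placeholder version is the safer choice.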