I am looking for the best br2nl function. I would like to replace all instances of <br> and <br /> with newlines (\n), much like nl2br() but in the opposite direction.
From the user comments on the nl2br() manual page:
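<?php
// Originally written with eregi_replace(), which was deprecated in PHP 5.3
// and removed in PHP 7.0; preg_replace() with the i flag is the equivalent.
function br2nl($string){
    // Match <br>, <br/>, <br /> (any case, any whitespace) and emit CR+LF.
    $return = preg_replace('#<br[[:space:]]*/?[[:space:]]*>#i', chr(13).chr(10), $string);
    return $return;
}
?>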
If the document is well-formed (or at least well-formed-ish), you can use the DOM extension and XPath to find all br elements and replace each one with a \n text node.
$in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';
$doc = new DOMDocument;
$doc->loadHTML($in);
$xpath = new DOMXPath($doc);
// First pass: collect every <br> element.
$toBeReplaced = array();
foreach($xpath->query('//br') as $node) {
    $toBeReplaced[] = $node;
}
// Second pass: replace each <br> with its own copy of a "\n" text node.
$linebreak = $doc->createTextNode("\n");
foreach($toBeReplaced as $node) {
    $node->parentNode->replaceChild($linebreak->cloneNode(), $node);
}
echo $doc->saveHTML();
This prints:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head><title>...</title></head>
<body>abc
def<p>ghi
jkl</p>
</body>
</html>
Edit: a shorter version with only one loop:
$in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';
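$doc = new DOMDocument;
$doc->loadHTML($in);
$xpath = new DOMXPath($doc);
$linebreak = $doc->createTextNode("\n");
// Replace each <br> with a fresh copy of the "\n" text node; removing the
// <br> with removeChild() would delete it without inserting a newline.
foreach($xpath->query('//br') as $node) {
    $node->parentNode->replaceChild($linebreak->cloneNode(), $node);
}
echo $doc->saveHTML();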
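If you need this for an HTML fragment (for example a string loaded from a database) rather than a full document, the same DOM idea can be wrapped in a function. This is a minimal sketch, not part of the original answer: br2nl_dom() is a hypothetical name, the fragment is wrapped in a temporary <div>, and it assumes PHP 7+ with libxml 2.7.8 or newer for the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags. It also assumes ASCII or ISO-8859-1 input, since loadHTML() does not treat input as UTF-8 by default.

```php
<?php
// Hypothetical helper: DOM-based br2nl for an HTML fragment.
function br2nl_dom(string $html): string
{
    $doc = new DOMDocument();
    // Suppress libxml warnings about imperfect markup while parsing.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one <div> so it has a single root element;
    // the flags stop libxml from adding <html>, <body> and a doctype.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect the <br> nodes first, then replace them, as in the first version above.
    $xpath = new DOMXPath($doc);
    $brs = iterator_to_array($xpath->query('//br'));
    foreach ($brs as $br) {
        $br->parentNode->replaceChild($doc->createTextNode("\n"), $br);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo br2nl_dom('abc<br />def<p>ghi<br>jkl</p>');
```

Unlike the regex answers, this leaves any other markup in the fragment intact and only touches real br elements.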
You should be using the PHP_EOL constant, so that the newline matches the platform PHP is running on (\n on Unix-like systems, \r\n on Windows).
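A minimal sketch of that advice, reusing the regex from the regex-based answer in this thread; br2eol() is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: replace <br>, <br/>, <br /> (any case) with PHP_EOL.
function br2eol(string $html): string
{
    // PHP_EOL is "\n" on Unix-like systems and "\r\n" on Windows.
    return preg_replace('#<br\s*/?>#i', PHP_EOL, $html);
}

echo br2eol('one<br>two<br />three');
```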
In my opinion, using non-regexp functions whenever possible makes the code more readable.
$newlineTags = array(
'<br>',
'<br/>',
'<br />',
);
$html = str_replace($newlineTags, PHP_EOL, $html);
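I am aware this solution has some flaws (it is case-sensitive, so <BR> is not replaced, and it misses variants such as <br  /> with extra spaces or br tags with attributes), but I wanted to share my insights anyway.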
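The case-sensitivity flaw can be removed without a regex by switching to str_ireplace(), which takes the same arguments as str_replace(). A small sketch; br2nl_plain() is a hypothetical name, and it still only knows the three exact spellings in the list.

```php
<?php
// Hypothetical helper: case-insensitive version of the str_replace() approach.
function br2nl_plain(string $html): string
{
    $newlineTags = array('<br>', '<br/>', '<br />');
    // str_ireplace() also matches <BR>, <Br/>, <BR /> and so on,
    // but not "<br  />" (two spaces) or "<br class=x>".
    return str_ireplace($newlineTags, PHP_EOL, $html);
}

echo br2nl_plain('a<BR>b<br/>c<Br />d');
```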
I would generally say "don't use regex to work with HTML", but on this one I would probably go with a regex, considering that <br> tags generally look like either:
<br>
<br/>, with any number of spaces before the /
I suppose something like this would do the trick:
$html = 'this <br>is<br/>some<br />text <br />!';
$nl = preg_replace('#<br\s*/?>#i', "\n", $html);
echo $nl;
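Couple of notes:
- the pattern starts with <br
- followed by any number of whitespace characters: \s*
- then, optionally, a /: /?
- and finally a >
- the match is case-insensitive (#i), as <BR> would be valid in HTML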
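The pattern above does not match br tags that carry attributes, such as <br class="x">. If those occur in your input, a looser pattern is one option. This is a hedged sketch rather than part of the original answer: br2nl_attr() is a hypothetical name, and it would still misfire on the unusual case of a > inside an attribute value.

```php
<?php
// Hypothetical helper: like #<br\s*/?>#i, but also accepts attributes.
function br2nl_attr(string $html): string
{
    // <br   : the tag name (case-insensitive because of the i flag)
    // \b    : word boundary, so <bra> or <brx> are not matched
    // [^>]* : anything up to the closing >, e.g. spaces, attributes, /
    return preg_replace('#<br\b[^>]*>#i', "\n", $html);
}

echo br2nl_attr('one<br class="x">two<BR / >three');
```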