How to replace all XHTML/HTML line breaks (<br>) with new lines?

死守一世寂寞 2020-11-30 04:50

I am looking for the best br2nl function. I would like to replace all instances of <br> and <br /> with newlines.

4 answers
  • 2020-11-30 05:03

    From the nl2br comments:

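    <?php
    // eregi_replace() was removed in PHP 7; preg_replace() with the
    // case-insensitive "i" modifier matches the same tags.
    function br2nl($string){
      $return=preg_replace('#<br\s*/?\s*>#i',
        chr(13).chr(10),$string); // chr(13).chr(10) is CR LF
      return $return;
    }
    ?>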
    
  • 2020-11-30 05:04

    If the document is well-formed (or at least well-formed-ish), you can use the DOM extension and XPath to find all br elements and replace each one with a \n text node.

    $in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
    <html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';
    
    $doc = new DOMDocument;
    $doc->loadHTML($in);
    $xpath = new DOMXPath($doc);
    
    $toBeReplaced = array();
    foreach($xpath->query('//br') as $node) {
        $toBeReplaced[] = $node;
    }
    
    $linebreak = $doc->createTextNode("\n");
    foreach($toBeReplaced as $node) {
        $node->parentNode->replaceChild($linebreak->cloneNode(), $node);
    }
    
    echo $doc->saveHTML();
    

    prints

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head><title>...</title></head>
    <body>abc
    def<p>ghi
    jkl</p>
    </body>
    </html>
    

    edit: shorter version with only one iteration

    $in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
    <html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';
    
    $doc = new DOMDocument;
    $doc->loadHTML($in);
    $xpath = new DOMXPath($doc);
    
    $linebreak = $doc->createTextNode("\n");
    foreach($xpath->query('//br') as $node) {
      // Replace (not just remove) each br with a copy of the "\n" text node.
      $node->parentNode->replaceChild($linebreak->cloneNode(), $node);
    }
    
    echo $doc->saveHTML();
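
    The first version above collects the nodes before replacing them. A minimal sketch of the same idea without XPath, assuming getElementsByTagName() is used instead: that method returns a live list, so the sketch takes a snapshot with iterator_to_array() before touching the tree. The shortened input string and the variable names are made up for illustration.

    ```php
    <?php
    // Hypothetical input, shortened from the example above.
    $in = '<html><body>abc<br />def<p>ghi<br />jkl</p></body></html>';

    $doc = new DOMDocument;
    $doc->loadHTML($in);

    // getElementsByTagName() returns a live list, so take a snapshot
    // before modifying the tree.
    $brs = iterator_to_array($doc->getElementsByTagName('br'));

    foreach ($brs as $br) {
        // Swap each br element for a fresh "\n" text node.
        $br->parentNode->replaceChild($doc->createTextNode("\n"), $br);
    }

    // Number of br elements left, and the text content of the body.
    $remaining = $doc->getElementsByTagName('br')->length;
    $text = $doc->getElementsByTagName('body')->item(0)->textContent;
    ```

    Reading textContent afterwards is one way to get at the plain text if the surrounding markup is not needed.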
    
  • 2020-11-30 05:10

    You should use the PHP_EOL constant to get platform-independent newlines.

    In my opinion, using non-regexp functions whenever possible makes the code more readable.

    $newlineTags = array(
      '<br>',
      '<br/>',
      '<br />',
    );
    $html = str_replace($newlineTags, PHP_EOL, $html);
    

    I am aware this solution has some flaws, but wanted to share my insights still.
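
    One such flaw can be shown in a short sketch: str_replace() is case-sensitive and only matches the exact spellings listed, so variants such as <BR> or <br   /> are left untouched. Switching to str_ireplace() covers the case difference, though not the extra whitespace. The sample string is made up for illustration, and "\n" is used instead of PHP_EOL only to keep the result the same on every platform.

    ```php
    <?php
    // Hypothetical sample input, not from the original answer.
    $html = 'a<br>b<BR>c<br   />d';
    $newlineTags = array('<br>', '<br/>', '<br />');

    // Case-sensitive: only the lowercase <br> is replaced.
    $strict = str_replace($newlineTags, "\n", $html);

    // Case-insensitive: <BR> is replaced too, but <br   /> still is not.
    $loose = str_ireplace($newlineTags, "\n", $html);
    ```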

  • 2020-11-30 05:19

    I would generally say "don't use regex to work with HTML", but in this case I would probably go with a regex, considering that <br> tags generally look like either:

    • <br>
    • or <br/>, with any number of spaces before the /


    I suppose something like this would do the trick:

    $html = 'this <br>is<br/>some<br />text <br    />!';
    $nl = preg_replace('#<br\s*/?>#i', "\n", $html);
    echo $nl;
    

    A couple of notes on the pattern:

    • it starts with <br
    • followed by any number of whitespace characters: \s*
    • optionally, a /: /?
    • and, finally, a >
    • all matched case-insensitively (the i modifier after the closing #), as <BR> would be valid in HTML
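
    The pattern described above could be wrapped in a helper, sketched here. The function names br2nl and br2nl_attrs are hypothetical, and the second variant, which also accepts attributes such as <br class="x">, is an assumption that goes beyond the answer rather than something it proposes.

    ```php
    <?php
    // Hypothetical helper name; the pattern is the one from the answer above.
    function br2nl(string $html): string
    {
        return preg_replace('#<br\s*/?>#i', "\n", $html);
    }

    // Assumed variant that also accepts attributes, e.g. <br class="x">.
    // It would misfire on attribute values that contain a ">".
    function br2nl_attrs(string $html): string
    {
        return preg_replace('#<br\b[^>]*>#i', "\n", $html);
    }

    $plain = br2nl('this <br>is<br/>some<br />text <br    />!');
    $upper = br2nl('one<BR>two');
    $attrs = br2nl_attrs('one<br class="x" />two');
    ```

    The \b in the second pattern keeps it from matching other tag names that merely begin with "br".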