PHP “pretty print” HTML (not Tidy)

耶瑟儿~ 2020-11-29 04:53

I'm using the DOM extension in PHP to build some HTML documents, and I want the output to be formatted nicely (with new lines and indentation) so that it's readable, however…

3 answers
  • 2020-11-29 05:11

    You're right: formatOutput seems to produce no indentation when the document is written with saveHTML() (others are confused by this too). saveXML() does indent, even when the document was loaded with loadHTML().

    <?php
    function tidyHTML($buffer) {
        // load our document into a DOM object
        $dom = new DOMDocument();
        // we want nice output
        $dom->preserveWhiteSpace = false;
        $dom->loadHTML($buffer);
        $dom->formatOutput = true;
        return($dom->saveHTML());
    }
    
    // start output buffering, using our nice
    // callback function to format the output.
    ob_start("tidyHTML");
    
    ?>
    <html>
        <head>
        <title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
    <?php
    // this will be called implicitly, but we'll
    // call it manually to illustrate the point.
    ob_end_flush();
    ?>
    

    Result with saveHTML():

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
    <head>
    <title>foo bar</title>
    <meta name="bar" value="foo">
    </head>
    <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
    </body>
    </html>
    

    The same document with saveXML():

    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
      <head>
        <title>foo bar</title>
        <meta name="bar" value="foo"/>
      </head>
      <body>
        <h1>bar foo</h1>
        <p>It's like comparing apples to oranges.</p>
      </body>
    </html>
    

    Perhaps you forgot to set preserveWhiteSpace = false before calling loadHTML()?

    Disclaimer: I took most of the demo code from Tyson Clugg's comment in the PHP manual. Lazy me.


    UPDATE: I now remember that some years ago I tried the same thing and ran into the same problem. I fixed it with a dirty workaround (it wasn't performance-critical): I converted back and forth between SimpleXML and DOM until the problem vanished. I suppose the conversion got rid of the offending nodes. Something like: load with DOM, import with simplexml_import_dom, output the string, parse that with DOM again, and then print it pretty. As far as I remember this worked (but it was really slow).
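
    A minimal sketch of a similar round trip, assuming the goal is only to drop whitespace-only text nodes so that formatOutput can indent. It is a variation on the workaround above, not the original code: it re-parses the serialised markup with DOM's XML parser instead of going through SimpleXML, and prettyPrintViaXml() is a hypothetical helper name.

```php
<?php
// Sketch: pretty-print HTML by round-tripping through the XML parser.
// prettyPrintViaXml() is a hypothetical helper, not part of any library.
function prettyPrintViaXml(string $html): string
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    // 1. Parse the (possibly messy) HTML.
    $htmlDoc = new DOMDocument();
    $htmlDoc->loadHTML($html);

    // 2. Serialise the root element as XML and parse it again. With
    //    preserveWhiteSpace = false the XML parser drops whitespace-only
    //    text nodes between elements.
    $xmlDoc = new DOMDocument();
    $xmlDoc->preserveWhiteSpace = false;
    $xmlDoc->loadXML($htmlDoc->saveXML($htmlDoc->documentElement));

    // 3. Serialise once more, this time with indentation enabled.
    $xmlDoc->formatOutput = true;
    return $xmlDoc->saveXML($xmlDoc->documentElement);
}

$messy = "<html>\n<head>\n<title>foo bar</title>\n</head>\n<body>\n<h1>bar foo</h1>\n</body>\n</html>";
echo prettyPrintViaXml($messy), "\n";
```

    Note that the output is XML syntax, so empty elements come out self-closed (as with the meta tag in the saveXML() result above), which may not suit every HTML consumer.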

  • 2020-11-29 05:19

    The result:

    <!DOCTYPE html>
    <html>
        <head>
            <title>My website</title>
        </head>
    </html>
    

    Please consider:

    function indentContent($content, $tab = "\t") {
        // Insert a linefeed between adjacent tags so that each tag can be tokenised as its own line.
        $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
        $token = strtok($content, "\n");
        $result = '';      // holds the formatted version as it is built
        $pad = 0;          // current indent level
        $indent = 0;       // change of indent level for the lines that follow
        $voidTag = false;  // true while the current line opens a void element
        // Scan each line and adjust the indent based on opening/closing tags.
        while ($token !== false && strlen($token) > 0) {
            $token = trim($token);
            if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
                // 1. opening and closing tags on the same line: no change
                $indent = 0;
            } elseif (preg_match('/^<\/\w/', $token)) {
                // 2. closing tag: outdent now
                $pad--;
                $indent = 0;
            } elseif (preg_match('/^<\w(?:[^>]*[^\/>])?>.*$/', $token)) {
                // 3. opening tag (including single-letter tags such as <p>): don't pad this one, only subsequent lines
                // Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
                // (\b keeps e.g. <colgroup> from being mistaken for <col>)
                $voidTag = (bool) preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\b/i', $token);
                $indent = 1;
            } else {
                // 4. anything else: no indentation change
                $indent = 0;
            }

            // Prefix the line with one $tab per indent level (never a negative count).
            $result .= str_repeat($tab, max(0, $pad)) . $token . "\n";
            $token = strtok("\n"); // get the next token
            $pad += $indent;       // update the indent level for subsequent lines
            if ($voidTag) {
                // A void element has no closing tag, so undo the indent it just added.
                $voidTag = false;
                $pad--;
            }
        }
        return $result;
    }
    
    // $htmldoc is a DOMDocument object
    
    $niceHTMLwithTABS = indentContent($htmldoc->saveHTML(), "\t");
    
    echo $niceHTMLwithTABS;
    

    This results in HTML that has:

    • Indentation based on nesting "levels"
    • Line breaks after block-level elements
    • Inline and self-closing elements left unaffected

    The function (a method of a class I use) is largely based on: https://stackoverflow.com/a/7840997/7646824
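
    To see what the first step of indentContent() does on its own, here is a small sketch that applies the same preg_replace() pattern to a compact fragment (the sample markup and variable names are only illustrative):

```php
<?php
// Same pattern as the first statement of indentContent():
// put a linefeed between every "><" tag boundary.
$compact = '<ul><li>a</li><li>b</li></ul>';
$lines = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $compact);

echo $lines, "\n";
// Prints:
// <ul>
// <li>a</li>
// <li>b</li>
// </ul>
```

    Only `><` boundaries are split, so text stays on the same line as its enclosing tags; the loop in indentContent() then decides the indent of each resulting line.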

  • 2020-11-29 05:21

    You can use the code for the hl_tidy function of the htmLawed library.

    // indent using one tab per indent, with all HTML being within an imaginary div
    $out = hl_tidy($in, 't', 'div');
    