DOM parser that allows HTML5-style </ in <script> tag

無奈伤痛 2020-11-28 06:14

Update: html5lib (bottom of question) seems to get close; I just need to improve my understanding of how it's used.

I am attempting to

6 Answers
  • 2020-11-28 06:59

    I wrapped the contents of my jQuery template blocks in comment tags (<!-- ... -->); CDATA blocks did not work. With the comment wrapper in place, DOMDocument did not touch the inner HTML.

    Then I wrote a script that removes the comment wrapper before the jQuery templates are used.

    $(function() {
        $('script[type="text/x-jquery-tmpl"]').text(function() {
            // The comment node in this context is actually a text node.
            return $.trim($(this).text()).replace(/^<!--([\s\S]*)-->$/, '$1');
        });
    });
    

    Not ideal, but I wasn't sure of a better workaround.
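
    The comment-stripping step above does not depend on jQuery. As a minimal sketch, the same regular expression can live in a plain function; the name unwrapTemplateComment is hypothetical and not part of jQuery or the jQuery Templates plugin.

```javascript
// Hypothetical helper (not part of jQuery or jQuery Templates):
// removes one outer <!-- ... --> wrapper from a template string.
function unwrapTemplateComment(text) {
    // Trim first so whitespace around the wrapper does not defeat the ^ and $ anchors.
    return text.trim().replace(/^<!--([\s\S]*)-->$/, '$1');
}

// Example: the wrapper is removed and the inner markup is kept verbatim.
const wrapped = '\n<!--<tr><td>${name}</td></tr>-->\n';
console.log(unwrapTemplateComment(wrapped));
```

    In a browser, this could be applied to the textContent of each script[type="text/x-jquery-tmpl"] element before the templates are compiled. Input without a wrapper is only trimmed.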

  • 2020-11-28 07:00

    FluentDOM uses DOMDocument but suppresses the notices and warnings raised while loading. It does not have its own parser. You can add your own loaders (for example, one that uses html5lib).

  • 2020-11-28 07:11

    I had the same problem, and apparently you can hack your way through this by loading the document as XML and saving it as HTML :)

    $d = new DOMDocument;
    $d->loadXML('<script id="foo"><td>bar</td></script>');
    echo $d->saveHTML();
    

    But of course the markup must be well-formed XML for loadXML to work.

  • 2020-11-28 07:13

    I just found a fix that worked in my case.

    Try passing LIBXML_SCHEMA_CREATE as the options argument of DOMDocument::loadHTML:

    $dom = new DOMDocument;
    
    libxml_use_internal_errors(true);
    //$dom->loadHTML($buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $dom->loadHTML($buffer, LIBXML_SCHEMA_CREATE);
    
  • 2020-11-28 07:17

    I ran into this exact problem.

    PHP's DOMDocument parses the HTML inside a script tag, which can lead to a completely different DOM.

    Since I didn't want to use a library other than DOMDocument, I wrote a few lines that strip the content of every script tag. You then do whatever you need to do with DOMDocument and put the script content back afterwards.

    Obviously, the script content isn't available to your DOM object, because it has been stripped out while the document is loaded.

    With the following lines of PHP code you can 'fix' this problem. Be warned that script tags inside script tags will cause bugs.

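    $scripts = array();
    // Match each script element non-greedily. A literal </script> inside a script body ends the match early and will cause problems.
    $html = preg_replace_callback('/(<script\b[^>]*>)(.*?)(<\/script>)/is', function ($m) use (&$scripts) {
        // Stash the body under a unique placeholder and leave the placeholder in its place
        $key = '/*script-body-' . count($scripts) . '*/';
        $scripts[$key] = $m[2];
        return $m[1] . $key . $m[3];
    }, $html);
    
    // Do DOMDocument stuff here, then read the result back into $html (e.g. $html = $dom->saveHTML();)
    
    // Put script contents back
    $html = str_replace(array_keys($scripts), array_values($scripts), $html);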
    

    I hope this will help some people :-).

  • 2020-11-28 07:21

    Re: html5lib

    You click on the download tab and download the PHP version of the parser.

    You untar the archive in a local folder

     tar -zxvf html5lib-php-0.1.tar.gz
     x html5lib-php-0.1/
     x html5lib-php-0.1/VERSION
     x html5lib-php-0.1/docs/
     ... etc
    

    You change directories and create a file named hello.php

    cd html5lib-php-0.1
    touch hello.php 
    

    You place the following PHP code in hello.php

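    <?php
    // Load the parser. This path is an assumption based on the layout of the html5lib-php-0.1 archive; adjust it if yours differs.
    require_once dirname(__FILE__) . '/library/HTML5/Parser.php';
    
    $html = '<html><head></head><body>
    <script type="text/x-jquery-tmpl" id="foo">
    <table><tr><td>${name}</td></tr></table>
    </script> 
    </body></html>';
    $dom = HTML5_Parser::parse($html); 
    var_dump($dom->saveXML()); 
    echo "\nDone\n";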
    

    You run hello.php from the command line

    php hello.php
    

    The parser builds the document tree and returns a DOMDocument object, which can be manipulated like any other DOMDocument object.
