How to close unclosed HTML Tags?

名媛妹妹 2020-11-30 01:56

Whenever we fetch user-entered, edited content from a database or a similar source, we might retrieve a fragment that contains only an opening tag, with no matching closing tag.

8 answers
  • 2020-11-30 02:33

    I have a solution for PHP:

    <?php
        // Close any HTML tags that were left open in $html.
        function closetags ( $html )
        {
            # void elements never take a closing tag
            $voidtags = array ( 'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                                'input', 'link', 'meta', 'source', 'track', 'wbr' );

            # put all opened tags into an array, skipping self-closing ones like <br />
            preg_match_all ( '#<([a-z][a-z0-9]*)[^>]*(?<!/)>#i', $html, $result );
            $openedtags = array_diff ( array_map ( 'strtolower', $result[1] ), $voidtags );

            # put all closed tags into an array
            preg_match_all ( '#</([a-z][a-z0-9]*)\s*>#i', $html, $result );
            $closedtags = array_map ( 'strtolower', $result[1] );

            # walk the opened tags innermost-first and close the unmatched ones
            foreach ( array_reverse ( $openedtags ) as $tag )
            {
                $pos = array_search ( $tag, $closedtags );
                if ( $pos === false )
                {
                    $html .= "</" . $tag . ">";
                }
                else
                {
                    # this opening tag already has a closing tag; use it up
                    unset ( $closedtags[$pos] );
                }
            }
            return $html;
        }
    ?>
    

    You can use this function like this:

       <?php echo closetags("your content <p>test test"); ?>
    
  • 2020-11-30 02:37

    In addition to server-side tools like Tidy, you can also use the user's browser to do some of the cleanup for you. One of the really great things about innerHTML is that it will apply the same on-the-fly repair to dynamic content as it does to HTML pages. This code works pretty well (with two caveats) and nothing actually gets written to the page:

    var divTemp = document.createElement('div');
    divTemp.innerHTML = '<p id="myPara">these <i>tags aren\'t <strong> closed';
    console.log(divTemp.innerHTML); 
    

    The caveats:

    1. Different browsers will return different strings. This isn't so bad, except in the case of IE, which returns capitalized tags and strips the quotes from tag attributes, so its output will not pass validation. The solution here is to do some simple clean-up on the server side. But at least the tags will be properly nested and closed.

    2. I suspect that you may have to put in a delay before reading the innerHTML -- give the browser a chance to digest the string -- or you risk getting back exactly what was put in. I just tried on IE8 and it looks like the string gets parsed immediately, but I'm not so sure on IE6. It would probably be best to read the innerHTML after a delay (or throw it into a setTimeout() to force it to the end of the queue).
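
    The server-side clean-up mentioned in the first caveat might look something like the sketch below. The name `normalize_ie_markup` is hypothetical, and the regular expressions rest on an assumption about what old IE sends back (upper-case tag names and unquoted attribute values). It is not a full HTML parser, and it can misfire on an `=` inside an already quoted attribute value.

    ```php
    <?php
    // Hypothetical helper: lower-case tag names and quote bare attribute
    // values in markup returned by old IE, e.g. <P id=myPara>.
    function normalize_ie_markup(string $html): string
    {
        return preg_replace_callback(
            '#<(/?)([a-z][a-z0-9]*)([^>]*)>#i',
            function (array $m): string {
                // Wrap attribute values that are not already quoted.
                $attrs = preg_replace('#(\s[a-z-]+)=([^\s"\'>]+)#i', '$1="$2"', $m[3]);
                return '<' . $m[1] . strtolower($m[2]) . $attrs . '>';
            },
            $html
        );
    }

    // prints: <p id="myPara">these <i>tags</i></p>
    echo normalize_ie_markup('<P id=myPara>these <I>tags</I></P>');
    ```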

    I would recommend you take @Gordon's advice and use Tidy if you have access to it (it takes less work to implement) and failing that, use innerHTML and write your own tidy function in PHP.
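
    As a rough sketch of the Tidy route: PHP's tidy extension provides `tidy_repair_string()`, and the `show-body-only` option keeps it from wrapping the fragment in a full document. The helper name `repair_fragment` is hypothetical, and the `DOMDocument` fallback is a swapped-in alternative for servers without Tidy, not something this answer prescribes. The two paths may format their output slightly differently.

    ```php
    <?php
    // Hypothetical helper: repair an HTML fragment so every tag is closed.
    function repair_fragment(string $html): string
    {
        // Preferred path: the tidy extension, if it is loaded.
        if (function_exists('tidy_repair_string')) {
            $config = ['show-body-only' => true, 'wrap' => 0];
            $fixed = tidy_repair_string($html, $config, 'utf8');
            if ($fixed !== false) {
                return trim($fixed);
            }
        }

        // Fallback (an alternative to Tidy): let DOMDocument parse the
        // fragment, then serialize the children of <body> again.
        $previous = libxml_use_internal_errors(true); // silence parser warnings
        $doc = new DOMDocument();
        $doc->loadHTML(
            '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
            . '</head><body>' . $html . '</body></html>'
        );
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $out = '';
        foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
            $out .= $doc->saveHTML($node);
        }
        return trim($out);
    }

    // The unclosed <b> and <p> come back closed.
    echo repair_fragment('your content <p>test <b>test');
    ```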

    And though this isn't part of your question, as this is for a CMS, consider also using the YUI 2 Rich Text Editor for stuff like this. It's fairly easy to implement, somewhat easy to customize, the interface is very familiar to most users, and it spits out perfectly valid code. There are several other off-the-shelf rich text editors out there, but YUI has the best license and is the most powerful I've seen.
