How to remove only html tags in a string using javascript

臣服心动 2021-02-02 03:49

I want to remove HTML tags from a given string using JavaScript. I looked into the current approaches, but each of them leaves some problems unsolved.

Current solutions

6 answers
  • 2021-02-02 04:00

    Using a regex might not be a problem if you consider a different approach. For instance, looking for all tags, and then checking to see if the tag name matches a list of defined, valid HTML tag names:

    var protos = document.body.constructor === window.HTMLBodyElement,
        validHTMLTags  =/^(?:a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|data|datalist|dd|del|details|dfn|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hgroup|hr|html|i|iframe|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|nav|nobr|noframes|noscript|object|ol|optgroup|option|output|p|param|plaintext|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)$/i;
    
    function sanitize(txt) {
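        var // Matches a quoted attribute value. A backreference such as \1 does not
            // work inside a character class, so a negative lookahead stops at the closing quote.
            normaliseQuotes = /=(["'])(?:(?!\1)[\s\S])*\1/g,
            // Escape angle brackets inside the quoted value
            normaliseFn = function ($0) {
                return $0.replace(/</g, '&lt;').replace(/>/g, '&gt;');
            },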
            replaceInvalid = function ($0, tag, off, txt) {
                var 
                    // Is it a valid tag?
                    invalidTag = protos && 
                        document.createElement(tag) instanceof HTMLUnknownElement
                        || !validHTMLTags.test(tag),
    
                    // Is the tag complete?
                    isComplete = txt.slice(off+1).search(/^[^<]+>/) > -1;
    
                return invalidTag || !isComplete ? '&lt;' + tag : $0;
            };
    
        txt = txt.replace(normaliseQuotes, normaliseFn)
                 .replace(/<(\w+)/g, replaceInvalid);
    
        var tmp = document.createElement("DIV");
        tmp.innerHTML = txt;
    
        // Older IE versions lack textContent but provide innerText
        return "textContent" in tmp ? tmp.textContent : tmp.innerText;
    }
    

    Working Demo: http://jsfiddle.net/m9vZg/3/

    This works because browsers parse '>' as text if it isn't part of a matching '<' opening tag. It doesn't suffer the same problems as trying to parse HTML tags using a regular expression, because you're only looking for the opening delimiter and the tag name, everything else is irrelevant.

    It's also future-proof: the WebIDL specification tells vendors how to implement prototypes for HTML elements, so we try to create an HTML element from the current matching tag. If the element is an instance of HTMLUnknownElement, we know that it's not a valid HTML tag. The validHTMLTags regular expression defines a list of HTML tags for older browsers, such as IE 6 and 7, that do not implement these prototypes.
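The allowlist idea above can also be sketched without a DOM, for example under Node.js. This is only a minimal sketch and not the answer's method: it swaps the innerHTML step for a plain regex replace with a callback, and `KNOWN_TAGS` and `stripKnownTags` are hypothetical names with a deliberately short tag list.

```javascript
// Hypothetical, shortened allowlist for illustration only.
// The validHTMLTags regex in the answer above is the full list.
const KNOWN_TAGS = new Set([
  "a", "b", "br", "div", "em", "i", "img", "p", "span", "strong"
]);

// Remove a tag only when it is complete (has a closing ">")
// and its name is on the allowlist. Everything else is kept as text.
function stripKnownTags(text) {
  return text.replace(/<\/?(\w+)[^<>]*>/g, (match, name) =>
    KNOWN_TAGS.has(name.toLowerCase()) ? "" : match
  );
}

console.log(stripKnownTags("<p>Hello <b>world</b></p>")); // Hello world
console.log(stripKnownTags("a <foo> b"));                 // a <foo> b
console.log(stripKnownTags("1 < 2 and 3 > 2"));           // 1 < 2 and 3 > 2
```

Unlike the DOM version, this does not decode entities and will still misread attribute values that contain angle brackets, so it is best treated as a starting point.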

  • 2021-02-02 04:06

    Here is my solution:

    function removeTags(){
        var txt = document.getElementById('myString').value;
        var rex = /(<([^>]+)>)/ig;
        alert(txt.replace(rex , ""));
    
    }
    
  • 2021-02-02 04:11

    If you want to keep invalid markup untouched, regular expressions are your best bet. Something like this might work:

     text = html.replace(/<\/?(span|div|img|p...)\b[^<>]*>/g, "")
    

    Expand (span|div|img|p...) into a list of all tags (or only those you want to remove). NB: the list must be sorted by length, longer tags first!

    This may produce incorrect results in some edge cases (such as attributes containing < or > characters), but the only real alternative would be to write a complete HTML parser yourself. That would not be extremely complicated, but it might be overkill here. Let us know.
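One way to expand the pattern is to build it from an array of tag names. The sketch below assumes a hypothetical list `tagsToRemove` and a hypothetical helper `buildTagStripper`; neither comes from the answer. It sorts longer names first as advised above, although with the `\b` after the group the regex engine would backtrack to the longer name anyway, so the sort is a precaution.

```javascript
// Hypothetical list: extend it with the tags you want removed.
const tagsToRemove = ["p", "pre", "span", "div", "img"];

// Build a function that strips opening and closing tags for the given names.
// Names are assumed to be plain alphanumerics, so no regex escaping is done.
function buildTagStripper(tags) {
  const alternation = [...tags]
    .sort((a, b) => b.length - a.length) // longest names first
    .join("|");
  // The "i" flag is an addition here so that <DIV> is matched as well.
  const pattern = new RegExp("<\\/?(?:" + alternation + ")\\b[^<>]*>", "gi");
  return (html) => html.replace(pattern, "");
}

const stripTags = buildTagStripper(tagsToRemove);
console.log(stripTags("<pre>x</pre> <foo>y</foo> <p class='a'>z</p>"));
// x <foo>y</foo> z
```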

  • 2021-02-02 04:13
    var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
    
  • 2021-02-02 04:13

    I use a regular expression to prevent HTML tags in my textarea.

    Example

    <form>
        <textarea class="box"></textarea>
        <button>Submit</button>
    </form>
    <script>
        // Requires jQuery. Warn when the textarea loses focus and its
        // value contains something shaped like a tag.
        $(".box").focusout(function () {
            var reg = /<(.|\n)*?>/;
            if (reg.test($(this).val())) {
                alert('HTML tags are not allowed');
            }
        });
    </script>
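The check itself can be separated from the jQuery handler so it can be tried on its own. This is a minimal sketch; `containsTag` is a hypothetical name, and the pattern is the one from the answer without the `g` flag, which `test` does not need.

```javascript
// Returns true when the value contains a "<" that is later followed by a ">".
function containsTag(value) {
  return /<(.|\n)*?>/.test(value);
}

console.log(containsTag("hello <b>world</b>")); // true
console.log(containsTag("plain text"));         // false
console.log(containsTag("a < b"));              // false
console.log(containsTag("a < b > c"));          // true (a false positive)
```

The last case shows that the pattern flags any text with a "<" followed later by a ">", not only real HTML tags.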
    
  • 2021-02-02 04:18
    <script type="text/javascript">
    function removeHTMLTags() {
        var str = "<html><p>I want to remove HTML tags</p></html>";
        alert(str.replace(/<[^>]+>/g, ''));
    }
    </script>
    