Strip HTML from text in JavaScript

北荒 2020-11-21 05:08

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

30 answers
  • 2020-11-21 05:23

    An input element supports only a single line of text:

    The text state represents a one line plain text edit control for the element's value.

    function stripHtml(str) {
      // An <input type="text"> strips line breaks from its value; tags are left as-is
      var tmp = document.createElement('input');
      tmp.value = str;
      return tmp.value;
    }
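    To see what this first version actually does without a browser, here is a minimal sketch of the value sanitization the HTML spec defines for a text input ("strip newlines": remove every LF and CR). The helper name sanitizeTextInputValue is hypothetical. It shows that only line breaks are removed and markup survives, so on its own this trick does not strip HTML.

```javascript
// Hypothetical helper modelling the HTML spec's value sanitization for
// <input type="text">: "strip newlines" removes every LF (\n) and CR (\r).
function sanitizeTextInputValue(value) {
  return value.replace(/[\r\n]/g, "");
}

// Line breaks disappear, but the markup is left untouched.
console.log(sanitizeTextInputValue("<b>line one</b>\r\nline two"));
// prints: <b>line one</b>line two
```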
    

    Update: this version works as expected:

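    function stripHtml(str) {
      // Remove anything shaped like a tag: "<" up to the next ">"
      str = str.replace(/<[^>]+>/g, '');
    
      // Remove paired BB code such as [b]...[/b], keeping the inner text
      str = str.replace(/\[(\w+)[^\]]*](.*?)\[\/\1]/g, '$2 ');
    
      // Let the browser decode HTML entities (e.g. &amp; -> &) via innerHTML + textContent
      const div = document.createElement('div');
      div.innerHTML = str;
    
      // Assigning to an <input>'s value strips the line breaks
      const input = document.createElement('input');
      input.value = div.textContent || div.innerText || '';
    
      return input.value;
    }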
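    The two regex steps of this function can be tried outside a browser. This is a minimal sketch, assuming only those steps are wanted (entity decoding and line-break removal need the DOM and are left out); the name stripTagsAndBBCode is hypothetical. Note that the BB code pattern removes only matched pairs, because \1 must repeat the opening tag name.

```javascript
// Hypothetical DOM-free helper: only the two regex steps of stripHtml().
function stripTagsAndBBCode(str) {
  // Remove anything shaped like a tag: "<", one or more non-">" chars, then ">".
  let out = str.replace(/<[^>]+>/g, "");
  // Replace paired BB code such as [url=...]text[/url] with "text ".
  // (\w+) captures the tag name; \1 requires the same name in the closing tag.
  out = out.replace(/\[(\w+)[^\]]*](.*?)\[\/\1]/g, "$2 ");
  return out;
}

console.log(JSON.stringify(stripTagsAndBBCode("<b>Hi</b> [url=https://example.com]link[/url]")));
// prints: "Hi link "
```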
    
  • 2020-11-21 05:24

    With jQuery you can simply retrieve it using:

    $('#elementID').text()
    
  • 2020-11-21 05:24

    The accepted answer mostly works fine; however, in IE, if the HTML string is null you get "null" instead of an empty string. Fixed:

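    function strip(html)
    {
       // Guard against null/undefined (IE would otherwise return the string "null")
       if (html == null) return "";
       // Assigning to innerHTML parses the markup, so handlers such as <img onerror> can run
       var tmp = document.createElement("DIV");
       tmp.innerHTML = html;
       // textContent is standard; innerText is the fallback for old IE
       return tmp.textContent || tmp.innerText || "";
    }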
    
  • 2020-11-21 05:26

    I altered Jibberboy2000's answer to handle several <br> tag formats, remove everything inside <script> and <style> tags, tidy the result by collapsing multiple line breaks and spaces, and decode some HTML entities into plain characters. After some testing, it appears that you can convert most full web pages into simple text while retaining the page title and content.

    For example, this simple page:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <!--comment-->
    
    <head>
    
    <title>This is my title</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <style>
    
        body {margin-top: 15px;}
        a { color: #D80C1F; font-weight:bold; text-decoration:none; }
    
    </style>
    </head>
    
    <body>
        <center>
            This string has <i>html</i> code i want to <b>remove</b><br>
            In this line <a href="http://www.bbc.co.uk">BBC</a> with link is mentioned.<br/>Now back to &quot;normal text&quot; and stuff using &lt;html encoding&gt;                 
        </center>
    </body>
    </html>
    

    becomes

    This is my title

    This string has html code i want to remove

    In this line BBC (http://www.bbc.co.uk) with link is mentioned.

    Now back to "normal text" and stuff using <html encoding>

    The JavaScript function and test page look like this:

    function convertHtmlToText() {
        var inputText = document.getElementById("input").value;
        var returnText = "" + inputText;
    
        //-- replace BR tags (<br>, <br/>, <br />) with a line break
        returnText = returnText.replace(/<br\s*\/?>/gi, "\n");
    
        //-- replace opening P tags (<p> or <p ...>, but not <pre>) with a line break
        returnText = returnText.replace(/<p(?:\s[^>]*)?>/gi, "\n");
        //-- replace A tags with "text (href)", keeping the link text
        returnText = returnText.replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, " $2 ($1)");
    
        //-- remove SCRIPT and STYLE elements with their contents (non-greedy, so several blocks are handled)
        returnText = returnText.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "");
        returnText = returnText.replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "");
    
        //-- remove all remaining tags, comments and the DOCTYPE
        returnText = returnText.replace(/<[^>]*>/g, "");
    
        //-- collapse two or more line breaks into a single blank line
        returnText = returnText.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/g, "\n\n");
    
        //-- collapse runs of spaces into a single space
        returnText = returnText.replace(/ +(?= )/g, "");
    
        //-- decode a few HTML entities (&amp; last, so "&amp;lt;" stays "&lt;")
        returnText = returnText.replace(/&nbsp;/gi, " ");
        returnText = returnText.replace(/&quot;/gi, '"');
        returnText = returnText.replace(/&lt;/gi, "<");
        returnText = returnText.replace(/&gt;/gi, ">");
        returnText = returnText.replace(/&amp;/gi, "&");
    
        //-- write the result to the output textarea
        document.getElementById("output").value = returnText;
    }
    

    It was used with this HTML:

    <textarea id="input" style="width: 400px; height: 300px;"></textarea><br />
    <button onclick="convertHtmlToText()">CONVERT</button><br />
    <textarea id="output" style="width: 400px; height: 300px;"></textarea><br />
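    The entity step in this function is order-sensitive: decoding &amp; before &lt; would turn the literal text "&amp;lt;" into "<" instead of "&lt;". A single-pass replacement sidesteps the ordering question. This is a minimal sketch covering only the five entities the function handles (with &nbsp; mapped to a plain space, as the function does); decodeBasicEntities and ENTITY_MAP are hypothetical names, not part of the original answer.

```javascript
// Hypothetical helper: decodes only &nbsp; &amp; &quot; &lt; &gt; (case-insensitive).
const ENTITY_MAP = { nbsp: " ", amp: "&", quot: '"', lt: "<", gt: ">" };

function decodeBasicEntities(text) {
  // One regex pass: each entity is replaced exactly once and the replacement
  // text is never rescanned, so "&amp;lt;" becomes "&lt;", not "<".
  return text.replace(/&(nbsp|amp|quot|lt|gt);/gi, (match, name) => ENTITY_MAP[name.toLowerCase()]);
}

console.log(decodeBasicEntities("&quot;normal text&quot; and &lt;html encoding&gt;"));
// prints: "normal text" and <html encoding>
```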
    
  • 2020-11-21 05:27
    myString.replace(/<[^>]*>?/gm, '');
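    This one-liner removes everything from a "<" to the next ">" (the trailing >? also swallows an unclosed "<..." at the end of the string). It does not decode entities, and it misreads a plain "<" in text or a ">" inside an attribute value. A small sketch wrapping it in a function (the name stripTagsRegex is hypothetical) to show those edge cases:

```javascript
// Hypothetical wrapper around the one-liner above.
function stripTagsRegex(myString) {
  // "<", then any non-">" characters, then an optional ">".
  // The g flag replaces every match; the m flag has no effect here (no ^ or $).
  return myString.replace(/<[^>]*>?/gm, "");
}

console.log(stripTagsRegex("<p>Hello <b>world</b></p>")); // returns "Hello world"
console.log(stripTagsRegex("a < b and c > d"));           // returns "a  d" (text between < and > is eaten)
console.log(stripTagsRegex('<a title="1 > 0">link</a>')); // returns ' 0">link' (">" inside the attribute)
console.log(stripTagsRegex("Tom &amp; Jerry"));           // entities are left as-is
```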
    
  • 2020-11-21 05:27
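    function stripHTML(my_string){
        var charArr   = my_string.split(''),
            resultArr = [],
            htmlZone  = 0,   // 1 while inside a tag
            quoteZone = 0;   // 1 inside "...", 2 inside '...' (only tracked within a tag)
        for( var x=0; x < charArr.length; x++ ){
         // Key = current char + tag state + quote state, e.g. "<00" = "<" outside any tag
         switch( charArr[x] + htmlZone + quoteZone ){
           case "<00" : htmlZone  = 1;break;                         // tag opens
           case ">10" : htmlZone  = 0;resultArr.push(' ');break;     // tag closes; keep words apart
           case '"10' : quoteZone = 1;break;                         // double-quoted attribute value starts
           case "'10" : quoteZone = 2;break;                         // single-quoted attribute value starts
           case '"11' : 
           case "'12" : quoteZone = 0;break;                         // matching quote ends the value
           default    : if(!htmlZone){ resultArr.push(charArr[x]); } // keep only text outside tags
         }
        }
        return resultArr.join('');
    }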
    

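    It accounts for > inside quoted attribute values, and because it never creates DOM elements, payloads such as <img onerror="javascript"> cannot run the way they can with innerHTML-based approaches.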

    Usage:

    var clean_string = stripHTML("string with <html> in it");
    

    Demo:

    https://jsfiddle.net/gaby_de_wilde/pqayphzd/

    Demo of the top answer doing the terrible things:

    https://jsfiddle.net/gaby_de_wilde/6f0jymL6/1/
