Strip HTML from Text JavaScript

前端未结

关注

 30  3555

北荒

Is there an easy way to take a string of html in JavaScript and strip out the html?

相关标签:

30条回答

别那么骄傲

2020-11-21 05:23

input element support only one line text:

The text state represents a one line plain text edit control for the element's value.

function stripHtml(str) {
  var tmp = document.createElement('input');
  tmp.value = str;
  return tmp.value;
}

Update: this works as expected

function stripHtml(str) {
  // Remove some tags
  str = str.replace(/<[^>]+>/gim, '');

  // Remove BB code
  str = str.replace(/\[(\w+)[^\]]*](.*?)\[\/\1]/g, '$2 ');

  // Remove html and line breaks
  const div = document.createElement('div');
  div.innerHTML = str;

  const input = document.createElement('input');
  input.value = div.textContent || div.innerText || '';

  return input.value;
}

0 讨论(0)

清酒与你

2020-11-21 05:24
With jQuery you can simply retrieving it by using
```
$('#elementID').text()
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
情歌与酒

2020-11-21 05:24
The accepted answer works fine mostly, however in IE if the html string is null you get the "null" (instead of ''). Fixed:
```
function strip(html)
{
   if (html == null) return "";
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

忘了有多久

2020-11-21 05:26

I altered Jibberboy2000's answer to include several <BR /> tag formats, remove everything inside <SCRIPT> and <STYLE> tags, format the resulting HTML by removing multiple line breaks and spaces and convert some HTML-encoded code into normal. After some testing it appears that you can convert most of full web pages into simple text where page title and content are retained.

In the simple example,

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<!--comment-->

<head>

<title>This is my title</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style>

    body {margin-top: 15px;}
    a { color: #D80C1F; font-weight:bold; text-decoration:none; }

</style>
</head>

<body>
    <center>
        This string has <i>html</i> code i want to <b>remove</b><br>
        In this line <a href="http://www.bbc.co.uk">BBC</a> with link is mentioned.<br/>Now back to &quot;normal text&quot; and stuff using &lt;html encoding&gt;                 
    </center>
</body>
</html>

becomes

This is my title

This string has html code i want to remove

In this line BBC (http://www.bbc.co.uk) with link is mentioned.

Now back to "normal text" and stuff using

The JavaScript function and test page look this:

function convertHtmlToText() {
    var inputText = document.getElementById("input").value;
    var returnText = "" + inputText;

    //-- remove BR tags and replace them with line break
    returnText=returnText.replace(/<br>/gi, "\n");
    returnText=returnText.replace(/<br\s\/>/gi, "\n");
    returnText=returnText.replace(/<br\/>/gi, "\n");

    //-- remove P and A tags but preserve what's inside of them
    returnText=returnText.replace(/<p.*>/gi, "\n");
    returnText=returnText.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 ($1)");

    //-- remove all inside SCRIPT and STYLE tags
    returnText=returnText.replace(/<script.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/script>/gi, "");
    returnText=returnText.replace(/<style.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/style>/gi, "");
    //-- remove all else
    returnText=returnText.replace(/<(?:.|\s)*?>/g, "");

    //-- get rid of more than 2 multiple line breaks:
    returnText=returnText.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/gim, "\n\n");

    //-- get rid of more than 2 spaces:
    returnText = returnText.replace(/ +(?= )/g,'');

    //-- get rid of html-encoded characters:
    returnText=returnText.replace(/&nbsp;/gi," ");
    returnText=returnText.replace(/&amp;/gi,"&");
    returnText=returnText.replace(/&quot;/gi,'"');
    returnText=returnText.replace(/&lt;/gi,'<');
    returnText=returnText.replace(/&gt;/gi,'>');

    //-- return
    document.getElementById("output").value = returnText;
}

It was used with this HTML:

<textarea id="input" style="width: 400px; height: 300px;"></textarea><br />
<button onclick="convertHtmlToText()">CONVERT</button><br />
<textarea id="output" style="width: 400px; height: 300px;"></textarea><br />

0 讨论(0)

礼貌的吻别

2020-11-21 05:27
```
myString.replace(/<[^>]*>?/gm, '');
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

無奈伤痛

2020-11-21 05:27

function stripHTML(my_string){
    var charArr   = my_string.split(''),
        resultArr = [],
        htmlZone  = 0,
        quoteZone = 0;
    for( x=0; x < charArr.length; x++ ){
     switch( charArr[x] + htmlZone + quoteZone ){
       case "<00" : htmlZone  = 1;break;
       case ">10" : htmlZone  = 0;resultArr.push(' ');break;
       case '"10' : quoteZone = 1;break;
       case "'10" : quoteZone = 2;break;
       case '"11' : 
       case "'12" : quoteZone = 0;break;
       default    : if(!htmlZone){ resultArr.push(charArr[x]); }
     }
    }
    return resultArr.join('');
}

Accounts for > inside attributes and <img onerror="javascript"> in newly created dom elements.

usage:

clean_string = stripHTML("string with <html> in it")

demo:

https://jsfiddle.net/gaby_de_wilde/pqayphzd/

demo of top answer doing the terrible things:

https://jsfiddle.net/gaby_de_wilde/6f0jymL6/1/

0 讨论(0)