Strip HTML from Text JavaScript

北荒

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

30 answers

梦如初夏

2020-11-21 05:37

If you want to keep the links and the structure of the content (h1, h2, etc.), then you should check out TextVersionJS. You can use it with any HTML, although it was created to convert an HTML email to plain text.

The usage is very simple. For example in node.js:

```
var createTextVersion = require("textversionjs");
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";

var textVersion = createTextVersion(yourHtml);
```

Or in the browser with pure js:

```
<script src="textversion.js"></script>
<script>
  var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
  var textVersion = createTextVersion(yourHtml);
</script>
```

It also works with require.js:

```
define(["textversionjs"], function(createTextVersion) {
  var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
  var textVersion = createTextVersion(yourHtml);
});
```


一生所求

2020-11-21 05:38
I would like to share an edited version of Shog9's accepted answer.

As Mike Samuel pointed out in a comment, that function can execute inline JavaScript.
But Shog9 is right when saying "let the browser do it for you..."

So here is my edited version, using DOMParser:
```
function strip(html){
   let doc = new DOMParser().parseFromString(html, 'text/html');
   return doc.body.textContent || "";
}
```
Here is the code to test the inline JavaScript case:
```
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
```
Also, it does not request resources (such as images) while parsing:
```
strip("Just text <img src='https://assets.rbl.ms/4155638/980x.jpg'>")
```

不要未来只要你来

2020-11-21 05:38
An improvement to the accepted answer.
```
function strip(html)
{
   var tmp = document.implementation.createHTMLDocument("New").body;
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}
```
This way, input like the following does no harm:
```
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
```
Firefox, Chromium, and Internet Explorer 9+ are safe. Opera Presto is still vulnerable. Also, images referenced in the string are not downloaded in Chromium and Firefox, which saves HTTP requests.

野性不改

2020-11-21 05:38
```
var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
```
This is a regex version, which is more resilient to malformed HTML, like:

- Unclosed tags: `Some text <img`
- "<" and ">" inside tag attributes: `Some text <img alt="x > y">`
- Newlines: `Some <a href="http://google.com">`

The code:
```
var html = '<br>This <img alt="a>b" \r\n src="a_b.gif" />is > \nmy<>< > <a>"text"</a';
var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
// text is 'This is > \nmy "text"': tags are removed, the stray ">" is kept
```
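
To make the edge cases above easier to try out, here is a minimal sketch that wraps the same regex in a helper and runs it on each kind of malformed input. The names `TAG_PATTERN` and `stripTags` are made up for illustration, and the newline sample is an assumed reconstruction of a tag split across lines. Note that this only removes tags; character references such as `&amp;` are left untouched.

```javascript
// Hypothetical helper name; the pattern is the regex from the answer above.
const TAG_PATTERN = /<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g;

function stripTags(html) {
  // Coerce to a string so numbers and similar inputs do not throw.
  return String(html).replace(TAG_PATTERN, "");
}

// Unclosed tag: everything from "<" to the end of the string is dropped.
console.log(JSON.stringify(stripTags("Some text <img")));
// "Some text "

// ">" inside a quoted attribute does not end the tag early.
console.log(JSON.stringify(stripTags('Some text <img alt="x > y">')));
// "Some text "

// A newline inside the tag is matched by [^>] like any other character.
console.log(JSON.stringify(stripTags('Some <a\nhref="http://google.com">link</a>')));
// "Some link"

// Entities are not decoded by this approach.
console.log(JSON.stringify(stripTags("a &amp; b <b>c</b>")));
// "a &amp; b c"
```

If decoded text is needed (for example `&amp;` turned into `&`), one of the parser-based answers is probably a better fit than a regex.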

死守一世寂寞

2020-11-21 05:40

Another, admittedly less elegant solution than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.

```
var bodyContent = document.getElementsByTagName('body')[0];
var result = appendTextNodes(bodyContent);

function appendTextNodes(element) {
    var text = '';

    // Loop through the childNodes of the passed in element
    for (var i = 0, len = element.childNodes.length; i < len; i++) {
        // Get a reference to the current child
        var node = element.childNodes[i];
        // Append the node's value if it's a text node
        if (node.nodeType === 3) {
            text += node.nodeValue;
        }
        // Recurse through the node's children, if there are any,
        // and append the text they return
        if (node.childNodes.length > 0) {
            text += appendTextNodes(node);
        }
    }
    // Return the final result
    return text;
}
```
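
One possible reason to walk the tree by hand is control over which nodes contribute text: `textContent` on an element also includes the text inside `<script>` and `<style>` children. The sketch below is a variation on the walk described above that skips those elements. The names `collectVisibleText` and `SKIPPED` are hypothetical, and the plain-object tree only stands in for real DOM nodes so the example can run outside a browser.

```javascript
// Numeric values match Node.TEXT_NODE and Node.ELEMENT_NODE in browsers.
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

// Elements whose text content should not end up in the result (an assumption
// for this sketch; extend as needed).
const SKIPPED = new Set(["SCRIPT", "STYLE"]);

function collectVisibleText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  if (node.nodeType === ELEMENT_NODE &&
      SKIPPED.has(String(node.nodeName).toUpperCase())) {
    return "";
  }
  // Concatenate the text returned by each child, in document order.
  let text = "";
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    text += collectVisibleText(children[i]);
  }
  return text;
}

// Stand-in for <body>Hello <script>alert(1)</script><b>world</b></body>.
const tree = {
  nodeType: 1, nodeName: "BODY", childNodes: [
    { nodeType: 3, nodeValue: "Hello ", childNodes: [] },
    { nodeType: 1, nodeName: "SCRIPT", childNodes: [
      { nodeType: 3, nodeValue: "alert(1)", childNodes: [] },
    ] },
    { nodeType: 1, nodeName: "B", childNodes: [
      { nodeType: 3, nodeValue: "world", childNodes: [] },
    ] },
  ],
};

console.log(collectVisibleText(tree)); // "Hello world"
```

In a browser, the same function could be passed a real node, for example `document.body` or the `body` of a document produced by `DOMParser`.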


北海茫月

2020-11-21 05:42
As an extension to the jQuery method: if your string might not contain HTML (e.g. if you are trying to remove HTML from a form field), then
```
jQuery(html).text();
```
will return an empty string if there is no HTML.

Use:
```
jQuery('<p>' + html + '</p>').text();
```
instead.

Update: As has been pointed out in the comments, in some circumstances this solution will execute JavaScript contained within html. If the value of html could be influenced by an attacker, use a different solution.