Strip HTML from Text JavaScript

北荒

Is there an easy way to take a string of html in JavaScript and strip out the html?

30 answers

抹茶落季

2020-11-21 05:27
Here's a version which sorta addresses @MikeSamuel's security concern:
```
function strip(html)
{
   try {
       var doc = document.implementation.createDocument('http://www.w3.org/1999/xhtml', 'html', null);
       doc.documentElement.innerHTML = html;
       return doc.documentElement.textContent||doc.documentElement.innerText;
   } catch(e) {
       return "";
   }
}
```
Note that it returns an empty string if the HTML markup isn't valid XML (i.e., tags must be closed and attributes must be quoted). This isn't ideal, but it does avoid the potential security exploit.

If you need to accept markup that isn't valid XML, you could try using:
```
var doc = document.implementation.createHTMLDocument("");
```
but that isn't a perfect solution either for other reasons.
一向

2020-11-21 05:29
Converting HTML for Plain Text emailing keeping hyperlinks (a href) intact

The function posted above by hypoxide works fine, but I was after something that would convert HTML created in a web rich-text editor (for example FCKEditor), clearing out all the HTML but leaving all the links. I wanted both an HTML and a plain-text version so I could build the correct parts of an SMTP email (HTML and plain text).

After a long time searching Google, my colleagues and I came up with this, using JavaScript's regex engine:
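```
var str = 'this string has html code i want to remove Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1 Now back to normal text and stuff';
// Turn line breaks and paragraph openers into newlines
str = str.replace(/<br\s*\/?>/gi, "\n");
str = str.replace(/<p\b[^>]*>/gi, "\n");
// Rewrite each link as "text (Link->url)"; [^>]* keeps each match inside a single tag
str = str.replace(/<a\b[^>]*href="(.*?)"[^>]*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
// Strip every remaining tag
str = str.replace(/<(?:.|\s)*?>/g, "");
```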
The str variable starts out like this:
```
this string has html code i want to remove Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1 Now back to normal text and stuff
```
After the code has run, it looks like this:
```
this string has html code i want to remove
Link Number 1 -> BBC (Link->http://www.bbc.co.uk) Link Number 1


Now back to normal text and stuff
```
As you can see, all the HTML has been removed, and the link has been preserved with its hyperlinked text still intact. I have also replaced the <br> and <p> tags with \n (a newline character) so that some visual formatting is retained.

To change the link format (e.g. BBC (Link->http://www.bbc.co.uk)), just edit the replacement string " $2 (Link->$1) ", where $1 is the href URL/URI and $2 is the hyperlinked text. With the links placed directly in the body of the plain text, most mail clients convert them so the user can click on them.
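As an alternative to editing the " $2 (Link->$1) " replacement string, `String.prototype.replace` also accepts a function, which receives the whole match followed by each capture group. The sketch below uses that to make the link format a parameter. `linksToPlainText` is a hypothetical helper name, and, like the regex above, it assumes double-quoted `href` attributes.

```javascript
// Hypothetical helper (not from the answer above): rewrites each <a href="...">text</a>
// using a caller-supplied format function. Assumes double-quoted href values.
function linksToPlainText(html, format) {
  return html.replace(/<a\b[^>]*href="(.*?)"[^>]*>(.*?)<\/a>/gi, function (match, url, text) {
    return format(text, url);
  });
}

// Example: "text <url>" instead of "text (Link->url)"
var sample = 'Read the <a href="http://www.bbc.co.uk">BBC</a> news';
console.log(linksToPlainText(sample, function (text, url) {
  return text + " <" + url + ">";
}));
// prints: Read the BBC <http://www.bbc.co.uk> news
```

Passing a function keeps the regex in one place while callers choose the output format.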

Hope you find this useful.
无人共我

2020-11-21 05:31

After trying all of the answers mentioned here, I found that most, if not all, of them had edge cases and couldn't fully support my needs.

I started exploring how PHP does it and came across the php.js library, which replicates PHP's strip_tags function: http://phpjs.org/functions/strip_tags/
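For reference, the core idea of an allow-list strip_tags can be sketched in a few lines. This is not the php.js implementation: it is a simplified version that assumes PHP's allowed-tags string format (for example "<b><i>"), removes HTML comments, and otherwise drops any tag whose name is not on the list. Like any regex approach it misses edge cases, such as a ">" inside a quoted attribute value.

```javascript
// Simplified strip_tags-style sketch (not the php.js source).
// `allowed` uses PHP's string format, e.g. "<b><i>"; names are compared case-insensitively.
function stripTags(input, allowed) {
  // "<b><i>" -> ["b", "i"]
  var allowedNames = ((allowed || "").toLowerCase().match(/<[a-z][a-z0-9]*>/g) || [])
    .map(function (tag) { return tag.slice(1, -1); });

  return input
    .replace(/<!--[\s\S]*?-->/g, "") // remove HTML comments
    .replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, function (tag, name) {
      // keep opening/closing tags whose name is allowed, drop the rest
      return allowedNames.indexOf(name.toLowerCase()) !== -1 ? tag : "";
    });
}

console.log(stripTags('<p>Hello <b>world</b><!-- note --></p>', "<b>"));
// prints: Hello <b>world</b>
```

Because the tag name is matched whole, allowing "<i>" does not also keep "<li>", and a bare "<" followed by a space (as in "1 < 2") is left untouched.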


情歌与酒

2020-11-21 05:32

A lot of people have answered this already, but I thought it might be useful to share the function I wrote that strips HTML tags from a string but allows you to include an array of tags that you do not want stripped. It's pretty short and has been working nicely for me.

```
function removeTags(string, array) {
  // Splitting on "<" leaves each tag body at the start of a piece, e.g. "i>Hello" or "/i> "
  var pieces = string.split("<");
  var result = pieces.shift(); // text before the first tag
  pieces.forEach(function (piece) {
    var name = (piece.match(/^\/?([a-z0-9]+)/i) || [])[1]; // tag name without "/", compared in lowercase
    if (array && name && array.indexOf(name.toLowerCase()) !== -1) {
      result += "<" + piece;                         // keep the allowed tag as-is
    } else {
      result += piece.slice(piece.indexOf(">") + 1); // keep only the text after the tag
    }
  });
  return result;
}

var x = "<span><i>Hello</i> <b>world</b>!</span>";
console.log(removeTags(x)); // Hello world!
console.log(removeTags(x, ["span", "i"])); // <span><i>Hello</i> world!</span>
```


隐瞒了意图╮

2020-11-21 05:33
I just needed to strip out the <a> tags and replace them with the text of the link.

This seems to work great.
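```
// [^>]* stays inside one tag, so several links on one line are handled correctly
htmlContent = htmlContent.replace(/<a\b[^>]*>/gi, '');  // opening <a ...> tags
htmlContent = htmlContent.replace(/<\/a>/gi, '');        // closing </a> tags
```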

长情又很酷

2020-11-21 05:33

I have created a working regular expression myself:

```
str=str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, '');
```
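To see what that expression removes, here it is wrapped in a small function (the name `stripMarkup` is only for illustration) and run on a few inputs. It strips tags, comments, `<!DOCTYPE>` declarations, processing instructions and CDATA markers, while a bare "<" followed by a space, as in "1 < 2", is left alone.

```javascript
// The regular expression from the answer above, wrapped in a hypothetical helper.
function stripMarkup(str) {
  return str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, '');
}

console.log(stripMarkup('<!DOCTYPE html><!-- note --><p class="x">Hi <b>there</b></p>'));
// prints: Hi there
console.log(stripMarkup('<![CDATA[1 < 2]]>'));
// prints: 1 < 2
```

Note that the `(>|$)` alternatives also remove an unterminated tag at the very end of the string, so input like "a<b" loses its trailing "<b".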
