How can I strip HTML tags in C#? [duplicate]


你的背包

3 Answers

生来不讨喜

2021-01-04 03:24

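  // Requires: using System.Text.RegularExpressions;
  public static string StripHTML(string htmlString)
  {
     // Match everything from '<' up to the next '>', including line breaks.
     // This is a rough heuristic: it does not understand comments, scripts,
     // or a '>' inside an attribute value, and it leaves entities untouched.
     string pattern = @"<[^>]*>";

     return Regex.Replace(htmlString, pattern, string.Empty);
  }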
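To show what a tag-matching regex like the one above does and does not handle, here is a small self-contained sketch. The class name `RegexStripDemo` is made up for illustration; only `Regex.Replace` comes from the framework.

```csharp
using System.Text.RegularExpressions;

public static class RegexStripDemo
{
    // Removes every run of characters that starts with '<' and ends at the next '>'.
    public static string StripHtml(string html)
    {
        return Regex.Replace(html, @"<[^>]*>", string.Empty);
    }
}
```

Calling `RegexStripDemo.StripHtml("<p>Hello <b>world</b></p>")` returns `Hello world`. Entities are not decoded, so `a &amp; b` comes back unchanged, and plain text containing angle brackets gets damaged: `1 < 2 and 3 > 2` loses everything between the `<` and the `>`. That is fine for a quick preview string but not something to rely on for security.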


日久生厌

2021-01-04 03:28

Take your HTML string or document and parse it with HTML Agility Pack. This gives you an HtmlDocument object that is very similar to an XmlDocument.

You can then use its methods, such as SelectNodes, to access the portions of the document that you are interested in.

If you choose another approach, be aware that parsing HTML (which is not a regular language) with regular expressions is widely regarded as a bad idea.

Regardless of the approach, if you are keeping some markup, use a whitelist: remove everything that is not explicitly allowed.
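As a rough sketch of the parser-based route, the following assumes the HtmlAgilityPack NuGet package is referenced in the project. The class and method names (`HtmlTextExtractor`, `ToPlainText`) are invented for this example, and dropping `script` and `style` elements is my own choice, not something the answer prescribes.

```csharp
using System.Linq;
using System.Net;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HtmlTextExtractor
{
    // Parses the markup and returns only its text content.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard before iterating.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style");
        if (unwanted != null)
        {
            // Copy to a list first so removal cannot disturb the enumeration.
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes; decode entities afterwards.
        return WebUtility.HtmlDecode(doc.DocumentNode.InnerText);
    }
}
```

For input such as `<p>Hello <b>world</b></p><script>alert(1)</script>` this should yield just the visible text. The result is plain text extracted from the markup, so if it is later written back into a page it still needs to be encoded like any other untrusted string.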

深忆病人

2021-01-04 03:36

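To guarantee that no HTML tags get through as markup, encode the string with HttpServerUtility.HtmlEncode(string) (in ASP.NET, Server.HtmlEncode). Note that this escapes the tags so they are displayed as text; it does not remove them.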
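A minimal sketch of the encoding approach follows. It uses `System.Net.WebUtility.HtmlEncode`, a static framework method that works outside ASP.NET, in place of the `HttpServerUtility` instance method named above; the wrapper class `EncodeDemo` is hypothetical.

```csharp
using System.Net;

public static class EncodeDemo
{
    // Escapes HTML-significant characters so the browser renders them as text.
    public static string Encode(string userInput)
    {
        return WebUtility.HtmlEncode(userInput);
    }
}
```

With this, `EncodeDemo.Encode("<b>bold</b> & more")` returns `&lt;b&gt;bold&lt;/b&gt; &amp; more`. The tags are still present in the output, only neutralized, which is why this guarantees nothing is interpreted as markup.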

If you want some tags to get through, you can use this "whitelist" approach.

Update: Some vulnerabilities have been found in that code, as a developer from Fog Creek points out.

(Second link includes code).

