Using C# regular expressions to remove HTML tags

悲&欢浪女 2020-11-22 05:59

How do I use a C# regular expression to replace or remove all HTML tags, including the angle brackets? Can someone please help me with the code?

10 answers
  • 2020-11-22 06:41

    @JasonTrue is correct that stripping HTML tags should not be done with regular expressions.

    It's quite simple to strip HTML tags using HtmlAgilityPack:

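    // Requires the HtmlAgilityPack NuGet package and: using HtmlAgilityPack;
    public string StripTags(string input) {
        var doc = new HtmlDocument();
        doc.LoadHtml(input ?? "");          // treat null as an empty document
        return doc.DocumentNode.InnerText;  // text content with the markup removed
    }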
    
  • Replace the * in <[^>]*> with .+? and try this regex (based on this):

    <[^>].+?>
    


  • 2020-11-22 06:46

    Use this pattern:

    @"(?></?\w+)(?>(?:[^>'""]+|'[^']*'|""[^""]*"")*)>"
    
  • 2020-11-22 06:50

    As has often been stated, you should not use regular expressions to process XML or HTML documents. Regular expressions cope poorly with HTML and XML because they have no general way to express nested structures.

    You could use the following.

    String result = Regex.Replace(htmlDocument, @"<[^>]*>", String.Empty);
    

    This will work for most cases, but there will be cases (for example CDATA containing angle brackets) where this will not work as expected.
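
    A `>` inside a quoted attribute value is another easy way to reproduce the failure. The following is a minimal sketch, not a definitive implementation: the class name, method names, and sample input are invented for illustration, and the quote-aware pattern is the one posted in another answer on this page.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class TagStripDemo
    {
        // Simple pattern from this answer: a tag ends at the first '>' encountered.
        public static string StripSimple(string html)
        {
            return Regex.Replace(html ?? "", @"<[^>]*>", string.Empty);
        }

        // Quote-aware pattern (taken from another answer here): quoted attribute
        // values are consumed as a unit, so a '>' inside them does not end the tag.
        public static string StripQuoteAware(string html)
        {
            return Regex.Replace(
                html ?? "",
                @"(?></?\w+)(?>(?:[^>'""]+|'[^']*'|""[^""]*"")*)>",
                string.Empty);
        }
    }
    ```

    For the input `<a title="1 > 0">link</a>`, StripSimple leaves the fragment ` 0">link` behind, while StripQuoteAware returns `link`. Neither pattern understands comments, CDATA, or script blocks, so a parser is still the safer choice for arbitrary HTML.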

  • 2020-11-22 06:50

    Try the regular expression method described at this URL: http://www.dotnetperls.com/remove-html-tags

    /// <summary>
    /// Remove HTML from string with Regex.
    /// </summary>
    public static string StripTagsRegex(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty);
    }
    
    /// <summary>
    /// Compiled regular expression for performance.
    /// </summary>
    static Regex _htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);
    
    /// <summary>
    /// Remove HTML from string with compiled Regex.
    /// </summary>
    public static string StripTagsRegexCompiled(string source)
    {
        return _htmlRegex.Replace(source, string.Empty);
    }
    
  • 2020-11-22 06:52

    Use this method to remove tags:

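    // Requires: using System.Linq; using System.Text.RegularExpressions;
    // from and to are inserted into the pattern unescaped, so they act as regex fragments.
    public string From_To(string text, string from, string to)
    {
        if (text == null)
            return null;
        string pattern = from + ".*?" + to;
        Regex rx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
        MatchCollection matches = rx.Matches(text);
        if (matches.Count <= 0)
            return text;
        // Remove every occurrence of each non-empty matched substring.
        return matches.Cast<Match>()
            .Where(match => !string.IsNullOrEmpty(match.Value))
            .Aggregate(text, (current, match) => current.Replace(match.Value, ""));
    }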
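
    A short usage sketch for this method, under stated assumptions: the class FromToDemo and the helpers StripTags and StripBracketed are hypothetical names added for illustration, and the method body is a static copy of the one above. Because the delimiters are concatenated straight into the pattern, anything containing regex metacharacters should probably go through Regex.Escape first.

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    public static class FromToDemo
    {
        // Static copy of the answer's method so the example is self-contained.
        public static string From_To(string text, string from, string to)
        {
            if (text == null)
                return null;
            string pattern = from + ".*?" + to;
            Regex rx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
            MatchCollection matches = rx.Matches(text);
            return matches.Count <= 0
                ? text
                : matches.Cast<Match>()
                    .Where(match => !string.IsNullOrEmpty(match.Value))
                    .Aggregate(text, (current, match) => current.Replace(match.Value, ""));
        }

        // Hypothetical helper: removes everything between '<' and the next '>'.
        public static string StripTags(string html)
        {
            return From_To(html, "<", ">");
        }

        // Hypothetical helper: '[' is a regex metacharacter, so the delimiters are escaped.
        public static string StripBracketed(string text)
        {
            return From_To(text, Regex.Escape("["), Regex.Escape("]"));
        }
    }
    ```

    With these helpers, FromToDemo.StripTags("<p>Hello <b>world</b></p>") returns "Hello world". Since `.` does not match a newline by default, a tag that spans several lines is left in place by this pattern.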
    