Using C# regular expressions to remove HTML tags

悲&欢浪女 2020-11-22 05:59

How do I use a C# regular expression to replace or remove all HTML tags, including the angle brackets? Can someone please help me with the code?

10 answers
  •  误落风尘
    2020-11-22 07:04

    I would like to echo Jason's response, though sometimes you need to naively parse some HTML and pull out the text content.

    I needed to do this with some HTML that had been created by a rich text editor, which is always fun and games.

    In this case, you may need to remove the content of some tags as well as the tags themselves.
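
    A minimal sketch of why removing only the tags falls short, assuming a tag-only pattern of <[^>]*> (a common choice, not taken from this answer) and the hypothetical names TagOnlyDemo and StripTagsOnly: every tag is deleted, but whatever sat between an opening and a closing tag survives, including CSS inside a <style> block.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: strips tags only, never the text between them.
public static class TagOnlyDemo
{
    // <[^>]*> matches '<', any run of non-'>' characters (newlines included), then '>'.
    public static string StripTagsOnly(string html) =>
        Regex.Replace(html, "<[^>]*>", string.Empty);
}
```

    For example, StripTagsOnly("<style>p { color: red; }</style><p>bob</p>") returns "p { color: red; }bob": the style rules end up in the plain text next to the real content.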

    In my case, <xml> and <style> tags were thrown into the mix. Someone may find my (very slightly) less naive implementation a useful starting point.

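    using System.Text.RegularExpressions;

    // Extension methods must be declared in a static, non-nested class.
    public static class HtmlExtensions
    {
        /// <summary>
        /// Removes all HTML tags from a string and leaves only plain text.
        /// Also removes the content of xml and style elements, because the aim
        /// is the text content, not markup or metadata.
        /// </summary>
        /// <param name="input">The HTML to strip.</param>
        /// <returns>The plain text left after stripping.</returns>
        public static string HtmlStrip(this string input)
        {
            // Remove all <style></style> tags and anything in between.
            input = Regex.Replace(input, @"<style>(.|\n)*?</style>", string.Empty);
            // Remove all <xml></xml> tags and anything in between.
            input = Regex.Replace(input, @"<xml>(.|\n)*?</xml>", string.Empty);
            // Remove any remaining tags but not their content:
            // "<p>bob<span> johnson</span></p>" becomes "bob johnson".
            return Regex.Replace(input, @"<(.|\n)*?>", string.Empty);
        }
    }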

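    The patterns in HtmlStrip only match a bare, lower-case <style> or <xml> opening tag, so a block such as <STYLE type="text/css"> would keep its content. A possible variant, sketched under that assumption (HtmlStripVariant and HtmlStripLoose are hypothetical names), allows attributes after the tag name, ignores case, and uses RegexOptions.Singleline so that . also matches a newline in place of the (.|\n) construct:

```csharp
using System.Text.RegularExpressions;

// Hypothetical variant of HtmlStrip; same three steps, looser tag matching.
public static class HtmlStripVariant
{
    // IgnoreCase: <STYLE> and <style> are treated alike.
    // Singleline: '.' also matches '\n', replacing the (.|\n) alternation.
    private const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    public static string HtmlStripLoose(string input)
    {
        // <style ...>...</style>, even with attributes such as type="text/css".
        input = Regex.Replace(input, @"<style\b[^>]*>.*?</style>", string.Empty, Opts);
        // <xml ...>...</xml> blocks, e.g. metadata emitted by a rich text editor.
        input = Regex.Replace(input, @"<xml\b[^>]*>.*?</xml>", string.Empty, Opts);
        // Any remaining tag, keeping the text between tags.
        return Regex.Replace(input, @"<[^>]*>", string.Empty, Opts);
    }
}
```

    Applied to <STYLE type="text/css">p { color: red; }</STYLE><p>bob<span> johnson</span></p>, HtmlStripLoose returns "bob johnson". Like the original, this is still a naive regex approach: a > inside an attribute value or an HTML comment can still confuse it.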