Remove HTML tags from a String

误落风尘 2020-11-21 07:35

Is there a good way to remove HTML from a Java string? A simple regex like

replaceAll("\\<.*?>", "…


        
30 Answers
  •  孤街浪徒
    2020-11-21 07:59

    You might want to replace line-break tags such as <br/> and </p> with newlines before stripping the HTML, so the result does not turn into an illegible mess, as Tim suggests.
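A minimal Java sketch of that two-step idea, assuming that <br> variants and closing </p> tags are the only tags that should become line breaks. The class StripWithNewlines, the method stripHtml and the tag choice are illustrative, not from the answer. Step 2 removes every <...> sequence, so it still deletes non-HTML text in angle brackets, which is the problem the next paragraph of this answer deals with.

```java
import java.util.regex.Pattern;

public class StripWithNewlines {
    // <br>, <br/>, <br /> and </p> in any letter case; which tags count as
    // line breaks is an assumption, extend the alternation as needed
    private static final Pattern LINE_BREAK_TAGS =
            Pattern.compile("(?i)<\\s*br\\s*/?\\s*>|<\\s*/\\s*p\\s*>");

    // Any remaining tag: "<" followed by everything up to the next ">"
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String stripHtml(String html) {
        // 1. Turn line-breaking tags into newlines so paragraphs stay apart
        String withBreaks = LINE_BREAK_TAGS.matcher(html).replaceAll("\n");
        // 2. Remove every other tag
        return ANY_TAG.matcher(withBreaks).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>First paragraph</p><p>Second<br/>line</p>";
        // prints "First paragraph", "Second" and "line" on separate lines
        System.out.print(stripHtml(html));
    }
}
```

The order matters: if all tags were removed first, the information about where the line breaks were would already be gone.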

    The only way I can think of to remove HTML tags while leaving non-HTML text between angle brackets intact would be to check against a list of HTML tags. Something along these lines:

    // "tag" is a placeholder: repeat (or alternate) for each HTML tag name to strip
    replaceAll("\\<\\s*/?\\s*tag\\b[^>]*>", "")
    
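A sketch of that tag-list approach in Java, assuming a hypothetical, deliberately short list of tag names (HTML_TAGS, KNOWN_TAG and stripKnownTags are illustrative names). It builds one case-insensitive pattern from the list, matches both opening and closing tags, and uses \b so that the name "a" does not also match <abbr>. Unlike the one-liner, it allows no whitespace right after "<", because browsers treat "< b" as plain text. Like any regex approach, it can still be fooled by a ">" inside an attribute value or a comment.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StripKnownTags {
    // Hypothetical list of tag names to strip; extend it for the markup you expect
    private static final List<String> HTML_TAGS = List.of(
            "a", "b", "br", "div", "em", "i", "li", "p", "span", "strong", "ul");

    // "<" or "</", then one listed name as a whole word, then anything up to ">"
    private static final Pattern KNOWN_TAG = Pattern.compile(
            "(?i)</?(?:"
                    + HTML_TAGS.stream().map(Pattern::quote).collect(Collectors.joining("|"))
                    + ")\\b[^>]*>");

    static String stripKnownTags(String html) {
        return KNOWN_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // "List<String>" survives because "String" is not in the list
        // prints: Use List<String> here
        System.out.println(stripKnownTags("<p>Use <b>List<String></b> here</p>"));
    }
}
```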

    Then HTML-decode character entities such as &amp; back into the characters they stand for. The result should not be considered sanitized.
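Java's standard library has no HTML-unescaping method (Apache Commons Text's StringEscapeUtils.unescapeHtml4 is a common third-party choice). The following is a minimal sketch assuming only numeric character references and a handful of named entities need handling; the class DecodeEntities, the method decode and the entity subset are illustrative. It decodes in a single pass, so an escaped entity such as &amp;lt; comes out as the literal text &lt; instead of being decoded twice. It needs Java 9+ for Matcher.replaceAll(Function).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecodeEntities {
    // Hypothetical subset of named entities; the HTML standard defines many more
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    // &name;  or  &#169;  or  &#xA9;
    private static final Pattern ENTITY =
            Pattern.compile("&(?:#(\\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z]+));");

    static String decode(String text) {
        // One pass over the input, so "&amp;lt;" becomes "&lt;", not "<"
        return ENTITY.matcher(text).replaceAll(m -> {
            String decoded;
            if (m.group(1) != null) {
                decoded = fromCodePoint(m.group(1), 10, m.group());
            } else if (m.group(2) != null) {
                decoded = fromCodePoint(m.group(2), 16, m.group());
            } else {
                // Unknown names are left as they are
                decoded = NAMED.getOrDefault(m.group(3), m.group());
            }
            // "$" and "\" are special in replacement strings, so quote the result
            return Matcher.quoteReplacement(decoded);
        });
    }

    private static String fromCodePoint(String digits, int radix, String original) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : original;
        } catch (NumberFormatException e) {
            return original; // too many digits to fit in an int
        }
    }

    public static void main(String[] args) {
        // prints: Fish & Chips <3 © &lt; &bogus;
        System.out.println(decode("Fish &amp; Chips &lt;3 &#169; &amp;lt; &bogus;"));
    }
}
```

The output is plain text, not safe HTML: an input like &lt;script&gt; decodes to <script>, which is why the result must be escaped again before it goes back into a page.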
