I am looking for a quick way to parse HTML tags out of a ColdFusion string. We are pulling in an RSS feed that could potentially have anything in it. We are then doing some man
Disclaimer: I am a fierce advocate of using a proper parser (instead of regex) to parse HTML. However, this question isn't about parsing HTML, but about destroying it. For all tasks that go beyond that, use a parser.
I think your regex is good. As long as the task is nothing more than removing all HTML tags from the input, using a regex like yours is safe.
Anything else would probably be more hassle than it's worth, but you could write a small function that loops through the string char-by-char once and removes everything that's within tag brackets, e.g. everything from a "<" character up to the next ">" character.
For a high-demand part of your app, this may be faster than the regex. But the regex is clean and probably fast enough.
Maybe this modified regex has some advantages for you:
<[^>]*(?:>|$)
[^>]* is better than (.|\n).
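A minimal CFScript sketch of how the modified regex might be applied. The function name stripTags and the sample string are made up for illustration and are not from the question; the sketch assumes an engine whose regex flavor supports non-capturing groups, which current ColdFusion versions do.

```cfml
<cfscript>
// stripTags is a hypothetical helper name, not part of the original post.
function stripTags(required string input) {
    // "<[^>]*" consumes the tag's contents (a negated class also matches newlines);
    // "(?:>|$)" ends the match at ">" or, for an unclosed tag, at the end of the string.
    return REReplace(arguments.input, "<[^>]*(?:>|$)", "", "all");
}

// Made-up sample input: two closed tags and one tag that is never closed.
feedText = "<p>Hello <b>world</b></p>" & chr(10) & "<a href=""x"" unclosed";
writeOutput(stripTags(feedText));
</cfscript>
```

The "all" scope matters here: without it, REReplace() only replaces the first match.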
The use of REReplaceNoCase() is unnecessary when there are no actual letters in the pattern. Case-insensitive regex matching is slower than doing it case-sensitively.
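For completeness, the char-by-char alternative mentioned earlier could look roughly like the sketch below. The name stripTagsByScan is hypothetical, and the handling of an unclosed tag (dropping it) is an assumption chosen to mirror the regex. Whether this actually beats the regex has not been measured here; plain string concatenation in a loop may well be slower on some engines, so profile before switching.

```cfml
<cfscript>
// stripTagsByScan is a hypothetical helper name, not part of the original post.
function stripTagsByScan(required string input) {
    var result = "";
    var inTag = false;
    var total = len(arguments.input);
    var ch = "";

    for (var i = 1; i <= total; i++) {
        ch = mid(arguments.input, i, 1);
        if (inTag) {
            // Inside a tag: drop every character; ">" ends the tag.
            if (ch == ">") {
                inTag = false;
            }
        } else if (ch == "<") {
            // "<" opens a tag: drop it and start skipping.
            inTag = true;
        } else {
            // Outside any tag: keep the character.
            result &= ch;
        }
    }

    // Assumption: an unclosed tag at the end of the input is dropped entirely.
    return result;
}
</cfscript>
```

Like the regex, this keeps a stray ">" that appears outside any tag and treats everything after a "<" as tag content until the next ">".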