What's the easiest way to strip HTML tags in Perl? I am using a regular expression to parse HTML from a URL, which works great, but how can I strip the HTML tags off?
If you just want to remove HTML tags:
s/<script.*?<\/script>//sg
s/<.+?>//sg
This will (most of the time) remove script tags and their contents, and then all other HTML tags. You could probably also remove everything before the <body> tag safely with a regex.
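As a rough sketch, the two substitutions can be wrapped in a small sub. The name strip_tags_regex and the sample string are made up for illustration, the script pattern is an assumed form of the first substitution, and the /i flag is an addition so that <SCRIPT> is caught too:

```perl
use strict;
use warnings;

# Hypothetical helper: strip tags from an HTML string using only regexes.
sub strip_tags_regex {
    my ($html) = @_;
    $html =~ s/<script.*?<\/script>//sgi;  # drop script blocks and their contents
    $html =~ s/<.+?>//sg;                  # drop every remaining tag
    return $html;
}

# Sample input is invented for this example.
my $sample = '<p>Hello <b>world</b></p><script>var x = 1;</script>';
print strip_tags_regex($sample), "\n";
```

The "most of the time" caveat is real: an attribute value that contains a > character (for example title="1 > 0") ends the match early and leaves part of the attribute in the output.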
For anything more complex than that, though, regular expressions are not a suitable tool, and you really need to parse the HTML with an actual HTML parser and then manipulate that to remove the tags.
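One possible way to do that is with the HTML::Parser module from CPAN. This is only a sketch and assumes the module is installed; strip_tags_parser and the sample string are hypothetical names chosen for the example:

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module, assumed to be installed

# Hypothetical helper: keep only the text nodes of an HTML document.
sub strip_tags_parser {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' hands the callback text with entities such as &amp; decoded
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    $parser->ignore_elements(qw(script style));  # skip these tags and their contents
    $parser->parse($html);
    $parser->eof;
    return $text;
}

# Sample input is invented; note the > inside the attribute value.
print strip_tags_parser('<p title="a > b">Fish &amp; chips</p><script>var x = 1;</script>'), "\n";
```

Because the parser tokenizes the markup, a > inside a quoted attribute value does not end the tag, and entities are decoded rather than left as raw text.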