I have downloaded a page using urlopen. How do I remove all HTML tags from it? Is there a regexp that replaces all <*> tags?
A very simple regexp would be:
import re
# html is the downloaded page as a string; each <...> tag is replaced by a space
notag = re.sub("<.*?>", " ", html)
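The drawback of this solution is that it removes only the tags themselves: the JavaScript and CSS inside <script> and <style> elements is left in the output. Because . does not match a newline by default, a tag that spans several lines is also left untouched.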
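If the script and style contents need to go as well, one possible approach is to use a real parser instead of a regexp. The following is only a sketch built on the standard library's html.parser module; the names TextExtractor and strip_tags are made up for this example, and it assumes the page has already been decoded to a str.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # text fragments found so far
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with a space (as the regexp does) and collapse whitespace.
    return " ".join(" ".join(parser.parts).split())
```

Calling strip_tags(html) on the downloaded page would then return the text with tags, scripts and stylesheets removed. Like the regexp, it puts a space where each tag was, so text split by inline tags ends up separated by a space.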