I have downloaded a page using urlopen. How do I remove all HTML tags from it? Is there a regexp that replaces all <*> tags?
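There are multiple options for filtering HTML tags out of data. You can use a regex, or remove_tags from w3lib. Note that w3lib is a third-party package, not part of the Python standard library, so install it first (pip install w3lib).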
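from w3lib.html import remove_tags
# sample input: text containing tabs and a trailing newline
data_to_remove = 'hello\t\t, \tworld\n'
# remove_tags returns the text with any markup stripped
print(remove_tags(data_to_remove))
OUTPUT: the same text with any tags stripped. Whitespace is not touched, so the tabs and the newline in this sample are printed as they are.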
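Note: remove_tags works on text. In Python 3, urlopen(...).read() returns bytes, and calling str() on bytes gives the b'...' representation rather than the text itself, so decode with the page's encoding first, e.g. remove_tags(data_to_remove.decode('utf-8')).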
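
If you would rather not install anything, here is a rough sketch of the regex route mentioned above, next to a parser-based variant that uses only the standard library. The names TAG_RE, strip_tags_regex and strip_tags_parser are made up for this example, and the sample string is just an illustration. The regex is deliberately naive: a '>' inside an attribute value will confuse it, which is why the parser version is usually the safer choice for real pages.

```python
import re
from html.parser import HTMLParser

# Naive pattern (an assumption for this sketch): anything from '<' up to
# the next '>' is treated as a tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags_regex(html):
    """Remove tags with a regex; fine for simple markup, fragile otherwise."""
    return TAG_RE.sub("", html)


class _TextCollector(HTMLParser):
    """Collects only the text nodes that the parser reports."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by HTMLParser for text between tags.
        self.parts.append(data)


def strip_tags_parser(html):
    """Remove tags by parsing the markup with the standard library."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


sample = "<p>Hello, <b>world</b>!</p>"
print(strip_tags_regex(sample))   # Hello, world!
print(strip_tags_parser(sample))  # Hello, world!
```

Both helpers expect a str, so decode the bytes you get from urlopen before passing them in. The parser version also keeps the text inside script and style elements, so you may want to filter those out separately.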