How to remove all HTML tags from a downloaded page

鱼传尺愫 2020-12-31 17:32

I have downloaded a page using urlopen. How do I remove all HTML tags from it? Is there a regexp that replaces every <*> tag?

7 answers
  • 2020-12-31 18:33

    A very simple regexp would be:

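    import re
    # html is the page source as a str; each tag is replaced by one space.
    # re.DOTALL lets "." match newlines, so tags spanning several lines are caught too.
    notag = re.sub(r"<.*?>", " ", html, flags=re.DOTALL)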
    

    The drawback of this solution is that it removes only the tags themselves: the JavaScript and CSS inside <script> and <style> elements stay in the output as plain text.
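
If the script and style contents need to go as well, one option is to use a parser instead of a regexp. The following is a minimal sketch using `html.parser` from the standard library. The names `TextExtractor` and `strip_tags` are made up for this illustration, and it assumes `html` is already a decoded `str` (in Python 3, `urlopen(...).read()` returns bytes, so decode first).

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # text fragments found outside skipped elements
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with a space (as the regexp does for each tag),
    # then collapse the runs of whitespace left behind.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

Usage would look like `text = strip_tags(page)`. Because fragments are joined with a space, a word split by inline markup (for example `wor<b>ld</b>`) comes out as two words, the same as with the regexp above.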
