How to remove all HTML tags from a downloaded page


I have downloaded a page using urlopen. How do I remove all HTML tags from it? Is there a regexp that replaces every <*> tag?


7 answers

时光取名叫无心

2020-12-31 18:33
A very simple regexp would be:
```
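import re
# html is the downloaded page as a str (decode the bytes from urlopen first)
notag = re.sub(r"<.*?>", " ", html)  # non-greedy: matches one tag at a time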
```
The drawback of this solution is that it strips only the tags themselves: the JavaScript and CSS between <script> and <style> tags stay in the output as plain text.
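One possible way around that drawback, sketched here with the standard library's `html.parser.HTMLParser` instead of a regexp, is to collect text while skipping whatever sits inside `<script>` and `<style>`. The names `TextExtractor` and `strip_tags` are made up for this illustration, and joining the pieces with a space is an assumption that mirrors the `" "` replacement above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join pieces with a space, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


sample = "<style>p {color: red}</style><p>Hello <b>world</b></p><script>alert('<hi>')</script>"
print(strip_tags(sample))  # Hello world
```

Because every text piece is joined with a space, punctuation that directly follows a closing tag ends up separated from the word before it, just as with the regexp version. For badly broken markup a dedicated HTML library may cope better than this sketch.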