Remove all JavaScript tags and style tags from HTML with Python and the lxml module

南笙 2020-12-23 12:11

I am parsing an HTML document using the lxml library (http://lxml.de/). So far I have figured out how to strip tags from an HTML document: In lxml, how do I remove a tag but retain

4 answers

囚心锁ツ (original poster)

2020-12-23 12:39
You can use the strip_elements function to remove script elements together with their content, then use the strip_tags function to remove other tags while keeping their text and children:
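```
from lxml import etree  # fragment is assumed to be an already parsed lxml element

etree.strip_elements(fragment, 'script')  # drops <script> elements and everything inside them
etree.strip_tags(fragment, 'a', 'p')  # and other tags that you want to remove
```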
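For a complete picture, here is a minimal, self-contained sketch of the same two calls applied to a small HTML string. The helper name clean_fragment, the sample markup, and the choice to wrap the input in a div are assumptions made for illustration and are not part of the answer above. It also passes 'style' to strip_elements, since the question asks about style tags too, and sets with_tail=False because strip_elements otherwise discards the text that follows a removed element.

```python
from lxml import etree, html


def clean_fragment(source: str) -> str:
    """Hypothetical helper: drop script/style elements, unwrap <a> and <p>."""
    # Parse the string as a fragment; create_parent wraps it in a <div>
    # so that several top-level elements are accepted.
    fragment = html.fragment_fromstring(source, create_parent="div")

    # Remove <script> and <style> together with their content.
    # with_tail=False keeps the text that follows each removed element.
    etree.strip_elements(fragment, "script", "style", with_tail=False)

    # Remove the <a> and <p> tags themselves but keep their text and children.
    etree.strip_tags(fragment, "a", "p")

    return html.tostring(fragment, encoding="unicode")


sample = (
    '<p>Hello <a href="#">world</a><script>alert(1)</script> again</p>'
    "<style>p {color: red}</style>"
)
cleaned = clean_fragment(sample)
print(cleaned)
```

With the default with_tail=True, the text " again" that follows the script element would be removed along with it, which is often not what you want when cleaning running text.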