Remove all JavaScript tags and style tags from HTML with Python and the lxml module

南笙 2020-12-23 12:11

I am parsing an HTML document using the lxml library (http://lxml.de/). So far I have figured out how to strip tags from an HTML document: In lxml, how do I remove a tag but retain

4 answers
  •  囚心锁ツ
    2020-12-23 12:39

    You can use the strip_elements method to remove script elements together with their contents, then use the strip_tags method to remove other tags while keeping their text:

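    from lxml import etree

    etree.strip_elements(fragment, 'script')  # removes each <script> element and everything inside it
    etree.strip_tags(fragment, 'a', 'p')  # and other tags that you want to remove; their text is kept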
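
    For a fuller picture, here is a minimal, self-contained sketch of the same approach that also handles style tags, as the question title asks. The sample markup and the helper name clean_fragment are made up for illustration and are not part of the original answer. One assumption worth flagging: strip_elements drops the text that follows a removed element by default (with_tail=True), so the sketch passes with_tail=False to keep that text.

```python
from lxml import etree, html

# Hypothetical sample markup, invented for this sketch.
SAMPLE = (
    "<div>"
    "<script>alert('x');</script>Intro. "
    "<style>p { color: red; }</style>"
    "<p>Hello <a href='#'>world</a>!</p>"
    "</div>"
)


def clean_fragment(markup):
    """Parse markup, drop script/style elements, and unwrap <a> and <p>."""
    fragment = html.fromstring(markup)
    # Remove <script> and <style> along with their contents.
    # with_tail=False keeps the text that follows each removed element.
    etree.strip_elements(fragment, 'script', 'style', with_tail=False)
    # Remove the <a> and <p> tags themselves but keep their text and children.
    etree.strip_tags(fragment, 'a', 'p')
    return fragment


cleaned = clean_fragment(SAMPLE)
text = cleaned.text_content()
print(text)
```

    If the goal is general sanitizing rather than removing a few known tags, a dedicated HTML cleaner may be a better fit than listing tag names by hand.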
    
