In lxml, how do I remove a tag but retain all contents?

故里飘歌 2020-12-05 04:54

The problem is this: I have an XML fragment like so:

text1 inner1 text2 inner2 text3         


        
2 Answers
  • 2020-12-05 05:48

    Use lxml's Cleaner class to remove tags from HTML content; the example below does what you want. For an HTML document, Cleaner is a better general solution than strip_elements, because in cases like this you usually want to strip out more than just the tag: you also want to get rid of things like onclick handler attributes on other tags.

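    from lxml.html.clean import Cleaner

    # Remove <p> tags; their content is pulled up into the parent.
    cleaner = Cleaner(remove_tags=['p'])
    cleaned = cleaner.clean_html(html)  # html: placeholder for your HTML string

    The documentation describes remove_tags as:

    A list of tags to remove. Only the tags will be removed, their content will get pulled up into the parent tag.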

  • 2020-12-05 05:52

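    Try etree.strip_tags: http://lxml.de/api/lxml.etree-module.html#strip_tags

    >>> from lxml import etree
    >>> # fragment is the parsed <fragment> element from the question
    >>> etree.strip_tags(fragment, 'a', 'c')
    >>> etree.tostring(fragment, encoding='unicode')
    '<fragment>text1 inner1 text2 <b>inner2</b> text3</fragment>'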
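    For a version that runs end to end, here is a small sketch built around etree.strip_tags. The input markup is an assumption: the question's original tags did not survive on this page, so the string below is only one layout that is consistent with the output shown above. The helper name strip_keep_content is likewise made up for illustration.

```python
from lxml import etree


def strip_keep_content(xml, *tags):
    """Remove the named tags from `xml` but keep their text and children."""
    root = etree.fromstring(xml)
    # strip_tags works in place and merges text and tail into the parent.
    etree.strip_tags(root, *tags)
    # encoding='unicode' returns str rather than bytes on Python 3.
    return etree.tostring(root, encoding='unicode')


# Assumed input, not taken from the question.
xml = '<fragment>text1 <a>inner1</a> text2 <b>inner2</b> <c>text3</c></fragment>'
result = strip_keep_content(xml, 'a', 'c')
print(result)
# <fragment>text1 inner1 text2 <b>inner2</b> text3</fragment>
```

    Tags that are not listed, such as <b> here, are left untouched.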
    