How can one replace an element with text in lxml?

礼貌的吻别 2020-12-16 16:32

It's easy to completely remove a given element from an XML document with lxml's implementation of the ElementTree API, but I can't see an easy way of consistently replaci

3 Answers
  • 2020-12-16 16:39

    Using ET.XSLT:

    import io
    import lxml.etree as ET
    
    data = '''<everything>
    <m>Some text before <r/></m>
    <m><r/> and some text after.</m>
    <m><r/></m>
    <m>Text before <r/> and after</m>
    <m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
    </everything>
    '''
    
    f=ET.fromstring(data)
    xslt='''\
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">    
    
        <!-- Replace r nodes with DELETED
             http://www.w3schools.com/xsl/el_template.asp -->
        <xsl:template match="r">DELETED</xsl:template>
    
        <!-- How to copy XML without changes
             http://mrhaki.blogspot.com/2008/07/copy-xml-as-is-with-xslt.html -->    
        <xsl:template match="*">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
        <xsl:template match="@*|text()|comment()|processing-instruction()">
            <xsl:copy-of select="."/>
        </xsl:template>
        </xsl:stylesheet>
    '''
    
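    # BytesIO needs bytes, so encode the stylesheet string first
    xslt_doc=ET.parse(io.BytesIO(xslt.encode('utf-8')))
    transform=ET.XSLT(xslt_doc)
    f=transform(f)
    
    # tostring returns bytes; decode them for readable output
    print(ET.tostring(f).decode())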
    

    yields

    <everything>
    <m>Some text before DELETED</m>
    <m>DELETED and some text after.</m>
    <m>DELETED</m>
    <m>Text before DELETED and after</m>
    <m><b/> Text after a sibling DELETED Text before a sibling<b/></m>
    </everything>
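
The replacement text does not have to be hard-coded in the stylesheet. Below is a minimal sketch of passing it in as a stylesheet parameter instead. The parameter name `replacement` and the value `REMOVED` are assumptions made for this illustration, not part of the answer above. `ET.XSLT.strparam` wraps a plain Python string so that it is not evaluated as an XPath expression.

```python
import lxml.etree as ET

data = '''<everything>
<m>Some text before <r/></m>
<m>Text before <r/> and after</m>
</everything>'''

xslt = '''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "replacement" is a hypothetical parameter name for this sketch;
       it falls back to DELETED when the caller passes nothing -->
  <xsl:param name="replacement" select="'DELETED'"/>

  <!-- Emit the parameter value in place of every r element -->
  <xsl:template match="r">
    <xsl:value-of select="$replacement"/>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>'''

transform = ET.XSLT(ET.fromstring(xslt))

# strparam quotes the string so it is passed as text, not as XPath
result = transform(ET.fromstring(data),
                   replacement=ET.XSLT.strparam('REMOVED'))

output = ET.tostring(result, encoding='unicode')
print(output)
```

The `match="r"` template takes precedence over the identity template because a plain element name has a higher default priority than `node()`.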
    
  • 2020-12-16 16:43

    I think that unutbu's XSLT solution is probably the correct way to achieve your goal.

    However, here's a somewhat hacky way to achieve it, by modifying the tails of <r/> tags and then using etree.strip_elements.

    from lxml import etree
    
    data = '''<everything>
    <m>Some text before <r/></m>
    <m><r/> and some text after.</m>
    <m><r/></m>
    <m>Text before <r/> and after</m>
    <m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
    </everything>
    '''
    
    f = etree.fromstring(data)
    for r in f.xpath('//r'):
        # Prepend the replacement to the tail; the tail may be None
        r.tail = 'DELETED' + (r.tail or '')
    
    etree.strip_elements(f,'r',with_tail=False)
    
    # tostring returns bytes; decode them for readable output
    print(etree.tostring(f,pretty_print=True).decode())
    

    Gives you:

    <everything>
    <m>Some text before DELETED</m>
    <m>DELETED and some text after.</m>
    <m>DELETED</m>
    <m>Text before DELETED and after</m>
    <m><b/> Text after a sibling DELETED Text before a sibling<b/></m>
    </everything>
    
  • 2020-12-16 16:48

    Using strip_elements has the disadvantage that you cannot keep some of the <r> elements while replacing others. It also requires an ElementTree instance, which may not be available. Finally, you cannot use it to replace XML comments or processing instructions. The following should do the job:

    for r in f.xpath('//r'):
        # r.tail is None when no text follows the element
        text = 'DELETED' + (r.tail or '')
        parent = r.getparent()
        if parent is not None:
            previous = r.getprevious()
            if previous is not None:
                previous.tail = (previous.tail or '') + text
            else:
                parent.text = (parent.text or '') + text
            parent.remove(r)
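
The loop above can be wrapped in a reusable helper so that only selected elements are replaced. This is a minimal sketch under stated assumptions: the function name `replace_with_text` and the `keep` attribute used to select elements are hypothetical and were chosen for this illustration. It relies on the fact that lxml's `remove()` drops an element together with its tail, which is why the tail is copied first.

```python
from lxml import etree


def replace_with_text(element, text):
    """Replace `element` and its subtree with `text`, preserving its tail.

    Hypothetical helper for illustration; not part of lxml.
    """
    parent = element.getparent()
    if parent is None:
        raise ValueError('cannot replace the root element')
    # The tail may be None when no text follows the element
    text += element.tail or ''
    previous = element.getprevious()
    if previous is not None:
        # Append to the tail of the preceding sibling
        previous.tail = (previous.tail or '') + text
    else:
        # No preceding sibling: append to the parent's leading text
        parent.text = (parent.text or '') + text
    parent.remove(element)


root = etree.fromstring(
    '<everything>'
    '<m>Some text before <r/></m>'
    '<m><r/> and some text after.</m>'
    '<m><r/></m>'
    '<m><b/> Text after a sibling <r keep="yes"/><r/> Text before a sibling<b/></m>'
    '</everything>'
)

# Replace only the <r> elements that lack the (hypothetical) keep attribute
for r in root.xpath('//r[not(@keep)]'):
    replace_with_text(r, 'DELETED')

output = etree.tostring(root, encoding='unicode')
print(output)
```

Because the selection is an ordinary XPath expression, the same helper could be pointed at comments or processing instructions, for example with `//comment()`.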
    