python - how to write empty tree node as empty string to xml file

花落未央 2021-01-27 01:25

I want to remove elements of a certain tag value and then write out the .xml file WITHOUT any tags for those deleted elements. Is my only option to create a new tree…

3 Answers
  • 2021-01-27 01:48

    Whenever you need to modify an XML document, also consider XSLT, the special-purpose language in the XSL family (which also includes XPath) designed specifically to transform XML files. Python users are rarely quick to recommend it, but it avoids the loops and nested if/then logic that general-purpose code usually needs. Python's lxml module can run XSLT 1.0 scripts using the libxslt processor.

    The transformation below runs the identity transform to copy the document as is, then applies an empty template matching <neighbor> to remove it:

    XSLT script (save it as an .xsl file; it is loaded just like the source .xml, since both are well-formed XML files):

    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*"/>
    
      <!-- IDENTITY TRANSFORM TO COPY XML AS IS -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    
      <!-- EMPTY TEMPLATE TO REMOVE NEIGHBOR WHEREVER IT EXISTS -->  
      <xsl:template match="neighbor"/>
    
    </xsl:transform>
    

    Python Script

    import lxml.etree as et
    
    # LOAD XML AND XSL DOCUMENTS
    xml  = et.parse("Input.xml")
    xslt = et.parse("Script.xsl")
    
    # TRANSFORM TO NEW TREE
    transform = et.XSLT(xslt)
    newdom = transform(xml)
    
    # CONVERT TO STRING
    tree_out = et.tostring(newdom, encoding='UTF-8', pretty_print=True,  xml_declaration=True)
    
    # OUTPUT TO FILE (tostring returned bytes, so open in binary mode)
    with open('Output.xml', 'wb') as xmlfile:
        xmlfile.write(tree_out)
    
  • 2021-01-27 01:51

    The trick here is to find the parent (the country node) and delete the neighbor from there. In this example I am using ElementTree because I am somewhat familiar with it:

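    import xml.etree.ElementTree as ET
    
    if __name__ == '__main__':
        # Input file name as used by the answerer; substitute your own XML file
        with open('debug.log') as f:
            doc = ET.parse(f)
    
        # findall() returns a list, so removing from the parent while looping is safe
        for country in doc.findall('.//country'):
            for neighbor in country.findall('neighbor'):
                country.remove(neighbor)
    
        ET.dump(doc)  # Display the result on stdout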
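
The question also asks about writing the file back out. Here is a minimal, self-contained sketch of the same parent-based removal that writes the pruned tree to disk; the inline sample XML, the `remove_tag` helper and the temporary `output.xml` path are hypothetical stand-ins for your own file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample input; in practice you would ET.parse() your own file.
SAMPLE = """<data>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""


def remove_tag(root, tag):
    """Remove every element named `tag` by deleting it from its parent (hypothetical helper)."""
    # Snapshot the elements first so the tree is not modified while iterating it.
    for parent in list(root.iter()):
        # findall() returns a list of direct children, so removal here is safe.
        for child in parent.findall(tag):
            parent.remove(child)


root = ET.fromstring(SAMPLE)
remove_tag(root, "neighbor")

# Write the pruned tree; the removed elements leave no tags behind.
out_path = os.path.join(tempfile.mkdtemp(), "output.xml")  # hypothetical path
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

Removing through the parent is the only way with ElementTree, because an element cannot remove itself.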
    
  • 2021-01-27 02:02
    import lxml.etree as et
    
    xml  = et.parse("test.xml")
    
    # getparent() is lxml-only; standard ElementTree elements don't know their parent
    for node in xml.xpath("//neighbor"):
        node.getparent().remove(node)
    
    xml.write("out.xml", encoding="utf-8", xml_declaration=True)
    

    With ElementTree, whose elements have no getparent(), we need to find the parents of the neighbor nodes first, then find the neighbor nodes inside each parent and remove them:

    from xml.etree import ElementTree as et
    
    xml = et.parse("test.xml")
    
    # ".//neighbor/.." selects every element that has a <neighbor> child
    for parent in xml.getroot().findall(".//neighbor/.."):
        for child in parent.findall("./neighbor"):
            parent.remove(child)
    
    xml.write("out.xml", encoding="utf-8", xml_declaration=True)
    

    Both will give you:

    <?xml version='1.0' encoding='utf-8'?>
    <data>
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            </country>
    </data>
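
The closing `</country>` tags in this output look misindented because removing an element also drops its tail (the whitespace that followed it). If tidy output matters, a minimal sketch assuming Python 3.9 or later (where `xml.etree.ElementTree.indent()` exists) re-indents the tree after the removal; the sample XML is illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; a removed <neighbor> takes its trailing whitespace with it.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)
for country in root.findall("country"):
    for neighbor in country.findall("neighbor"):
        country.remove(neighbor)

# ET.indent() (Python 3.9+) rewrites whitespace-only text/tail values,
# so the gaps left behind by the removed elements disappear.
ET.indent(root, space="    ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```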
    

    Using your attribute logic, and modifying the XML slightly as below:

    x = """<?xml version="1.0"?>
    <data>
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
            <neighbor name="Malaysia" direction="N"/>
        </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
            <neighbor name="Colombia" direction="E"/>
        </country>
    </data>"""
    

    Using lxml:

    import lxml.etree as et
    
    xml = et.fromstring(x)
    
    # Remove neighbors that have none of the make, job and build attributes
    for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@build)]"):
        node.getparent().remove(node)
    print(et.tostring(xml, encoding="unicode"))
    

    This gives:

    <data>
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
            </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
            </country>
    </data>
    

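    A similar filter in ElementTree (it removes a neighbor that lacks any one of the three attributes, which gives the same result for this sample):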

    from xml.etree import ElementTree as et
    
    xml = et.parse("test.xml").getroot()
    
    atts = {"build", "job", "make"}
    
    for parent in xml.findall(".//neighbor/.."):
        # Direct children only: parent.remove() cannot remove grandchildren
        for child in parent.findall("neighbor"):
            if not atts.issubset(child.attrib):
                parent.remove(child)
    
    print(et.tostring(xml, encoding="unicode"))
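
To check the attribute filter without a file, here is a stdlib-only sketch that parses a trimmed copy of the sample `x` above from a string (the trimming is an assumption for brevity). It keeps a `<neighbor>` only when it carries all of `build`, `job` and `make`, matching the `issubset` test in the ElementTree snippet.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample XML from this answer.
SAMPLE = """<data>
    <country name="Singapore">
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

REQUIRED = {"build", "job", "make"}

root = ET.fromstring(SAMPLE)
# ".//neighbor/.." yields each element that has a <neighbor> child, once.
for parent in root.findall(".//neighbor/.."):
    for child in parent.findall("neighbor"):
        # Drop neighbors that lack any of the required attributes.
        if not REQUIRED.issubset(child.attrib):
            parent.remove(child)

kept = [n.get("name") for n in root.iter("neighbor")]
print(kept)  # ['Costa Rica', 'Costa Rica']
```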
    

    If you prefer iter(), you can rebuild each element's child list without the neighbor elements:

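    from xml.etree import ElementTree as et
    
    xml = et.parse("test.xml")
    
    for parent in xml.getroot().iter("*"):
        # Build a list first, then replace the children in one slice assignment
        parent[:] = [child for child in parent if child.tag != "neighbor"]
    
    xml.write("out.xml", encoding="utf-8", xml_declaration=True)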
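
Since ElementTree elements keep no reference to their parent (there is no `getparent()` as in lxml), another `iter()`-based option is to build a child-to-parent dictionary once and remove each match through it. A minimal sketch; the sample XML and the `parent_of` name are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative sample with <neighbor> elements mixed among other children.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <neighbor name="Austria" direction="E"/>
        <rank>1</rank>
        <neighbor name="Switzerland" direction="W"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)

# Map every element to its parent in a single pass over the tree.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Materialise the matches first so removal does not disturb the iteration.
for neighbor in list(root.iter("neighbor")):
    parent_of[neighbor].remove(neighbor)

result = ET.tostring(root, encoding="unicode")
print(result)
```

The map is a snapshot: if you restructure the tree further, rebuild it before using it again.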
    

    To confirm that the lxml approach and the iter() approach produce exactly the same output:

    In [30]: !cat /home/padraic/untitled6/test.xml
    <?xml version="1.0"?>
    <data>
        <country name="Liechtenstein">#
          <neighbor name="Austria" direction="E"/>
            <rank>1</rank>
            <neighbor name="Austria" direction="E"/>
            <year>2008</year>
          <neighbor name="Austria" direction="E"/>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor name="Malaysia" direction="N"/>
        </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            <neighbor name="Costa Rica" direction="W"/>
            <neighbor name="Colombia" direction="E"/>
        </country>
    </data>
    In [31]: paste
    def test():
        import lxml.etree as et
        xml = et.parse("/home/padraic/untitled6/test.xml")
        for node in xml.xpath("//neighbor"):
            node.getparent().remove(node)
        a = et.tostring(xml)
        from xml.etree import ElementTree as et
        xml = et.parse("/home/padraic/untitled6/test.xml")
        for parent in xml.getroot().iter("*"):
            parent[:] = (child for child in parent if child.tag != "neighbor")
        b = et.tostring(xml.getroot())
        assert  a == b
    
    ## -- End pasted text --
    
    In [32]: test()
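
If lxml is not installed, a similar consistency check can be run with the standard library alone. This sketch (the sample XML and the two helper names are hypothetical) applies the `.//neighbor/..` removal and the `iter()` slice-assignment rebuild to separate copies of one document and compares the serialized results.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<data>
    <country name="Liechtenstein">
        <neighbor name="Austria" direction="E"/>
        <rank>1</rank>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""


def remove_via_parents(root):
    """Find each parent of a <neighbor> and remove its <neighbor> children."""
    for parent in root.findall(".//neighbor/.."):
        for child in parent.findall("neighbor"):
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def remove_via_slice(root):
    """Rebuild every element's children without the <neighbor> elements."""
    for parent in root.iter():
        parent[:] = [child for child in parent if child.tag != "neighbor"]
    return ET.tostring(root, encoding="unicode")


a = remove_via_parents(ET.fromstring(SAMPLE))
b = remove_via_slice(ET.fromstring(SAMPLE))
print(a == b)  # True
```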
    