I want to remove elements of a certain tag value and then write out the .xml
file WITHOUT any tags for those deleted elements; is my only option to create a new tre
Whenever you need to modify XML documents, also consider XSLT, the special-purpose transformation language in the XSL family (which also includes XPath). XSLT is designed specifically to transform XML files. Python users are not quick to recommend it, but it avoids loops and nested if/then logic in general-purpose code. Python's lxml module can run XSLT 1.0 scripts using the libxslt processor.
The transformation below uses the identity transform to copy the document as is, then an empty template that matches <neighbor> to remove it:
XSLT Script (save as an .xsl file and load it just like the source .xml; both are well-formed XML files)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM TO COPY XML AS IS -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- EMPTY TEMPLATE TO REMOVE NEIGHBOR WHEREVER IT EXISTS -->
  <xsl:template match="neighbor"/>
</xsl:transform>
Python Script
import lxml.etree as et
# LOAD XML AND XSL DOCUMENTS
xml = et.parse("Input.xml")
xslt = et.parse("Script.xsl")
# TRANSFORM TO NEW TREE
transform = et.XSLT(xslt)
newdom = transform(xml)
# CONVERT TO STRING
tree_out = et.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
# OUTPUT TO FILE
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
The trick here is to find the parent (the country node), and delete the neighbor from there. In this example, I am using ElementTree because I am somewhat familiar with it:
import xml.etree.ElementTree as ET
if __name__ == '__main__':
    with open('debug.log') as f:
        doc = ET.parse(f)
    for country in doc.findall('.//country'):
        for neighbor in country.findall('neighbor'):
            country.remove(neighbor)
    ET.dump(doc)  # Display
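When the parent tag is not known in advance, one option is to build a child-to-parent lookup first, since ElementTree elements do not keep a reference to their parent. The following is a minimal sketch of that idea under stated assumptions: the helper name `remove_by_tag` and the inline `SAMPLE` document are hypothetical and not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, trimmed from the countries example.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""


def remove_by_tag(root, tag):
    """Remove every descendant of root with the given tag; return the count."""
    # ElementTree elements do not know their parent, so map child -> parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect matches first so the tree is not modified while it is walked.
    # The root itself has no parent and is therefore skipped.
    doomed = [elem for elem in root.iter(tag) if elem in parent_of]
    for elem in doomed:
        parent_of[elem].remove(elem)
    return len(doomed)


root = ET.fromstring(SAMPLE)
removed = remove_by_tag(root, "neighbor")
print(removed)  # 3
print(ET.tostring(root, encoding="unicode"))
```

Building the map costs one extra pass over the tree, but it works for matches at any depth without hard-coding the `country` tag.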
Using lxml:
import lxml.etree as et
xml = et.parse("test.xml")
for node in xml.xpath("//neighbor"):
    node.getparent().remove(node)
xml.write("out.xml",encoding="utf-8",xml_declaration=True)
Using ElementTree, we need to find the parents of the neighbor nodes, then find the neighbor nodes inside each parent and remove them:
from xml.etree import ElementTree as et
xml = et.parse("test.xml")
for parent in xml.getroot().findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)
xml.write("out.xml",encoding="utf-8",xml_declaration=True)
Both will give you:
<?xml version='1.0' encoding='utf-8'?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
</country>
</data>
Using your attribute logic and modifying the XML a bit, as below:
x = """<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>"""
Using lxml:
import lxml.etree as et
xml = et.fromstring(x)
for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@build)]"):
    node.getparent().remove(node)
print(et.tostring(xml))
Would give you:
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
</country>
</data>
The same logic in ElementTree:
from xml.etree import ElementTree as et
xml = et.parse("test.xml").getroot()
atts = {"build", "job", "make"}
for parent in xml.findall(".//neighbor/.."):
    # "./neighbor" matches direct children only, which is what remove() requires
    for child in parent.findall("./neighbor"):
        if not atts.issubset(child.attrib):
            parent.remove(child)
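To try the attribute filter without a file on disk, here is a self-contained sketch of the same ElementTree logic. The helper name `keep_fully_attributed` and the inline `SAMPLE` document are hypothetical; in particular, the `make="foo"` on Colombia is added only for illustration. Note that `issubset` keeps an element only if it carries all of the listed attributes, whereas an XPath predicate of the form `not(@a) and not(@b)` removes only elements that carry none of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: Colombia has only one of the three attributes.
SAMPLE = """<data>
    <country name="Singapore">
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Colombia" direction="E" make="foo"/>
    </country>
</data>"""

REQUIRED = {"build", "job", "make"}


def keep_fully_attributed(root, tag, required):
    """Remove <tag> elements that lack any of the required attributes."""
    # ".//tag/.." selects each element that has at least one <tag> child.
    for parent in root.findall(f".//{tag}/.."):
        # findall returns a list, so removing while looping over it is safe.
        for child in parent.findall(f"./{tag}"):
            if not required.issubset(child.attrib):
                parent.remove(child)


root = ET.fromstring(SAMPLE)
keep_fully_attributed(root, "neighbor", REQUIRED)
kept = [n.get("name") for n in root.iter("neighbor")]
print(kept)  # ['Costa Rica', 'Costa Rica']
```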
If you are using iter:
from xml.etree import ElementTree as et
xml = et.parse("test.xml")
for parent in xml.getroot().iter("*"):
    parent[:] = (child for child in parent if child.tag != "neighbor")
You can see we get the exact same output:
In [30]: !cat /home/padraic/untitled6/test.xml
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">#
<neighbor name="Austria" direction="E"/>
<rank>1</rank>
<neighbor name="Austria" direction="E"/>
<year>2008</year>
<neighbor name="Austria" direction="E"/>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
In [31]: paste
def test():
    import lxml.etree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for node in xml.xpath("//neighbor"):
        node.getparent().remove(node)
    a = et.tostring(xml)
    from xml.etree import ElementTree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for parent in xml.getroot().iter("*"):
        parent[:] = (child for child in parent if child.tag != "neighbor")
    b = et.tostring(xml.getroot())
    assert a == b
## -- End pasted text --
In [32]: test()
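The slice-assignment approach can also be tried with the standard library alone. Below is a minimal sketch, assuming a hypothetical inline document with neighbor elements at several depths; the helper name `strip_tag` is illustrative and not from the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with <neighbor> elements at three different depths.
SAMPLE = """<data>
    <neighbor name="TopLevel"/>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Colombia" direction="E"/>
        <region>
            <neighbor name="Nested"/>
        </region>
    </country>
</data>"""


def strip_tag(root, tag):
    """Drop every child element named tag, at any depth, in place."""
    # Snapshot the elements first so the walk is not affected by the edits.
    for parent in list(root.iter()):
        # A list is built first so the kept children are fully computed
        # before the slice replaces the parent's children.
        parent[:] = [child for child in parent if child.tag != tag]


root = ET.fromstring(SAMPLE)
strip_tag(root, "neighbor")
output = ET.tostring(root, encoding="unicode")
print(output)
```

Because every element is visited as a potential parent, no parent lookup is needed; the trade-off is that elements with no matching children are rewritten too.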