Question
In Python, how do I remove a node but keep its children using the xml.etree API?
Yes, I know there's an answer using lxml, but since xml.etree is part of the Python standard library, I figure it deserves an answer too.
Original XML file:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
Let's say I want to remove the country nodes but keep their children and assign them to the parent of country. How can I do that?
Ideally, I want a solution that does things "in place" instead of creating a new tree.
My (non-working) solution:
import xml.etree.ElementTree as ET

tree = ET.parse("test.xml")  # input filename assumed; the question does not name it
root = tree.getroot()

# Get all parents of `country`
for country_parent in root.findall(".//country/.."):
    print(country_parent.tag)
    # Some countries could have same parent so get all
    # `country` nodes of current parent
    for country in country_parent.findall("./country"):
        print('\t', country.tag)
        # For each child of `country`, assign it to parent
        # and then delete it from `country`
        for country_child in country:
            print('\t\t', country_child.tag)
            country_parent.append(country_child)
            country.remove(country_child)
        country_parent.remove(country)
tree.write("test_mod.xml")
Output of my print statements:
data
    country
        rank
        gdppc
        neighbor
    country
        rank
        gdppc
    country
        rank
        gdppc
        neighbor
Right away we can see there's a problem: each country is missing its year tag and some of its neighbor tags.
The resulting .xml output:
<data>
    <rank>1</rank>
    <gdppc>141100</gdppc>
    <neighbor direction="W" name="Switzerland" />
    <rank>4</rank>
    <gdppc>59900</gdppc>
    <rank>68</rank>
    <gdppc>13600</gdppc>
    <neighbor direction="E" name="Colombia" />
</data>
This is obviously wrong.
QUESTION: Why does this happen?
I suspect the appending/removing breaks something in the list, i.e. I've "invalidated" the list, similar to invalidating an iterator.
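The suspected mechanism can be reproduced in isolation. The sketch below is a hypothetical minimal case (the tags a to e are made up for illustration): it removes each child of an element while looping over that same element.

```python
import xml.etree.ElementTree as ET

# Hypothetical parent element with five children named a..e.
parent = ET.fromstring("<p><a/><b/><c/><d/><e/></p>")

visited = []
for child in parent:
    visited.append(child.tag)
    # Removing the current child shifts every later child one index down,
    # so the next loop step jumps over the element that moved into this slot.
    parent.remove(child)

print(visited)                          # ['a', 'c', 'e']
print([child.tag for child in parent])  # ['b', 'd']
```

Every second child is skipped and left behind, the same pattern as year and some neighbor tags disappearing in the output above.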
Answer 1:
Remove this line from your program:
country.remove(country_child)
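Iterating over an xml.etree.ElementTree.Element is essentially passed through to its list of sub-elements, so modifying that list during iteration yields odd results: each remove() shifts the remaining children down one index, and the loop then skips the child that moved into the current slot. That is why year and some of the neighbor tags went missing.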
Source: https://stackoverflow.com/questions/38021298/python-xml-etree-remove-node-but-keep-children-assign-children-to-grandparent