python xml.etree - remove node but keep children (assign children to grandparents)

Posted by 你说的曾经没有我的故事 on 2021-01-29 02:36:47

Question


In Python, how do I remove a node but keep its children using the xml.etree API?

Yes, I know there's an answer using lxml, but since xml.etree is part of the Python standard library, I figure it deserves an answer too.

Original xml file:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Let's say I want to remove the country nodes but keep their children, assigning them to the parent of country.

Ideally, I want a solution that does things "in place" instead of creating a new tree.

My (non-working) solution:

# Get all parents of `country`
for country_parent in root.findall(".//country/.."):
    print(country_parent.tag)
    # Some countries could have same parent so get all
    # `country` nodes of current parent
    for country in country_parent.findall("./country"):
        print('\t', country.tag)
        # For each child of `country`, assign it to parent
        # and then delete it from `parent`
        for country_child in country:
            print('\t\t', country_child.tag)
            country_parent.append(country_child)
            country.remove(country_child)
        country_parent.remove(country)
tree.write("test_mod.xml")

Output of my print statements:

data
     country
         rank
         gdppc
         neighbor
     country
         rank
         gdppc
     country
         rank
         gdppc
         neighbor

Right away we can see there's a problem: every country is missing its year tag, and some neighbor tags are missing too.

The resulting .xml output:

<data>
    <rank>1</rank>
        <gdppc>141100</gdppc>
        <neighbor direction="W" name="Switzerland" />
    <rank>4</rank>
        <gdppc>59900</gdppc>
        <rank>68</rank>
        <gdppc>13600</gdppc>
        <neighbor direction="E" name="Colombia" />
    </data>

This is obviously wrong.

QUESTION: Why does this happen?

I imagine the appending and removing breaks the list being iterated, i.e. I've "invalidated" the list, much like invalidating an iterator.


Answer 1:


Remove this line from your program:

        country.remove(country_child)

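Iterating over an xml.etree.ElementTree.Element is essentially passed through to its list of sub-elements, and modifying that list during iteration yields odd results. Each country.remove(country_child) shifts the remaining children down one position, so the loop skips the element that follows the one just removed; this is why year and some neighbor tags were never visited. The removal is also unnecessary: country_parent.append(country_child) already attaches the child to the parent, and country itself is removed afterwards.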
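For readers who also want the children to stay where the removed node used to be, rather than being appended at the end of the parent, here is a minimal sketch. The helper name `unwrap` and the shortened sample document are hypothetical, introduced only for illustration, and the sketch assumes the tag being removed is neither the root element nor nested inside another element with the same tag. It takes a snapshot with `list(node)` before touching anything, so no list is modified while it is being iterated.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document from the question (illustrative only).
XML = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
    </country>
</data>"""


def unwrap(root, tag):
    """Remove every `tag` element below root, in place, keeping its children.

    Hypothetical helper: assumes `tag` elements are not nested in each other.
    """
    # findall() returns a plain list, so editing the tree inside the loop is safe.
    for parent in root.findall(f".//{tag}/.."):
        for node in parent.findall(f"./{tag}"):
            index = list(parent).index(node)  # current position of the node
            children = list(node)             # snapshot of the children
            parent.remove(node)
            # Put the children where the removed node used to be.
            for offset, child in enumerate(children):
                parent.insert(index + offset, child)


root = ET.fromstring(XML)
unwrap(root, "country")
print([child.tag for child in root])
# ['rank', 'year', 'neighbor', 'rank', 'year']
```

Note that the whitespace stored in each removed node's text and tail is dropped, so the written file may need re-indenting.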



Source: https://stackoverflow.com/questions/38021298/python-xml-etree-remove-node-but-keep-children-assign-children-to-grandparent
