how to find and edit tags in XML files with namespaces using ElementTree

问题

I would like to find specific tags in my XML document and edit their text or attributes. My XML file contains namespaces (and as I understand it correctly, nested namespaces). The tool I'd like to use for this purpose is ElementTree. I managed to read XML file by iterparse, however I don't know how I can save edited XML, because iterparse doesn't have write element. I need a solution to read XML file by parse and strip its namespaces and nested namespaces or a way to save iterparsed file.

For this case, let's edit the "Rating" tag text.

it = ET.iterparse(adiPath)
    for _, el in it:
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
        for at in list(el.attrib): # strip namespaces of attributes too
            if '}' in at:
                newat = at.split('}', 1)[1]
                el.attrib[newat] = el.attrib[at]
                del el.attrib[at]
    root = it.root

    # Search Rating tag and edit it's value
    for rating in root.iter('Rating'):
        print(rating.text) # Prints 18
        rating.text = "999"
        print(rating.text) # Prints 999

However in this case XML file remains unchanged.

Here is XML file:

<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:content="urn:cablelabs:md:xsd:content:3.0" xmlns:core="urn:cablelabs:md:xsd:core:3.0" xmlns:offer="urn:cablelabs:md:xsd:offer:3.0" xmlns:terms="urn:cablelabs:md:xsd:terms:3.0" xmlns:title="urn:cablelabs:md:xsd:title:3.0" xmlns:adb="urn:adb:md:xsd:adb:01" xmlns:schemaLocation="urn:adb:md:xsd:adb:01 ADB-EXT-C01.xsd urn:cablelabs:md:xsd:core:3.0 MD-SP-CORE-C01.xsd urn:cablelabs:md:xsd:content:3.0 MD-SP-CONTENT-C01.xsd urn:cablelabs:md:xsd:offer:3.0 MD-SP-OFFER-C01.xsd urn:cablelabs:md:xsd:terms:3.0 MD-SP-TERMS-C01.xsd urn:cablelabs:md:xsd:title:3.0 MD-SP-TITLE-C01.xsd" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="urn:cablelabs:md:xsd:core:3.0">
  <Asset xsi:type="title:TitleType" uriId="ab://cc.com" providerVersionNum="1" internalVersionNum="0" creationDateTime="2020-01-28T08:55:19Z" startDateTime="2019-05-20T00:00:00Z" endDateTime="2028-08-20T23:59:00Z">
    <AlternateId identifierSystem="VOD1.1">ab://cc.com</AlternateId>
    <Ext>
        <adb:ExtensionType>
            <adb:TitleExt>
                <adb:SeriesInfo episodeNumber="6">
                    <adb:series seriesId="GOT" seasonCount="8"></adb:series>
                    <adb:season seasonId="GOTS08" number="8" episodeCount="6"></adb:season>
                </adb:SeriesInfo>
            </adb:TitleExt>
        </adb:ExtensionType>
    </Ext>
    <title:LocalizableTitle xml:lang="pol">
      <title:TitleLong>Game of Thrones VIII</title:TitleLong>
      <title:SummaryLong>Long summary, long summary, long summary...</title:SummaryLong>
      <title:Actor fullName="Peter Dinklage" firstName="Peter" lastName="Dinklage" />
      <title:Actor fullName="Nikolaj Coster-Waldau" firstName="Nikolaj" lastName="Coster-Waldau" />
      <title:Actor fullName="Emilia Clarke" firstName="Emilia" lastName="Clarke" />
      <title:Actor fullName="Lena Headey" firstName="Lena" lastName="Headey" />
      <title:Director fullName="David Nutter" firstName="David" lastname="Nutter" />
    </title:LocalizableTitle>
    <title:Rating ratingSystem="PL">18</title:Rating>
    <title:Audience>General</title:Audience>
    <title:DisplayRunTime>01:15</title:DisplayRunTime>
    <title:Year>2019</title:Year>
    <title:CountryOfOrigin>US</title:CountryOfOrigin>
    <title:Genre>Film fantasy</title:Genre>
    <title:ShowType>Movie</title:ShowType>
  </Asset>
  <Asset xsi:type="offer:CategoryType" uriId="cc.com/XX">
    <AlternateId identifierSystem="VOD1.1">cc.com/XX</AlternateId>
    <offer:CategoryPath>VOD/GOT/Season 8</offer:CategoryPath>
  </Asset>
  <Asset xsi:type="content:MovieType" uriId="GraoTronVIII_0_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_0_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT1H15M20S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PreviewType" uriId="GraoTronVIII_1_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_1_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06_trailer.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT0H01M48S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PosterType" uriId="GraoTronVIIIPoster">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIIIPoster</AlternateId>
    <content:SourceUrl>GOTS08E06.jpg</content:SourceUrl>
    <content:X_Resolution>600</content:X_Resolution>
    <content:Y_Resolution>900</content:Y_Resolution>
    <content:Language>pol</content:Language>
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_0_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_1_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIIIPoster" />
  </Asset>
</ADI3>

回答1:

Instead of stripping out the namespaces, I suggest using namespace wildcards. Support for this was added in Python 3.8.

from xml.etree import ElementTree as ET

tree = ET.parse(adiPath)

rating = tree.find(".//{*}Rating")  # Find the Rating element in any namespace
rating.text = "999"

Note that you have to use find() (or findall()). Wildcards do not work with iter().

The following workaround can be used to preserve the original namespace prefixes when serializing the XML document (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).

namespaces = dict([elem for _, elem in ET.iterparse("test1.xml", events=['start-ns'])])
for ns in namespaces:
    ET.register_namespace(ns, namespaces[ns])

来源：https://stackoverflow.com/questions/61141671/how-to-find-and-edit-tags-in-xml-files-with-namespaces-using-elementtree

标签

python

xml

parsing

elementtree

iterparse