Question
I would like to find specific tags in my XML document and edit their text or attributes. My XML file contains namespaces (and, if I understand correctly, nested namespaces). The tool I'd like to use for this is ElementTree. I managed to read the XML file with iterparse, but I don't know how to save the edited XML, because iterparse doesn't have a write method. I need either a way to read the XML file with parse and strip its namespaces (including nested ones), or a way to save the iterparsed file.
For this case, let's edit the "Rating" tag text.
import xml.etree.ElementTree as ET

it = ET.iterparse(adiPath)
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib):  # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root
# Search for the Rating tag and edit its value
for rating in root.iter('Rating'):
    print(rating.text)  # Prints 18
    rating.text = "999"
    print(rating.text)  # Prints 999
However, in this case the XML file on disk remains unchanged.
Here is the XML file:
<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:content="urn:cablelabs:md:xsd:content:3.0" xmlns:core="urn:cablelabs:md:xsd:core:3.0" xmlns:offer="urn:cablelabs:md:xsd:offer:3.0" xmlns:terms="urn:cablelabs:md:xsd:terms:3.0" xmlns:title="urn:cablelabs:md:xsd:title:3.0" xmlns:adb="urn:adb:md:xsd:adb:01" xmlns:schemaLocation="urn:adb:md:xsd:adb:01 ADB-EXT-C01.xsd urn:cablelabs:md:xsd:core:3.0 MD-SP-CORE-C01.xsd urn:cablelabs:md:xsd:content:3.0 MD-SP-CONTENT-C01.xsd urn:cablelabs:md:xsd:offer:3.0 MD-SP-OFFER-C01.xsd urn:cablelabs:md:xsd:terms:3.0 MD-SP-TERMS-C01.xsd urn:cablelabs:md:xsd:title:3.0 MD-SP-TITLE-C01.xsd" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="urn:cablelabs:md:xsd:core:3.0">
<Asset xsi:type="title:TitleType" uriId="ab://cc.com" providerVersionNum="1" internalVersionNum="0" creationDateTime="2020-01-28T08:55:19Z" startDateTime="2019-05-20T00:00:00Z" endDateTime="2028-08-20T23:59:00Z">
<AlternateId identifierSystem="VOD1.1">ab://cc.com</AlternateId>
<Ext>
<adb:ExtensionType>
<adb:TitleExt>
<adb:SeriesInfo episodeNumber="6">
<adb:series seriesId="GOT" seasonCount="8"></adb:series>
<adb:season seasonId="GOTS08" number="8" episodeCount="6"></adb:season>
</adb:SeriesInfo>
</adb:TitleExt>
</adb:ExtensionType>
</Ext>
<title:LocalizableTitle xml:lang="pol">
<title:TitleLong>Game of Thrones VIII</title:TitleLong>
<title:SummaryLong>Long summary, long summary, long summary...</title:SummaryLong>
<title:Actor fullName="Peter Dinklage" firstName="Peter" lastName="Dinklage" />
<title:Actor fullName="Nikolaj Coster-Waldau" firstName="Nikolaj" lastName="Coster-Waldau" />
<title:Actor fullName="Emilia Clarke" firstName="Emilia" lastName="Clarke" />
<title:Actor fullName="Lena Headey" firstName="Lena" lastName="Headey" />
<title:Director fullName="David Nutter" firstName="David" lastname="Nutter" />
</title:LocalizableTitle>
<title:Rating ratingSystem="PL">18</title:Rating>
<title:Audience>General</title:Audience>
<title:DisplayRunTime>01:15</title:DisplayRunTime>
<title:Year>2019</title:Year>
<title:CountryOfOrigin>US</title:CountryOfOrigin>
<title:Genre>Film fantasy</title:Genre>
<title:ShowType>Movie</title:ShowType>
</Asset>
<Asset xsi:type="offer:CategoryType" uriId="cc.com/XX">
<AlternateId identifierSystem="VOD1.1">cc.com/XX</AlternateId>
<offer:CategoryPath>VOD/GOT/Season 8</offer:CategoryPath>
</Asset>
<Asset xsi:type="content:MovieType" uriId="GraoTronVIII_0_1080mp4">
<AlternateId identifierSystem="VOD1.1">GraoTronVIII_0_1080mp4</AlternateId>
<content:SourceUrl>GOTS08E06.mp4</content:SourceUrl>
<content:Resolution>1080p</content:Resolution>
<content:Duration>PT1H15M20S</content:Duration>
<content:Language>pol</content:Language>
<content:Language>eng</content:Language>
</Asset>
<Asset xsi:type="content:PreviewType" uriId="GraoTronVIII_1_1080mp4">
<AlternateId identifierSystem="VOD1.1">GraoTronVIII_1_1080mp4</AlternateId>
<content:SourceUrl>GOTS08E06_trailer.mp4</content:SourceUrl>
<content:Resolution>1080p</content:Resolution>
<content:Duration>PT0H01M48S</content:Duration>
<content:Language>pol</content:Language>
<content:Language>eng</content:Language>
</Asset>
<Asset xsi:type="content:PosterType" uriId="GraoTronVIIIPoster">
<AlternateId identifierSystem="VOD1.1">GraoTronVIIIPoster</AlternateId>
<content:SourceUrl>GOTS08E06.jpg</content:SourceUrl>
<content:X_Resolution>600</content:X_Resolution>
<content:Y_Resolution>900</content:Y_Resolution>
<content:Language>pol</content:Language>
</Asset>
<Asset xsi:type="offer:ContentGroupType" uriId="abc">
<AlternateId identifierSystem="VOD1.1">abc</AlternateId>
<offer:TitleRef uriId="abc" />
<offer:MovieRef uriId="GraoTronVIII_0_1080mp4" />
</Asset>
<Asset xsi:type="offer:ContentGroupType" uriId="abc">
<AlternateId identifierSystem="VOD1.1">abc</AlternateId>
<offer:TitleRef uriId="abc" />
<offer:MovieRef uriId="GraoTronVIII_1_1080mp4" />
</Asset>
<Asset xsi:type="offer:ContentGroupType" uriId="abc">
<AlternateId identifierSystem="VOD1.1">abc</AlternateId>
<offer:TitleRef uriId="abc" />
<offer:MovieRef uriId="GraoTronVIIIPoster" />
</Asset>
</ADI3>
Answer 1:
Instead of stripping out the namespaces, I suggest using namespace wildcards. Support for this was added in Python 3.8.
from xml.etree import ElementTree as ET
tree = ET.parse(adiPath)
rating = tree.find(".//{*}Rating") # Find the Rating element in any namespace
rating.text = "999"
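The snippet above only changes the tree in memory; the file itself is updated once the tree is written back. Here is a minimal sketch of the full round trip, assuming Python 3.8+ and using a trimmed-down stand-in for the ADI file. The names SAMPLE, set_rating and demo_path are made up for illustration and do not come from the question.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical, trimmed-down stand-in for the ADI file in the question.
SAMPLE = (
    '<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0" '
    'xmlns:title="urn:cablelabs:md:xsd:title:3.0">'
    '<Asset><title:Rating ratingSystem="PL">18</title:Rating></Asset>'
    '</ADI3>'
)

def set_rating(path, new_value):
    """Set the text of every Rating element (any namespace) and save in place."""
    tree = ET.parse(path)
    ratings = tree.findall(".//{*}Rating")  # {*} wildcard needs Python 3.8+
    for rating in ratings:
        rating.text = new_value
    # write() is what actually persists the edit to disk.
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return len(ratings)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "adi.xml")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    changed = set_rating(demo_path, "999")
    # Re-read the file to confirm the change was saved.
    saved_text = ET.parse(demo_path).find(".//{*}Rating").text
```

The same idea should apply to a root obtained from iterparse: wrapping it as ET.ElementTree(root) gives an object with a write() method.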
Note that you have to use find() (or findall()). Wildcards do not work with iter().
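If iter() is still preferred, two workarounds seem reasonable: pass the fully qualified tag in {uri}local form, or walk every element and compare local names. The sketch below illustrates both on a small made-up sample; SAMPLE, TITLE_NS and local_name are hypothetical names introduced here, not part of the original code.

```python
from xml.etree import ElementTree as ET

# Hypothetical, trimmed-down stand-in for the ADI file in the question.
SAMPLE = (
    '<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0" '
    'xmlns:title="urn:cablelabs:md:xsd:title:3.0">'
    '<Asset><title:Rating ratingSystem="PL">18</title:Rating></Asset>'
    '</ADI3>'
)
TITLE_NS = "urn:cablelabs:md:xsd:title:3.0"

root = ET.fromstring(SAMPLE)

# iter() compares the tag literally, so the wildcard form finds nothing.
wildcard_hits = list(root.iter("{*}Rating"))

# Workaround 1: give iter() the fully qualified tag, {namespace-uri}localname.
qualified_hits = list(root.iter(f"{{{TITLE_NS}}}Rating"))

# Workaround 2: visit every element and compare only the local name.
def local_name(tag):
    """Return the tag without its {namespace-uri} prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]

local_hits = [
    el for el in root.iter()
    if isinstance(el.tag, str) and local_name(el.tag) == "Rating"
]
```

The first workaround is stricter because it matches one namespace only; the second behaves like the {*} wildcard and matches Rating in any namespace.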
The following workaround can be used to preserve the original namespace prefixes when serializing the XML document (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).
namespaces = dict([elem for _, elem in ET.iterparse("test1.xml", events=['start-ns'])])
for ns in namespaces:
    ET.register_namespace(ns, namespaces[ns])
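Putting the pieces together, the following sketch registers the prefixes found in the file, edits the Rating text, writes the file back and then reads the raw output to check the prefixes. It assumes Python 3.8+ and a trimmed-down sample; SAMPLE, edit_preserving_prefixes and demo_path are illustrative names only. Keep in mind that register_namespace() changes a module-wide registry, and that ElementTree only declares namespaces that are actually used by element or attribute names in the tree.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical, trimmed-down stand-in for the ADI file in the question.
SAMPLE = (
    '<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0" '
    'xmlns:title="urn:cablelabs:md:xsd:title:3.0">'
    '<Asset><title:Rating ratingSystem="PL">18</title:Rating></Asset>'
    '</ADI3>'
)

def edit_preserving_prefixes(path, new_value):
    """Edit Rating in place while keeping the file's own namespace prefixes."""
    # Each 'start-ns' event yields a (prefix, uri) pair; "" is the default namespace.
    namespaces = dict(ns for _, ns in ET.iterparse(path, events=["start-ns"]))
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)  # global registry, affects later writes

    tree = ET.parse(path)
    for rating in tree.findall(".//{*}Rating"):
        rating.text = new_value
    tree.write(path, encoding="utf-8", xml_declaration=True)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "adi.xml")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    edit_preserving_prefixes(demo_path, "999")
    with open(demo_path, encoding="utf-8") as f:
        output = f.read()
```

Without the register_namespace() calls, ElementTree would be expected to fall back to generated prefixes such as ns0 and ns1 when serializing.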
Source: https://stackoverflow.com/questions/61141671/how-to-find-and-edit-tags-in-xml-files-with-namespaces-using-elementtree