How can I remove ns prefixes from XML in Python?

可紊 submitted on 2021-02-16 14:38:07

Question


I have an XML document like this:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:epp xmlns:ns0="urn:ietf:params:xml:ns:epp-1.0" 
 xmlns:ns1="http://epp.nic.ir/ns/contact-1.0">
   <ns0:command>
      <ns0:check>
         <ns1:check>
            <ns1:id>ex61-irnic</ns1:id>
            <ns1:id>ex999-irnic</ns1:id>
            <ns1:authInfo>
               <ns1:pw>1487441516170712</ns1:pw>
            </ns1:authInfo>
         </ns1:check>
      </ns0:check>
      <ns0:clTRID>TEST-12345</ns0:clTRID>
   </ns0:command>
</ns0:epp>

I want to transform it with Python 3 so that it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
   <command>
      <check>
         <check>
            <id>ex61-irnic</id>
            <id>ex999-irnic</id>
            <authInfo>
               <pw>1487441516170712</pw>
            </authInfo>
         </check>
      </check>
      <clTRID>TEST-12345</clTRID>
   </command>
</epp>

I tried to remove the ns prefixes with objectify.deannotate from the lxml module, but it didn't work. Could you please help me achieve this?


Answer 1:


This is a combination of two other answers: "Remove namespace and prefix from xml in python using lxml", which shows how to modify the namespace of an element, and "lxml: add namespace to input file", which shows how to reset the top namespace map.

The code is a little hacky (I'm particularly suspicious of whether or not it's kosher to use the _setroot method), but it seems to work:

from lxml import etree

inputfile = 'data.xml'
target_ns = 'urn:ietf:params:xml:ns:epp-1.0'
nsmap = {None: target_ns}

tree = etree.parse(inputfile)
root = tree.getroot()

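# here we set the namespace of all elements to target_ns
# (iter() replaces the deprecated getiterator(); passing etree.Element
# skips comments and processing instructions, whose tag is not a string)
for elem in root.iter(etree.Element):
    tag = etree.QName(elem.tag)
    elem.tag = '{%s}%s' % (target_ns, tag.localname)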

# create a new root element and set the namespace map, then
# copy over all the child elements    
new_root = etree.Element(root.tag, nsmap=nsmap)
new_root[:] = root[:]

# create a new elementtree with new_root so that we can use the
# .write method.
tree = etree.ElementTree()
tree._setroot(new_root)

tree.write('done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')

Given your sample input, this produces in done.xml:

<?xml version='1.0' encoding='UTF-8'?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0"><command>
      <check>
         <check>
            <id>ex61-irnic</id>
            <id>ex999-irnic</id>
            <authInfo>
               <pw>1487441516170712</pw>
            </authInfo>
         </check>
      </check>
      <clTRID>TEST-12345</clTRID>
   </command>
</epp>
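
For anyone uneasy about the private _setroot call, here is a minimal sketch of the same idea that stays on lxml's public API and works on bytes in memory. The function name to_default_namespace is hypothetical and not part of lxml. Unlike the code above, it also copies the root's text and attributes, so the whitespace before the first child is kept. If a tree object is needed for .write, etree.ElementTree(new_root) builds one without touching private methods.

```python
from lxml import etree

TARGET_NS = "urn:ietf:params:xml:ns:epp-1.0"


def to_default_namespace(xml_bytes, target_ns=TARGET_NS):
    """Hypothetical helper: move every element into target_ns, unprefixed."""
    root = etree.fromstring(xml_bytes)

    # Retag every element. Passing etree.Element to iter() skips comments
    # and processing instructions, whose .tag is not a string.
    for elem in root.iter(etree.Element):
        local = etree.QName(elem.tag).localname
        elem.tag = "{%s}%s" % (target_ns, local)

    # The nsmap of an existing element is read-only, so build a new root
    # that declares target_ns as the default namespace.
    new_root = etree.Element(
        root.tag, attrib=dict(root.attrib), nsmap={None: target_ns}
    )
    new_root.text = root.text  # keep the whitespace before the first child
    new_root[:] = root[:]      # moves the children to the new root

    return etree.tostring(new_root, xml_declaration=True, encoding="UTF-8")
```

Note that this collapses both input namespaces into one, which is what the question asks for. Elements that were only distinguishable by namespace (ns0:check and ns1:check in the sample) end up with the same name.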



Answer 2:


Consider XSLT, the special-purpose language designed to transform XML files, for example by removing namespace prefixes. Python's third-party module, lxml, can run XSLT 1.0 scripts. And because XSLT scripts are themselves XML files, you can parse them from a file or a string like any other XML. No loops or conditional if logic are needed. Additionally, you can reuse this XSLT script from other languages (PHP, Java, C#, etc.).

XSLT (save as .xsl file to be referenced in Python)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM: COPY DOC AS IS -->
  <xsl:template match="@*|node()">
    <xsl:copy>    
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- REMOVE NAMESPACE PREFIXES, ADD DOC NAMESPACE -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="urn:ietf:params:xml:ns:epp-1.0">    
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL
doc = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# CONFIGURE AND RUN TRANSFORMER
transform = et.XSLT(xsl)    
result = transform(doc)

# OUTPUT RESULT TREE TO FILE
with open('Output.xml', 'wb') as f:
    f.write(result)

Output

<?xml version="1.0"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <command>
    <check>
      <check>
        <id>ex61-irnic</id>
        <id>ex999-irnic</id>
        <authInfo>
          <pw>1487441516170712</pw>
        </authInfo>
      </check>
    </check>
    <clTRID>TEST-12345</clTRID>
  </command>
</epp>
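
Since the stylesheet is itself XML, it can also be kept as a string inside the Python script instead of a separate .xsl file. The following is a minimal sketch of that variant. The names STRIP_PREFIXES_XSLT and strip_prefixes are hypothetical, and the stylesheet body is the one shown above.

```python
from lxml import etree

# Same stylesheet as above, embedded as bytes instead of a .xsl file.
STRIP_PREFIXES_XSLT = b"""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="urn:ietf:params:xml:ns:epp-1.0">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
"""

# Compile once; the transformer object can be reused for many documents.
_transform = etree.XSLT(etree.fromstring(STRIP_PREFIXES_XSLT))


def strip_prefixes(xml_bytes):
    """Hypothetical helper: run the stylesheet and return the result as text."""
    result = _transform(etree.fromstring(xml_bytes))
    # str() serialises the result tree according to xsl:output.
    return str(result)
```

Calling strip_prefixes on the bytes of Input.xml should give the same text that the file-based version writes to Output.xml.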


Source: https://stackoverflow.com/questions/45817239/how-can-i-remove-ns-from-xml-in-python
