Python: Update XML-file using ElementTree while conserving layout as much as possible

后端 未结 4 970
天命终不由人
天命终不由人 2021-01-04 03:36

I have a document which uses an XML namespace for which I want to increase /group/house/dogs by one: (the file is called houses.xml)



        
相关标签:
4条回答
  • 2021-01-04 04:16

    Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).

    Depending on your needs, you have the following options:

    • If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.

    • You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0="<group xmlns=". This is pretty safe unless you can have CDATA in your XML.

    • You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.

    0 讨论(0)
  • 2021-01-04 04:20

    etree from lxml provides this feature.

    1. elementTree.write('houses2.xml',encoding = "UTF-8",xml_declaration = True) helps you in not omitting the declaration

    2. While writing into the file it does not change the namespaces.

    http://lxml.de/parsing.html is the link for its tutorial.

    P.S : lxml should be installed separately.

    0 讨论(0)
  • 2021-01-04 04:27

    when Save xml add default_namespace argument is easy to avoid ns0, on my code

    key code: xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)

    if os.path.isfile(xmlfiile):
                xmltree = ET.parse(xmlfiile)
                root = xmltree.getroot()
                xmlnamespace = root.tag.split('{')[1].split('}')[0]  //get namespace
    
                initwin=xmltree.find("./{"+ xmlnamespace +"}test")
                initwin.find("./{"+ xmlnamespace +"}content").text = "aaa"
                xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)
    
    0 讨论(0)
  • 2021-01-04 04:32

    An XML based solution to this problem is to write a helper class for ElementTree which:

    • Grabs the XML-declaration line before parsing as ElementTree at the time of writing is unable to write an XML-declaration line without also writing an encoding attribute(I checked the source).
    • Parses the input file once, grabs the namespace of the root element. Registers that namespace with ElementTree as having the empty string as prefix. When that is done the source file is parsed using ElementTree again, with that new setting.

    It has one major drawback:

    • XML-comments are lost. Which I have learned is not acceptable for this situation(I initially didn´t think the input data had any comments, but it turns out it has).

    My helper class with example:

    from xml.etree import ElementTree as ET
    import re
    
    
    class ElementTreeHelper():
        def __init__(self, xml_file_name):
            xml_file = open(xml_file_name, "rb")
    
            self.__parse_xml_declaration(xml_file)
    
            self.element_tree = ET.parse(xml_file)
            xml_file.seek(0)
    
            root_tag_namespace = self.__root_tag_namespace(self.element_tree)
            self.namespace = None
            if root_tag_namespace is not None:
                self.namespace = '{' + root_tag_namespace + '}'
                # Register the root tag namespace as having an empty prefix, as
                # this has to be done before parsing xml_file we re-parse.
                ET.register_namespace('', root_tag_namespace)
                self.element_tree = ET.parse(xml_file)
    
        def find(self, xpath_query):
            return self.element_tree.find(xpath_query)
    
        def write(self, xml_file_name):
            xml_file = open(xml_file_name, "wb")
            if self.xml_declaration_line is not None:
                xml_file.write(self.xml_declaration_line + '\n')
    
            return self.element_tree.write(xml_file)
    
        def __parse_xml_declaration(self, xml_file):
            first_line = xml_file.readline().strip()
            if first_line.startswith('<?xml') and first_line.endswith('?>'):
                self.xml_declaration_line = first_line
            else:
                self.xml_declaration_line = None
            xml_file.seek(0)
    
        def __root_tag_namespace(self, element_tree):
            namespace_search = re.search('^{(\S+)}', element_tree.getroot().tag)
            if namespace_search is not None:
                return namespace_search.group(1)
            else:
                return None
    
    
    def __main():
        el_tree_hlp = ElementTreeHelper('houses.xml')
    
        dogs_tag = el_tree_hlp.element_tree.getroot().find(
                       '{ns}house/{ns}dogs'.format(
                             ns=el_tree_hlp.namespace))
        one_dog_added = int(dogs_tag.text.strip()) + 1
        dogs_tag.text = str(one_dog_added)
    
        el_tree_hlp.write('hejsan.xml')
    
    if __name__ == '__main__':
        __main()
    

    The output:

    <?xml version="1.0"?>
    <group xmlns="http://dogs.house.local">
        <house>
                <id>2821</id>
                <dogs>3</dogs>
        </house>
    </group>
    

    If someone has an improvement to this solution please don´t hesitate to grab the code and improve it.

    0 讨论(0)
提交回复
热议问题