Pretty printing XML in Python

后端 未结 26 1786
一个人的身影
一个人的身影 2020-11-22 02:18

What is the best way (or are the various ways) to pretty print XML in Python?

相关标签:
26条回答
  • 2020-11-22 02:55

    I had this problem and solved it like this:

    def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
        pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
        if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
        file.write(pretty_printed_xml)
    

    In my code this method is called like this:

    try:
        with open(file_path, 'w') as file:
            file.write('<?xml version="1.0" encoding="utf-8" ?>')
    
            # create some xml content using etree ...
    
            xml_parser = XMLParser()
            xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')
    
    except IOError:
        print("Error while writing in log file!")
    

    This works only because etree by default uses two spaces to indent, which I don't find very much emphasizing the indentation and therefore not pretty. I couldn't ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.

    0 讨论(0)
  • 2020-11-22 02:56

    I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here's my workaround for it:

    def toprettyxml(doc, encoding):
        """Return a pretty-printed XML document in a given encoding."""
        unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                              u'<?xml version="1.0" encoding="%s"?>' % encoding)
        return unistr.encode(encoding, 'xmlcharrefreplace')
    
    0 讨论(0)
  • 2020-11-22 03:03

    For converting an entire xml document to a pretty xml document
    (ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftooling of .odt/.ods files, such as I'm implementing here)

    import xml.dom.minidom
    
    file = open("./content.xml", 'r')
    xml_string = file.read()
    file.close()
    
    parsed_xml = xml.dom.minidom.parseString(xml_string)
    pretty_xml_as_string = parsed_xml.toprettyxml()
    
    file = open("./content_new.xml", 'w')
    file.write(pretty_xml_as_string)
    file.close()
    

    References:
    - Thanks to Ben Noland's answer on this page which got me most of the way there.

    0 讨论(0)
  • 2020-11-22 03:04

    Another solution is to borrow this indent function, for use with the ElementTree library that's built in to Python since 2.5. Here's what that would look like:

    from xml.etree import ElementTree
    
    def indent(elem, level=0):
        i = "\n" + level*"  "
        j = "\n" + (level-1)*"  "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + "  "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            for subelem in elem:
                indent(subelem, level+1)
            if not elem.tail or not elem.tail.strip():
                elem.tail = j
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = j
        return elem        
    
    root = ElementTree.parse('/tmp/xmlfile').getroot()
    indent(root)
    ElementTree.dump(root)
    
    0 讨论(0)
  • 2020-11-22 03:04

    You can use popular external library xmltodict, with unparse and pretty=True you will get best result:

    xmltodict.unparse(
        xmltodict.parse(my_xml), full_document=False, pretty=True)
    

    full_document=False against <?xml version="1.0" encoding="UTF-8"?> at the top.

    0 讨论(0)
  • 2020-11-22 03:04

    As of Python 3.9 (still a release candidate as of 12 Aug 2020), there is a new xml.etree.ElementTree.indent() function for pretty-printing XML trees.

    Sample usage:

    import xml.etree.ElementTree as ET
    
    element = ET.XML("<html><body>text</body></html>")
    ET.indent(element)
    

    The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200

    0 讨论(0)
提交回复
热议问题