Pretty printing XML in Python

后端未结

关注

 26  1730

一个人的身影

What is the best way (or are the various ways) to pretty print XML in Python?

相关标签:

26条回答

伪装坚强ぢ

2020-11-22 02:55

I had this problem and solved it like this:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

In my code this method is called like this:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

This works only because etree by default uses two spaces to indent, which I don't find very much emphasizing the indentation and therefore not pretty. I couldn't ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.

0 讨论(0)

礼貌的吻别

2020-11-22 02:56
I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here's my workaround for it:
```
def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
不知归路

2020-11-22 03:03
For converting an entire xml document to a pretty xml document
(ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftooling of .odt/.ods files, such as I'm implementing here)
```
import xml.dom.minidom

file = open("./content.xml", 'r')
xml_string = file.read()
file.close()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()
```
References:
- Thanks to Ben Noland's answer on this page which got me most of the way there.
0 讨论(0)
发布评论:

提交评论
- 加载中...

鱼传尺愫

2020-11-22 03:04

Another solution is to borrow this indent function, for use with the ElementTree library that's built in to Python since 2.5. Here's what that would look like:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)

0 讨论(0)

野的像风

2020-11-22 03:04
You can use popular external library xmltodict, with unparse and pretty=True you will get best result:
```
xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)
```
full_document=False against <?xml version="1.0" encoding="UTF-8"?> at the top.
0 讨论(0)
发布评论:

提交评论
- 加载中...
甜味超标

2020-11-22 03:04
As of Python 3.9 (still a release candidate as of 12 Aug 2020), there is a new xml.etree.ElementTree.indent() function for pretty-printing XML trees.

Sample usage:
```
import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
```
The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200
0 讨论(0)
发布评论:

提交评论
- 加载中...