What is the best way (or are the various ways) to pretty print XML in Python?
I had this problem and solved it like this:
def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
if pretty_print: pretty_printed_xml = pretty_printed_xml.replace(' ', indent)
file.write(pretty_printed_xml)
In my code this method is called like this:
try:
with open(file_path, 'w') as file:
file.write('<?xml version="1.0" encoding="utf-8" ?>')
# create some xml content using etree ...
xml_parser = XMLParser()
xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')
except IOError:
print("Error while writing in log file!")
This works only because etree by default uses two spaces
to indent, which I don't find very much emphasizing the indentation and therefore not pretty. I couldn't ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.
I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1')
. Here's my workaround for it:
def toprettyxml(doc, encoding):
"""Return a pretty-printed XML document in a given encoding."""
unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
u'<?xml version="1.0" encoding="%s"?>' % encoding)
return unistr.encode(encoding, 'xmlcharrefreplace')
For converting an entire xml document to a pretty xml document
(ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftool
ing of .odt/.ods files, such as I'm implementing here)
import xml.dom.minidom
file = open("./content.xml", 'r')
xml_string = file.read()
file.close()
parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()
file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()
References:
- Thanks to Ben Noland's answer on this page which got me most of the way there.
Another solution is to borrow this indent function, for use with the ElementTree library that's built in to Python since 2.5. Here's what that would look like:
from xml.etree import ElementTree
def indent(elem, level=0):
i = "\n" + level*" "
j = "\n" + (level-1)*" "
if len(elem):
if not elem.text or not elem.text.strip():
elem.text = i + " "
if not elem.tail or not elem.tail.strip():
elem.tail = i
for subelem in elem:
indent(subelem, level+1)
if not elem.tail or not elem.tail.strip():
elem.tail = j
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = j
return elem
root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)
You can use popular external library xmltodict, with unparse
and pretty=True
you will get best result:
xmltodict.unparse(
xmltodict.parse(my_xml), full_document=False, pretty=True)
full_document=False
against <?xml version="1.0" encoding="UTF-8"?>
at the top.
As of Python 3.9 (still a release candidate as of 12 Aug 2020), there is a new xml.etree.ElementTree.indent()
function for pretty-printing XML trees.
Sample usage:
import xml.etree.ElementTree as ET
element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200