Pretty printing XML in Python

后端 未结 26 1613
一个人的身影
一个人的身影 2020-11-22 02:18

What is the best way (or are the various ways) to pretty print XML in Python?

相关标签:
26条回答
  • 2020-11-22 02:44

    XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

    An alternative is to use pyXML, which has a PrettyPrint function.

    0 讨论(0)
  • 2020-11-22 02:45

    Use etree.indent and etree.tostring

    import lxml.etree as etree
    
    root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
    etree.indent(root, space="  ")
    xml_string = etree.tostring(root, pretty_print=True).decode()
    print(xml_string)
    

    output

    <html>
      <head/>
      <body>
        <h1>Welcome</h1>
      </body>
    </html>
    

    Removing namespaces and prefixes

    import lxml.etree as etree
    
    
    def dump_xml(element):
        for item in element.getiterator():
            item.tag = etree.QName(item).localname
    
        etree.cleanup_namespaces(element)
        etree.indent(element, space="  ")
        result = etree.tostring(element, pretty_print=True).decode()
        return result
    
    
    root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
    xml_string = dump_xml(root)
    print(xml_string)
    

    output

    <document>
      <name>hello world</name>
    </document>
    
    0 讨论(0)
  • 2020-11-22 02:46

    If you're using a DOM implementation, each has their own form of pretty-printing built-in:

    # minidom
    #
    document.toprettyxml()
    
    # 4DOM
    #
    xml.dom.ext.PrettyPrint(document, stream)
    
    # pxdom (or other DOM Level 3 LS-compliant imp)
    #
    serializer.domConfig.setParameter('format-pretty-print', True)
    serializer.writeToString(document)
    

    If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want —  you'd probably have to write or subclass your own serialiser.

    0 讨论(0)
  • 2020-11-22 02:47

    Here's a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.

    import xml.etree.ElementTree as ET
    import xml.dom.minidom
    import os
    
    def pretty_print_xml_given_root(root, output_xml):
        """
        Useful for when you are editing xml data on the fly
        """
        xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
        xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
        with open(output_xml, "w") as file_out:
            file_out.write(xml_string)
    
    def pretty_print_xml_given_file(input_xml, output_xml):
        """
        Useful for when you want to reformat an already existing xml file
        """
        tree = ET.parse(input_xml)
        root = tree.getroot()
        pretty_print_xml_given_root(root, output_xml)
    

    I found how to fix the common newline issue here.

    0 讨论(0)
  • 2020-11-22 02:48

    BeautifulSoup has a easy to use prettify() method.

    It indents one space per indentation level. It works much better than lxml's pretty_print and is short and sweet.

    from bs4 import BeautifulSoup
    
    bs = BeautifulSoup(open(xml_file), 'xml')
    print bs.prettify()
    
    0 讨论(0)
  • 2020-11-22 02:50
    from yattag import indent
    
    pretty_string = indent(ugly_string)
    

    It won't add spaces or newlines inside text nodes, unless you ask for it with:

    indent(mystring, indent_text = True)
    

    You can specify what the indentation unit should be and what the newline should look like.

    pretty_xml_string = indent(
        ugly_xml_string,
        indentation = '    ',
        newline = '\r\n'
    )
    

    The doc is on http://www.yattag.org homepage.

    0 讨论(0)
提交回复
热议问题