Pretty printing XML in Python

后端 未结 26 1731
一个人的身影
一个人的身影 2020-11-22 02:18

What is the best way (or are the various ways) to pretty print XML in Python?

相关标签:
26条回答
  • 2020-11-22 03:04

    Take a look at the vkbeautify module.

    It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn't have any dependency.

    Examples:

    import vkbeautify as vkb
    
    vkb.xml(text)                       
    vkb.xml(text, 'path/to/dest/file')  
    vkb.xml('path/to/src/file')        
    vkb.xml('path/to/src/file', 'path/to/dest/file') 
    
    0 讨论(0)
  • 2020-11-22 03:05

    I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.

    def prettify(element, indent='  '):
        queue = [(0, element)]  # (level, element)
        while queue:
            level, element = queue.pop(0)
            children = [(level + 1, child) for child in list(element)]
            if children:
                element.text = '\n' + indent * (level+1)  # for child open
            if queue:
                element.tail = '\n' + indent * queue[0][0]  # for sibling open
            else:
                element.tail = '\n' + indent * (level-1)  # for parent close
            queue[0:0] = children  # prepend so children come before siblings
    
    0 讨论(0)
  • 2020-11-22 03:06

    You can try this variation...

    Install BeautifulSoup and the backend lxml (parser) libraries:

    user$ pip3 install lxml bs4
    

    Process your XML document:

    from bs4 import BeautifulSoup
    
    with open('/path/to/file.xml', 'r') as doc: 
        for line in doc: 
            print(BeautifulSoup(line, 'lxml-xml').prettify())  
    
    0 讨论(0)
  • 2020-11-22 03:08

    lxml is recent, updated, and includes a pretty print function

    import lxml.etree as etree
    
    x = etree.parse("filename")
    print etree.tostring(x, pretty_print=True)
    

    Check out the lxml tutorial: http://lxml.de/tutorial.html

    0 讨论(0)
  • 2020-11-22 03:09

    Here's my (hacky?) solution to get around the ugly text node problem.

    uglyXml = doc.toprettyxml(indent='  ')
    
    text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)    
    prettyXml = text_re.sub('>\g<1></', uglyXml)
    
    print prettyXml
    

    The above code will produce:

    <?xml version="1.0" ?>
    <issues>
      <issue>
        <id>1</id>
        <title>Add Visual Studio 2005 and 2008 solution files</title>
        <details>We need Visual Studio 2005/2008 project files for Windows.</details>
      </issue>
    </issues>
    

    Instead of this:

    <?xml version="1.0" ?>
    <issues>
      <issue>
        <id>
          1
        </id>
        <title>
          Add Visual Studio 2005 and 2008 solution files
        </title>
        <details>
          We need Visual Studio 2005/2008 project files for Windows.
        </details>
      </issue>
    </issues>
    

    Disclaimer: There are probably some limitations.

    0 讨论(0)
  • 2020-11-22 03:10

    I found this question while looking for "how to pretty print html"

    Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML:

    from xml.dom.minidom import parseString as string_to_dom
    
    def prettify(string, html=True):
        dom = string_to_dom(string)
        ugly = dom.toprettyxml(indent="  ")
        split = list(filter(lambda x: len(x.strip()), ugly.split('\n')))
        if html:
            split = split[1:]
        pretty = '\n'.join(split)
        return pretty
    
    def pretty_print(html):
        print(prettify(html))
    

    When used this is what it looks like:

    html = """\
    <div class="foo" id="bar"><p>'IDK!'</p><br/><div class='baz'><div>
    <span>Hi</span></div></div><p id='blarg'>Try for 2</p>
    <div class='baz'>Oh No!</div></div>
    """
    
    pretty_print(html)
    

    Which returns:

    <div class="foo" id="bar">
      <p>'IDK!'</p>
      <br/>
      <div class="baz">
        <div>
          <span>Hi</span>
        </div>
      </div>
      <p id="blarg">Try for 2</p>
      <div class="baz">Oh No!</div>
    </div>
    
    0 讨论(0)
提交回复
热议问题