Pretty printing XML in Python


一个人的身影

What is the best way (or are the various ways) to pretty print XML in Python?

26 answers

星月不相逢

2020-11-22 02:44

XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

An alternative is to use pyXML, which has a PrettyPrint function.


难免孤独

2020-11-22 02:45

Use etree.indent and etree.tostring (etree.indent requires lxml 4.5 or later):
```
import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
# indent() rewrites the whitespace in the tree in place
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)
```
Output:
```
<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>
```

Removing namespaces and prefixes
```
import lxml.etree as etree


def dump_xml(element):
    # iter() replaces the deprecated getiterator()
    for item in element.iter():
        # Comments and processing instructions have a non-string tag; skip them
        if isinstance(item.tag, str):
            item.tag = etree.QName(item).localname

    # Drop namespace declarations that are no longer used
    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result


root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)
```
Output:
```
<document>
  <name>hello world</name>
</document>
```


小蘑菇

2020-11-22 02:46
If you're using a DOM implementation, each has its own form of pretty-printing built in:
```
# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)
```
If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want — you'd probably have to write or subclass your own serialiser.
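For the minidom case, a minimal runnable sketch using only the standard library might look like the following. The helper name `prettify` and the sample string are illustrative assumptions, not part of the answer above; `indent` and `newl` are the parameters `toprettyxml()` takes for the indentation unit and the line separator.

```python
from xml.dom import minidom


def prettify(xml_text, indent="  "):
    """Hypothetical helper: reparse an XML string and serialize it indented."""
    document = minidom.parseString(xml_text)
    # toprettyxml() returns a str that starts with an XML declaration line
    return document.toprettyxml(indent=indent, newl="\n")


compact = "<html><head></head><body><h1>Welcome</h1></body></html>"
pretty = prettify(compact)
print(pretty)
```

Note that this works cleanly on compact input. If the input is already indented, minidom keeps the existing whitespace text nodes and adds its own around them, which produces blank lines.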

北荒

2020-11-22 02:47

Here's a Python 3 solution that avoids the blank-line problem (toprettyxml() emits extra whitespace-only lines when the input is already indented) and uses only the standard library, unlike most other answers.
```
import xml.etree.ElementTree as ET
import xml.dom.minidom

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    # Drop the whitespace-only lines that toprettyxml() leaves behind.
    # Join with "\n": a file opened in text mode already translates it to the
    # platform line ending, so os.linesep would double the "\r" on Windows.
    xml_string = "\n".join(s for s in xml_string.splitlines() if s.strip())
    with open(output_xml, "w", encoding="utf-8") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)
```

I found how to fix the common newline issue here.
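As an alternative to the minidom round trip, Python 3.9 and later ship `xml.etree.ElementTree.indent()`, which indents a tree in place. A minimal sketch, assuming Python 3.9+ is available (the helper name `pretty_xml` is made up for illustration):

```python
import xml.etree.ElementTree as ET


def pretty_xml(root, space="  "):
    """Hypothetical helper: indent an ElementTree element and return it as text."""
    # ET.indent() (Python 3.9+) rewrites the whitespace between elements in place
    ET.indent(root, space=space)
    # encoding="unicode" makes tostring() return str instead of bytes
    return ET.tostring(root, encoding="unicode")


root = ET.fromstring("<html><head></head><body><h1>Welcome</h1></body></html>")
text = pretty_xml(root)
print(text)
```

Because the whitespace is replaced rather than added to, no whitespace-only lines need to be filtered out afterwards.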


不知归路

2020-11-22 02:48
BeautifulSoup has an easy-to-use prettify() method.

It indents one space per indentation level. It works much better than lxml's pretty_print and is short and sweet.
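```
from bs4 import BeautifulSoup

# xml_file is the path to the XML file; the 'xml' parser requires lxml to be installed
with open(xml_file) as f:
    bs = BeautifulSoup(f, 'xml')
print(bs.prettify())
```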

梦如初夏

2020-11-22 02:50
```
from yattag import indent

pretty_string = indent(ugly_string)
```
It won't add spaces or newlines inside text nodes, unless you ask for it with:
```
indent(mystring, indent_text = True)
```
You can specify what the indentation unit should be and what the newline should look like.
```
pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)
```
The documentation is on the yattag homepage, http://www.yattag.org.