inserting newlines in xml file generated via xml.etree.ElementTree in python

前端 未结 5 1602
北海茫月
北海茫月 2020-12-02 17:12

I have created a xml file using xml.etree.ElementTree in python. I then use

tree.write(filename, "UTF-8") 

to write out the documen

相关标签:
5条回答
  • 2020-12-02 17:15

    Without the use of external libraries, you can easily achieve a newline between each XML tag in the output by setting the tail attribute for each element to '\n'.

    You can also specify the number of tabs after the newline here. However, in the OP's use-case tabs may be easier to achieve with an external library, or see Erick M. Sprengel's answer.

    I ran into the same problem while trying to modify an xml document using xml.etree.ElementTree in python. In my case, I was parsing the xml file, clearing certain elements (using Element.clear()), and then writing the result back to a file.

    For each element that I had cleared, there was no new line after its tag in the output file.

    ElementTree's Element.clear() documentation states:

    This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.

    This made me realize that the text and tail attributes of an element were how the output format was being determined. In my case, I was able to just set these attributes of the cleared element to the same values as before clearing it. This tail value ended up being '\n\t' for first-level children of the root xml element, with the number of tabs indicating the number of tabs displayed in the output.

    0 讨论(0)
  • 2020-12-02 17:23

    There is no pretty printing support in ElementTree, but you can utilize other XML modules.

    For example, xml.dom.minidom.Node.toprettyxml():

    Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])

    Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.

    Use indent and newl to fit your requirements.

    An example, using the default formatting characters:

    >>> from xml.dom import minidom
    >>> from xml.etree import ElementTree
    >>> tree1=ElementTree.XML('<tips><tip>1</tip><tip>2</tip></tips>')
    >>> ElementTree.tostring(tree1)
    '<tips><tip>1</tip><tip>2</tip></tips>'
    >>> print minidom.parseString(ElementTree.tostring(tree1)).toprettyxml()
    <?xml version="1.0" ?>
    <tips>
        <tip>
            1
        </tip>
        <tip>
            2
        </tip>
    </tips>
    
    >>> 
    
    0 讨论(0)
  • 2020-12-02 17:24

    I found a new way to avoid new libraries and reparsing the xml. You just need to pass your root element to this function (see below explanation):

    def indent(elem, level=0):
        i = "\n" + level*"  "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + "  "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            for elem in elem:
                indent(elem, level+1)
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i
    

    There is an attribute named "tail" on xml.etree.ElementTree.Element instances. This attribute can set an string after a node:

    "<a>text</a>tail"
    

    I found a link from 2004 telling about an Element Library Functions that uses this "tail" to indent an element.

    Example:

    root = ET.fromstring("<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>""")
    tree = ET.ElementTree(root)
    
    indent(root)
    # writing xml
    tree.write("example.xml", encoding="utf-8", xml_declaration=True)
    

    Result on "example.xml":

    <?xml version='1.0' encoding='utf-8'?>
    <fruits>
        <fruit>banana</fruit>
        <fruit>apple</fruit>
    </fruits>
    
    0 讨论(0)
  • 2020-12-02 17:41

    The easiest solution I think is switching to the lxml library. In most circumstances you can just change your import from import xml.etree.ElementTree as etree to from lxml import etree or similar.

    You can then use the pretty_print option when serializing:

    tree.write(filename, pretty_print=True)
    

    (also available on etree.tostring)

    0 讨论(0)
  • 2020-12-02 17:41

    According to this thread your best bet would be installing pyXml and use that to prettyprint the ElementTree xml content (as ElementTree doesn't seem to have a prettyprinter by default in Python):

    import xml.etree.ElementTree as ET
    
    from xml.dom.ext.reader import Sax2
    from xml.dom.ext import PrettyPrint
    from StringIO import StringIO
    
    def prettyPrintET(etNode):
        reader = Sax2.Reader()
        docNode = reader.fromString(ET.tostring(etNode))
        tmpStream = StringIO()
        PrettyPrint(docNode, stream=tmpStream)
        return tmpStream.getvalue()
    
    0 讨论(0)
提交回复
热议问题