Python pretty XML printer with lxml

一整个雨季 2020-11-28 09:37

After reading from an existing file with 'ugly' XML and doing some modifications, pretty printing doesn't work. I've tried etree.write(FILE_NAME, pretty_print=True)

5 answers
  • 2020-11-28 09:45
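    from lxml import etree

    # Python 3: file() no longer exists; open() in a with-block closes the file for you
    with open('out.txt', 'w') as fp:
        print(etree.tostring(...), file=fp)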
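A minimal Python 3 sketch of the same idea, assuming a small made-up document in place of a parsed file. By default etree.tostring() returns bytes, so print() would write a b'...' literal; passing encoding='unicode' returns a str instead. An io.StringIO stands in for out.txt so the example has no side effects.

```python
import io

from lxml import etree

# Hypothetical sample document; in practice it would come from etree.parse()
root = etree.fromstring(b"<root><item>a</item><item>b</item></root>")

# StringIO stands in for open('out.txt', 'w') so nothing is written to disk
fp = io.StringIO()
# encoding='unicode' makes tostring() return str instead of bytes
print(etree.tostring(root, pretty_print=True, encoding="unicode"), file=fp)

output = fp.getvalue()
print(output)
```

To write to a real file, swap the StringIO for open('out.txt', 'w') as in the answer.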
    
  • 2020-11-28 10:07

    Here is an answer updated to work with Python 3:

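    from io import BytesIO
    from sys import stdout

    from lxml import etree

    # Drop whitespace-only text so pretty_print can re-indent from scratch
    parser = etree.XMLParser(remove_blank_text=True)
    file_obj = BytesIO(text)  # text must be bytes, not str
    tree = etree.parse(file_obj, parser)
    # tree.write() emits bytes, so write to the binary buffer behind sys.stdout
    tree.write(stdout.buffer, pretty_print=True)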
    

    where text is the XML document as a bytes object (not a str).
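If the XML is held in a str rather than bytes, it has to be encoded before it goes into BytesIO. The sketch below assumes a made-up, badly indented input and writes into a BytesIO instead of stdout.buffer so the result can be inspected; it illustrates the answer's approach rather than reproducing it exactly.

```python
from io import BytesIO

from lxml import etree

# Hypothetical "ugly" input held in a str; BytesIO needs bytes, so encode it
xml_str = "<root>\n<a>\n<b>1</b></a>\n    </root>"
text = xml_str.encode("utf-8")

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(BytesIO(text), parser)

# Write into an in-memory buffer rather than stdout.buffer
out = BytesIO()
tree.write(out, pretty_print=True)
result = out.getvalue().decode("utf-8")
print(result)
```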

  • 2020-11-28 10:09

    I am not sure why other answers did not mention this. If you want to obtain the root of the XML document, there is a method called getroot(). I hope this answers your question (though a little late).

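    from lxml import etree as et

    tree = et.parse(xmlFile)  # xmlFile: a path or file object
    root = tree.getroot()     # the top-level element of the document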
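Once you have the root element, it can be passed straight to etree.tostring() for pretty-printing. This sketch assumes lxml.etree imported as et and uses an in-memory BytesIO with made-up contents as a hypothetical stand-in for xmlFile.

```python
from io import BytesIO

from lxml import etree as et

# Hypothetical stand-in for xmlFile: parse() accepts paths and file-like objects
xmlFile = BytesIO(b"<config><name>demo</name><port>8080</port></config>")

tree = et.parse(xmlFile)
root = tree.getroot()  # the top-level <config> element
print(root.tag)

# Pretty-print starting from the root element (tostring returns bytes here)
pretty = et.tostring(root, pretty_print=True).decode()
print(pretty)
```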
    
  • 2020-11-28 10:10

    For me, this issue was not solved until I noticed this little tidbit here:

    http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output

    Short version:

    Read in the file with this command:

    >>> parser = etree.XMLParser(remove_blank_text=True)
    >>> tree = etree.parse(filename, parser)
    

    That will "reset" the existing indentation, allowing the output to generate its own indentation correctly. Then write with pretty_print=True as usual:

    >>> tree.write(<output_file_name>, pretty_print=True)
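The FAQ's point can be checked directly: the sketch below parses the same input twice, once with the default parser and once with remove_blank_text=True, and pretty-prints both. The sample input and the helper name pretty() are hypothetical. With the default parser the original whitespace text nodes are kept, so lxml leaves the layout alone; with remove_blank_text=True it re-indents everything.

```python
from io import BytesIO

from lxml import etree

# Hypothetical input whose existing whitespace is not lxml's 2-space style
ugly = b"<root>\n<a>\n<b>1</b>\n</a>\n</root>"


def pretty(data, remove_blank_text):
    """Parse data and return the pretty-printed result as str (hypothetical helper)."""
    parser = etree.XMLParser(remove_blank_text=remove_blank_text)
    tree = etree.parse(BytesIO(data), parser)
    return etree.tostring(tree, pretty_print=True).decode()


# Default parser: whitespace text nodes survive, so <b> is not indented
print(pretty(ugly, False))
# remove_blank_text=True: whitespace is dropped and lxml indents from scratch
print(pretty(ugly, True))
```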
    
  • 2020-11-28 10:10

    Well, according to the API docs, there is no write function in the lxml etree module itself; write() is a method of the ElementTree object returned by etree.parse(). You have a couple of options for getting a pretty-printed XML string into a file. You can use the tostring function like so:

    # tostring() returns bytes in Python 3, so open the file in binary mode
    with open('doc.xml', 'wb') as f:
        f.write(etree.tostring(root, pretty_print=True))
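A self-contained sketch of the tostring() route under Python 3, assuming a small tree built in code instead of the question's parsed file. Because tostring() returns bytes unless encoding='unicode' is passed, the file is opened in 'wb' mode; xml_declaration=True with encoding='UTF-8' also emits the <?xml ...?> header. The temporary directory only keeps the example from touching the working directory.

```python
import os
import tempfile

from lxml import etree

# Hypothetical tree built in code; in the question it would come from etree.parse()
root = etree.Element("doc")
etree.SubElement(root, "title").text = "Hello"

path = os.path.join(tempfile.mkdtemp(), "doc.xml")
# tostring() returns bytes here, so the file is opened in binary mode
with open(path, "wb") as f:
    f.write(etree.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8"))

# Read the file back to check what was written
with open(path, "rb") as f:
    written = f.read().decode("utf-8")
print(written)
```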
    

    Or, if your input source is less than perfect and/or you want more knobs and buttons to configure your output, you could use one of the Python wrappers for the Tidy library:

    http://utidylib.berlios.de/

    import tidy

    # parseString() returns a document object; str() gives the tidied markup
    f.write(str(tidy.parseString(your_xml_str, output_xml=1, indent=1, input_xml=1)))
    

    http://countergram.com/open-source/pytidylib

    from tidylib import tidy_document
    document, errors = tidy_document(your_xml_str, options={'output_xml':1, 'indent':1, 'input_xml':1})
    f.write(document)
    