Pretty printing XML in Python

一个人的身影 2020-11-22 02:18

What is the best way (or are the various ways) to pretty print XML in Python?

26 answers
  • 2020-11-22 02:52

    If you have xmllint, you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.

    Note that this method uses a program external to Python, which makes it something of a hack.

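    import subprocess
    
    def pretty_print_xml(xml):
        # xml must be bytes; the '-' argument makes xmllint read from stdin
        result = subprocess.run(
            ['xmllint', '--format', '-'],
            input=xml,
            stdout=subprocess.PIPE,
            check=True,  # raise CalledProcessError if xmllint reports an error
        )
        # assumes UTF-8 output (the default when the input declares no encoding)
        return result.stdout.decode('utf-8')
    
    data = b'<root><child>text</child></root>'
    print(pretty_print_xml(data))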
    
  • 2020-11-22 02:52

    If you don't want to reparse, an alternative is the xmlpp.py library and its get_pprint() function. It worked nicely and smoothly for my use cases, without having to reparse into an lxml ElementTree object.

  • 2020-11-22 02:52

    If for some reason you can't get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:

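    import subprocess
    
    def makePretty(filepath):
        # Pass the arguments as a list (no shell=True) so the file name
        # is never interpreted by a shell.
        prettyXML = subprocess.check_output(["xmllint", "--format", filepath])
        # check_output returns bytes, so write the file back in binary mode.
        with open(filepath, "wb") as outfile:
            outfile.write(prettyXML)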
    

    As far as I know, this solution works on Unix-like systems that have xmllint installed.

  • 2020-11-22 02:52

    I solved this with a few lines of code that open the file, walk through it adding indentation, and then save it again. I was working with small XML files and did not want to add dependencies or extra libraries for the user to install. Here is what I ended up with:

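    import re
    
    def prettify_file(file_name, indent='    '):
        with open(file_name, 'r') as f:
            xml = f.read()
    
        # Remove old indentation: drop whitespace-only gaps between tags
        xml = re.sub(r'>\s+<', '><', xml.strip())
    
        # Split into tags and text: '<a><b>x</b></a>' -> ['<a>', '<b>', 'x', '</b>', '</a>']
        # Naive: assumes no '>' inside attribute values, comments or CDATA.
        tokens = [t for t in re.split(r'(<[^>]*>)', xml) if t]
    
        def is_open(tag):
            # A start tag such as <b>, but not </b>, <c/>, <?xml ...?> or <!-- ... -->
            return (tag.startswith('<') and not tag.startswith(('</', '<?', '<!'))
                    and not tag.endswith('/>'))
    
        lines = []
        deepness = 0
        i = 0
        while i < len(tokens):
            token = tokens[i]
            # Keep simple elements such as <b>x</b> on a single line
            if (is_open(token) and i + 2 < len(tokens)
                    and not tokens[i + 1].startswith('<')
                    and tokens[i + 2].startswith('</')):
                lines.append(indent * deepness + ''.join(tokens[i:i + 3]))
                i += 3
                continue
            if token.startswith('</'):
                deepness -= 1  # closing tag: step back out before writing it
            lines.append(indent * deepness + token)
            if is_open(token):
                deepness += 1  # start tag: its children go one level deeper
            i += 1
    
        with open(file_name, 'w') as f:
            f.write('\n'.join(lines) + '\n')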
    

    It works for me; perhaps someone else will find it useful :)

  • 2020-11-22 02:54
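    import re
    import xml.dom.minidom as mmd
    from lxml import etree
    
    xml_file_path = 'input.xml'  # placeholder: path to your XML file
    xml_root = etree.parse(xml_file_path, etree.XMLParser())
    
    def print_xml(xml_root):
        # encoding='unicode' returns a str and keeps non-ASCII (e.g. Chinese) text as-is
        plain_xml = etree.tostring(xml_root, encoding='unicode')
        # Drop whitespace between tags only; removing all whitespace would also
        # glue attributes together (<a b="1"> becomes <ab="1">) and mangle text.
        compact_xml = re.sub(r'>\s+<', '><', plain_xml)
        good_xml = mmd.parseString(compact_xml)
        print(good_xml.toprettyxml(indent='    '))
    
    print_xml(xml_root)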
    

    It works well for XML that contains Chinese text!

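    If lxml isn't installed, the minidom half of this approach also works on its own with only the standard library. Below is a minimal stdlib-only sketch; the helper name pretty_xml_minidom is hypothetical. It strips whitespace between tags first, because toprettyxml otherwise keeps the old whitespace-only text nodes and the output gains stray blank lines.

```python
import re
import xml.dom.minidom


def pretty_xml_minidom(xml_text: str, indent: str = "    ") -> str:
    """Pretty-print an XML string using only the standard library (hypothetical helper)."""
    # Remove whitespace-only gaps between tags; otherwise toprettyxml keeps
    # the old whitespace text nodes and emits extra blank lines.
    compact = re.sub(r">\s+<", "><", xml_text.strip())
    dom = xml.dom.minidom.parseString(compact)
    # toprettyxml puts an <?xml version="1.0" ?> declaration at the top.
    return dom.toprettyxml(indent=indent)


if __name__ == "__main__":
    sample = """<书架>
        <书 语言="中文"><标题>三体</标题></书>
    </书架>"""
    print(pretty_xml_minidom(sample))
```

    Elements whose only child is text (such as <标题>三体</标题>) stay on one line; everything else gets one nesting level per indent.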
  • 2020-11-22 02:55

    I tried to edit ade's answer above, but Stack Overflow wouldn't let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.

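    def indent(elem, level=0, more_sibs=False):
        """Indent an ElementTree element in place, two spaces per level."""
        i = "\n"
        if level:
            i += (level-1) * '  '
        num_kids = len(elem)
        if num_kids:
            # Whitespace-only text before the first child: indent that child
            if not elem.text or not elem.text.strip():
                elem.text = i + "  "
                if level:
                    elem.text += '  '
            for count, kid in enumerate(elem):
                # Every child except the last one has a following sibling
                indent(kid, level+1, count < num_kids - 1)
            # Tail after the closing tag: indent the next sibling or the parent's close tag
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
                if more_sibs:
                    elem.tail += '  '
        else:
            # Leaf element: only its tail needs indentation
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i
                if more_sibs:
                    elem.tail += '  '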
    
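    On Python 3.9 and later, the standard library ships a comparable helper, xml.etree.ElementTree.indent(), which rewrites the whitespace in each element's text and tail in place. A minimal sketch, assuming Python 3.9+; the wrapper name pretty_xml_etree is hypothetical:

```python
import xml.etree.ElementTree as ET


def pretty_xml_etree(xml_text: str, space: str = "  ") -> str:
    """Pretty-print an XML string with ElementTree (hypothetical wrapper, Python 3.9+)."""
    root = ET.fromstring(xml_text)
    # ET.indent only replaces whitespace-only .text/.tail values, so real
    # text content such as "x" below is left untouched.
    ET.indent(root, space=space)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(pretty_xml_etree("<a><b>x</b><c/></a>"))
    # <a>
    #   <b>x</b>
    #   <c />
    # </a>
```

    Unlike the minidom route, this does not add an XML declaration; pass an ElementTree to ET.indent and call tree.write(...) if you want to write the result to a file instead.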