Extracting text from XML node with minidom

误落风尘 2021-01-19 05:58

I've looked through several posts, but I haven't found an answer that solves my problem.

Sample XML =




        
3 Answers
  •  小蘑菇 (original poster)
    2021-01-19 06:55

    You should use the ElementTree API instead of minidom for this task (as explained in the other answers here), but if you need to use minidom, here is a solution.
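    For comparison, here is a minimal sketch of the ElementTree route, using the standard-library `itertext()` method. The sample markup and the helper name `element_text` are assumptions made for illustration (the question's XML is not shown above), so treat this as one possible approach rather than what the other answers actually posted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: empty <Node/> markers interleaved with text.
# The question's real XML is not available, so this shape is assumed.
SAMPLE = """<TextWithNodes>
<Node/>TEXT1<Node/>TEXT2 <Node/>TEXT3<Node/></TextWithNodes>"""


def element_text(xml_string, tag):
    """Return all text inside the first <tag> element, with tags dropped."""
    root = ET.fromstring(xml_string)
    # Element.iter() also yields the root itself when its tag matches.
    elem = next(root.iter(tag))
    # itertext() walks the element's text and every descendant's text/tail.
    return "".join(elem.itertext())


text = element_text(SAMPLE, "TextWithNodes")
```

    ElementTree stores the text that follows an empty element in that element's `tail`, which is why `itertext()` is more convenient here than reading `.text` alone.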

    What you are looking for was added to DOM level 3 as the textContent attribute. Minidom only supports level 1.

    However, you can emulate textContent pretty closely with this function:

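    def textContent(node):
        """Recursively concatenate the text of node and all its descendants."""
        # Text and CDATA nodes carry their content directly in nodeValue.
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            return node.nodeValue
        else:
            # Any other node contributes only the text of its children.
            return ''.join(textContent(n) for n in node.childNodes)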
    

    Which you can then use like so:

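    from xml.dom import minidom

    # Assumed markup: empty Node elements interleaved with the text
    # (their attributes are not shown here and are omitted).
    x = minidom.parseString("""<TextWithNodes>
    <Node/>TEXT1<Node/>TEXT2 <Node/>TEXT3<Node/></TextWithNodes>""")

    twn = x.getElementsByTagName('TextWithNodes')[0]

    assert textContent(twn) == u'\nTEXT1TEXT2 TEXT3'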
    

    Notice how I got the text content of the parent node TextWithNodes. This is because your Node elements are siblings of those text nodes, not parents of them.
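    To make the sibling relationship concrete, here is a small sketch that collects the text following each Node element by walking `nextSibling`. The `id` values and the helper name `text_after` are hypothetical, chosen only so the results can be told apart; the real attributes from the question are not known.

```python
from xml.dom import minidom

# Hypothetical sample; the id values are invented for illustration.
SAMPLE = """<TextWithNodes>
<Node id="a"/>TEXT1<Node id="b"/>TEXT2 <Node id="c"/>TEXT3<Node id="d"/></TextWithNodes>"""


def text_after(node):
    """Join the text siblings that follow node, up to the next element."""
    parts = []
    sib = node.nextSibling
    while sib is not None and sib.nodeType in (sib.TEXT_NODE,
                                               sib.CDATA_SECTION_NODE):
        parts.append(sib.nodeValue)
        sib = sib.nextSibling
    return "".join(parts)


doc = minidom.parseString(SAMPLE)
nodes = doc.getElementsByTagName("Node")

# Each Node element is empty: the text lives next to it, not inside it.
has_children = [n.hasChildNodes() for n in nodes]
by_id = {n.getAttribute("id"): text_after(n) for n in nodes}
```

    Here `by_id` maps each Node's id to the text that comes after it, while `has_children` is all `False`, which is why asking a Node element for its own text yields nothing.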
