Get text from mixed element xml tags with ElementTree


I'm using ElementTree to parse an XML document that I have. I'm extracting the text from the <u> tags. Some of them have mixed content that I need to filter out or kee…

1 answer
  • 2021-01-29 09:07

    The missing text fragments, "¿Sí?" and "A mí no me suena.", are available as the tail property of each <vocal> element (the text that follows the element's end tag, up to the next tag).
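
    To see where each piece of mixed content ends up, here is a minimal sketch in Python 3 (not the 2.7 used below) on a one-line copy of the second <u> from the sample. It parses the markup from a string with ET.fromstring instead of reading vocal.xml, purely so it runs on its own:

```python
import xml.etree.ElementTree as ET

# One <u> with mixed content: text, a <vocal> child, then more text.
u = ET.fromstring('<u>Pues... <vocal type="non-ling"><desc>laugh</desc></vocal>A mí no me suena.</u>')
v = u.find("vocal")

print(repr(u.text))              # text before the first child -> 'Pues... '
print(repr(v.findtext("desc")))  # text inside <desc>          -> 'laugh'
print(repr(v.tail))              # text after </vocal>         -> 'A mí no me suena.'
```

    The tail belongs to <vocal> even though it sits inside <u>, which is why reading only u.text and the children's own text misses it.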

    Here is one way to get the desired output (tested with Python 2.7).

    Assume that vocal.xml looks like this:

    <root>
      <u>
        <vocal type="filler">
          <desc>eh</desc>
        </vocal>¿Sí? 
      </u>
    
      <u>Pues... 
         <vocal type="non-ling">
           <desc>laugh</desc>
         </vocal>A mí no me suena. 
      </u>
    </root>
    

    Code:

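    from xml.etree import ElementTree as ET
    
    tree = ET.parse("vocal.xml")  # ElementTree for the whole document
    
    for u in tree.findall(".//u"):
        v = u.find("vocal")  # first <vocal> child of this <u>, or None
    
        if v is None:
            frags = [u.text]
        elif v.get("type") == "filler":
            # keep the filler word from <desc>, plus the text after </vocal>
            frags = [u.text, v.findtext("desc"), v.tail]
        else:
            # drop other vocal types; keep only the text around the element
            frags = [u.text, v.tail]
    
        # u.text / v.tail are None when there is no text; treat those as ""
        print " ".join((t or "").encode("utf-8").strip() for t in frags).strip()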
    

    Output:

    eh ¿Sí?
    Pues... A mí no me suena.
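
    A Python 3 sketch of the same approach, generalized (as an assumption about the real data) to any number of <vocal> elements per <u>, and to <u> elements with none. It walks the direct children in order, keeps the <desc> text only for vocal types listed in a hypothetical KEEP_TYPES set, keeps the text of any other child markup, and always keeps each child's tail. The helper name utterance_text is also hypothetical, and the sample is embedded as a string so the snippet is self-contained; with the file, use ET.parse("vocal.xml").getroot() instead.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <u>
    <vocal type="filler">
      <desc>eh</desc>
    </vocal>¿Sí?
  </u>
  <u>Pues...
     <vocal type="non-ling">
       <desc>laugh</desc>
     </vocal>A mí no me suena.
  </u>
</root>"""

# Hypothetical policy: vocal types whose <desc> text should be kept.
KEEP_TYPES = {"filler"}


def utterance_text(u):
    """Join the text of one <u>, keeping or dropping each <vocal> by type."""
    parts = [u.text or ""]  # text before the first child
    for child in u:         # direct children, in document order
        if child.tag == "vocal":
            if child.get("type") in KEEP_TYPES:
                parts.append(child.findtext("desc", default=""))
        else:
            # any other markup: keep all of its inner text
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text after the child's end tag
    # Collapse runs of whitespace (newlines, indentation) into single spaces.
    return " ".join(" ".join(parts).split())


root = ET.fromstring(XML)  # with a file: ET.parse("vocal.xml").getroot()
for u in root.iter("u"):
    print(utterance_text(u))
```

    With the sample data this should print the same two lines as the Output above. Because the last step collapses whitespace with split(), it also normalizes spacing inside a fragment, unlike the strip()-only version; adding "non-ling" to KEEP_TYPES would keep "laugh" as well.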
    