Groovy XmlSlurper parse mixed text and nodes

橙三吉。 Submitted on 2019-12-11 00:42:14

Question


I'm currently trying to parse a node in Groovy that contains mixed content: text interleaved with child nodes that have their own text. I need to get the text in the right order. For example:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <p>
      The text has
      <strong>nodes</strong>
      which need to get parsed
   </p>
</root>

Now I want to parse it so that I get the whole text but can still edit the nested node. In this example, I want the result:

The text has <b>nodes</b> which need to get parsed

If I could just get a list of all children of the p element and test whether each one is a node or text, I would be happy, but I can't find any way to get that.


Answer 1:


OK, I found a solution I can use without any tricky workarounds. The thing is, a NodeChild (which is what the slurper gives you) doesn't have a method that returns both the child text and the child nodes, but a Node does. To get the underlying Node objects, call childNodes() on the NodeChild; children() on each Node then returns its text and element children in document order.

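// Groovy 2.x: XmlSlurper is groovy.util.XmlSlurper (imported by default).
// On Groovy 3+, the preferred classes are groovy.xml.XmlSlurper and groovy.xml.slurpersupport.Node.
def xml = '''<root>
   <p>
      The text has
      <strong>nodes</strong>
      which need to get parsed
   </p>
</root>'''

def root = new XmlSlurper().parseText(xml)

// childNodes() returns an iterator over the underlying Node objects (here: the <p> element)
root.childNodes().each { target ->
    // children() on a Node mixes text (String) and element (Node) entries in document order
    for (s in target.children()) {
        if (s instanceof groovy.util.slurpersupport.Node) {
            println "Node: " + s.text()
        } else {
            // text children keep the source indentation, so trim it for display
            println "Text: " + s.trim()
        }
    }
}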

This gives me the result:

Text: The text has
Node: nodes
Text: which need to get parsed

This means I can easily do whatever I want with my nodes while they stay in the right order relative to the text.
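To get from that loop to the result the question asks for, the same children() walk can rebuild the string and swap <strong> for <b>. This is a minimal sketch, assuming Groovy 3 or later (on Groovy 2.x, drop the groovy.xml.XmlSlurper import and the default import is used). flattenWithBold is a hypothetical helper name, and joining the trimmed pieces with single spaces is an assumption about the desired whitespace.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical helper: rebuilds an element's mixed content as a string,
// keeping text and child elements in document order and renaming <strong> to <b>.
String flattenWithBold(node) {
    node.children().collect { child ->
        if (child instanceof String) {
            // Text children keep the newlines/indentation from the source, so trim them
            child.trim()
        } else if (child.name() == 'strong') {
            "<b>${child.text()}</b>"
        } else {
            child.text()
        }
    }.findAll { it }.join(' ')   // drop empty pieces, then rejoin with single spaces
}

def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
   <p>
      The text has
      <strong>nodes</strong>
      which need to get parsed
   </p>
</root>'''

def root = new XmlSlurper().parseText(xml)
def p = root.childNodes().next()   // the <p> element as a slurpersupport Node
def result = flattenWithBold(p)
println result
```

Trimming and re-joining normalizes the original indentation; if the exact whitespace matters, keep the untrimmed strings instead.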




Answer 2:


Here is a working example:

def txt = '''
<root>
   <p>
      <![CDATA[The text has <strong>nodes</strong> which need to get parsed]]>
   </p>
</root>
'''
def parsed = new XmlSlurper(false,false).parseText(txt)
assert parsed.p[0].text().trim() == 'The text has <strong>nodes</strong> which need to get parsed'

I guess it's impossible to do this without a CDATA section.
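The CDATA approach has a trade-off worth checking against the question's requirement to still edit the node: the parser hands the section back as plain character data, so the markup survives as literal text, but <strong> is no longer an element and GPath cannot reach it. A minimal check, assuming Groovy 3 or later for the groovy.xml.XmlSlurper import:

```groovy
import groovy.xml.XmlSlurper

def txt = '''<root>
   <p><![CDATA[The text has <strong>nodes</strong> which need to get parsed]]></p>
</root>'''

// Same constructor as above: validating = false, namespaceAware = false
def parsed = new XmlSlurper(false, false).parseText(txt)

// The CDATA content comes back as ordinary text, markup included...
def flat = parsed.p[0].text().trim()
println flat

// ...but there is no <strong> element in the tree, so there is no node to edit
println parsed.p.strong.size()
```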




Answer 3:


You can use XmlUtil and XmlParser like so:

import groovy.xml.*

def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
   <p>
      The text has
      <strong>nodes</strong>
      which need to get parsed
   </p>
</root>'''

println XmlUtil.serialize(new XmlParser().parseText(xml).p[0])
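XmlParser returns groovy.util.Node objects directly, so children() is available on the parsed <p> without the childNodes() step from the first answer. The sketch below, assuming Groovy 3 or later for the groovy.xml.XmlParser import, builds the kind of list the question asked for: one entry per child, tagged as text or node, in document order. The map keys kind, name and text are illustrative choices, not part of any API.

```groovy
import groovy.xml.XmlParser

def xml = '''<root>
   <p>
      The text has
      <strong>nodes</strong>
      which need to get parsed
   </p>
</root>'''

def p = new XmlParser().parseText(xml).p[0]

// children() on an XmlParser Node is a plain List mixing String (text) and Node (element) entries
def parts = p.children().collect { child ->
    if (child instanceof groovy.util.Node) {
        [kind: 'node', name: child.name(), text: child.text().trim()]
    } else {
        // trim in case the parser kept the surrounding indentation
        [kind: 'text', text: child.toString().trim()]
    }
}

parts.each { println it }
```

The Node entries are the live nodes of the parsed tree, so they can be inspected or changed before the element is serialized.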


Source: https://stackoverflow.com/questions/25135812/groovy-xmlslurper-parse-mixed-text-and-nodes
