Question
I'm currently trying to parse a node in Groovy that contains mixed text and child nodes with text, and I need to get the text in the right order. For example:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>
The text has
<strong>nodes</strong>
which need to get parsed
</p>
</root>
Now I want to parse it so that I get the whole text but can still edit the node. In this example I want the result:
The text has <b>nodes</b> which need to get parsed
If I could just get a list of all elements under the p where I can test whether each one is a node or text, I would be happy, but I can't find any way to get that.
Answer 1:
OK, I found a solution I can use without any (tricky) workarounds.
The thing is, a NodeChild doesn't have a method that gives you both child text and child nodes, but a Node does. To get one, simply use childNodes() (because the slurper gives you a NodeChild):
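// xml: a File, InputStream or Reader holding the document;
// for a String of XML, use parseText(xml) instead of parse(xml)
def root = new XmlSlurper().parse(xml)
// childNodes() yields the element children of <root> as Node objects
root.childNodes().each { target ->
    // Node.children() holds text (String) and child elements (Node) in document order
    for (s in target.children()) {
        // With groovy.xml.XmlSlurper (Groovy 3+), the class is groovy.xml.slurpersupport.Node
        if (s instanceof groovy.util.slurpersupport.Node) {
            println "Node: " + s.text()
        } else {
            println "Text: " + s
        }
    }
}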
This gives me the result:
Text: The text has
Node: nodes
Text: which need to get parsed
This means I can easily do whatever I want with my nodes while they stay in the right order relative to the text.
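To connect this back to the output requested in the question, here is a minimal sketch that walks the same mixed children and rebuilds the line with <strong> renamed to <b>. The helper name renderMixed is hypothetical, the sketch assumes child elements contain only text (nested markup would be flattened by text()), and it does no XML escaping. It tests for String instead of the Node class so that it does not depend on which slurpersupport package your Groovy version uses.

```groovy
import groovy.xml.*

def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>
The text has
<strong>nodes</strong>
which need to get parsed
</p>
</root>'''

// Hypothetical helper (not part of any library): flattens the mixed content
// of one slurper Node into a single line, renaming <strong> to <b>.
def renderMixed(node) {
    node.children().collect { child ->
        if (child instanceof String) {
            // Text children keep their surrounding line breaks, so trim them
            child.trim()
        } else {
            // Assumption: the element holds plain text only
            def name = child.name() == 'strong' ? 'b' : child.name()
            '<' + name + '>' + child.text().trim() + '</' + name + '>'
        }
    }.join(' ')
}

def root = new XmlSlurper().parseText(xml)
// childNodes() returns an Iterator over element children; take the <p> node
def p = root.childNodes().next()
def result = renderMixed(p)
println result
```

Joining with a single space is a simplification: it matches the example, but it would also insert a space where the source had none between text and an element.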
Answer 2:
Here is a working example:
def txt = '''
<root>
<p>
<![CDATA[The text has <strong>nodes</strong> which need to get parsed]]>
</p>
</root>
'''
def parsed = new XmlSlurper(false,false).parseText(txt)
assert parsed.p[0].text().trim() == 'The text has <strong>nodes</strong> which need to get parsed'
I guess it's impossible to do without a CDATA section.
Answer 3:
You can use XmlUtil and XmlParser like so:
import groovy.xml.*
def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>
The text has
<strong>nodes</strong>
which need to get parsed
</p>
</root>'''
println XmlUtil.serialize(new XmlParser().parseText(xml).p[0])
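If you need the individual pieces and not the serialized markup, the Node returned by XmlParser can be walked in a similar way, since its children() list holds both the text (as String) and the child elements in document order. The following is a minimal sketch under that assumption; the variable names are illustrative, and trim() is applied explicitly because the parser's whitespace handling depends on its settings.

```groovy
import groovy.xml.*

def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>
The text has
<strong>nodes</strong>
which need to get parsed
</p>
</root>'''

def p = new XmlParser().parseText(xml).p[0]

// Classify each child of <p>: String entries are text, everything else is an element
def parts = p.children().collect { child ->
    child instanceof String ?
        'Text: ' + child.trim() :
        'Node: ' + child.text().trim()
}
parts.each { println it }
```

Because these are regular groovy.util.Node objects, an element can be inspected or modified before the parent is passed to XmlUtil.serialize.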
Source: https://stackoverflow.com/questions/25135812/groovy-xmlslurper-parse-mixed-text-and-nodes