Question
I'm parsing HTML and trying to get the value of a parent node itself, without the values of its child nodes.
HTML example:
<html>
<body>
<div>
<a href="http://intro.com">extra stuff</a>
Text I would like to get.
<a href="http://example.com">link to example</a>
</div>
</body>
</html>
Code:
def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
def slurper = new XmlSlurper(tagsoupParser)
def htmlParsed = slurper.parseText(stringToParse)
println htmlParsed.body.div[0]
However, the above code returns:
extra stuff Text I would like to get. link to example
How can I get only the parent node's own value, without its children? Expected output:
Text I would like to get.
P.S. I tried stripping the extra elements with substring operations, but that proved unreliable.
Answer 1:
If you switch to using XmlParser instead of XmlSlurper, you can do:
println htmlParsed.body.div[0].localText()[0]
This assumes you are on Groovy 2.3+. localText() returns a list of the node's direct text children, so [0] picks the first one.
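For reference, here is a minimal self-contained sketch of the XmlParser approach. It makes a few assumptions that are not in the original post: the sample markup is well-formed, so the default parser is used in place of TagSoup (for real-world HTML you would pass the TagSoup parser to the XmlParser constructor, as the question does with XmlSlurper); the import targets Groovy 3+; and the trimming and joining steps are added only to tidy surrounding whitespace.

```groovy
// Groovy 3+: XmlParser lives in groovy.xml.
// On Groovy 2.3-2.5, drop this import (groovy.util.XmlParser is auto-imported).
import groovy.xml.XmlParser

// Same markup as in the question; well-formed, so TagSoup is not needed here.
def stringToParse = '''<html>
<body>
<div>
<a href="http://intro.com">extra stuff</a>
Text I would like to get.
<a href="http://example.com">link to example</a>
</div>
</body>
</html>'''

def htmlParsed = new XmlParser().parseText(stringToParse)
def div = htmlParsed.body.div[0]

// localText() returns only the String children that sit directly under this node,
// skipping the text inside the nested <a> elements.
// Trimming and dropping empty entries is an extra cleanup step, not part of the answer.
def ownText = div.localText()*.trim().findAll { it }.join(' ')

println ownText
```

By contrast, calling text() on the same node also includes the text of the nested a elements, which is the behaviour the question is trying to avoid.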
Source: https://stackoverflow.com/questions/29629006/groovy-xmlslurper-get-value-of-the-node-without-children