How to keep Groovy/XmlSlurper from stripping HTML tags from a node?

Question

I'm reading an HTML file from a POST response and parsing it with XmlSlurper. The textarea node on the page contains some HTML code (not URL-encoded, which was not my choice), and when I read that value, Groovy strips all the tags.

Example:

<html>
    <body>
        <textarea><html><body>This has html code for some reason</body></html></textarea>
    </body>
</html>

When I parse the above and use find(...) to get the "textarea" node, it returns:

This has html code for some reason

and none of the tags. How do I keep the tags?
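A minimal reproduction of the behaviour, assuming Groovy 3 or later (where XmlSlurper lives in the groovy.xml package) and with the sample page inlined as a single string. It shows that text(), and toString(), which delegates to it, flattens the node, while the nested tags are still present in the parsed tree as ordinary child elements:

```groovy
import groovy.xml.XmlSlurper

def page = '<html><body><textarea><html><body>This has html code for some reason</body></html></textarea></body></html>'

// .body and .textarea select direct children, so this is the outer textarea
def textarea = new XmlSlurper().parseText(page).body.textarea

// text() concatenates the character data of every descendant,
// so the nested tags never appear in the result
assert textarea.text() == 'This has html code for some reason'
// GPathResult.toString() returns the same flattened text
assert textarea.toString() == 'This has html code for some reason'

// The nested markup is still in the tree as child elements
assert textarea.children().size() == 1
assert textarea.children()[0].name() == 'html'
assert textarea.html.body.text() == 'This has html code for some reason'
```

So the tags are not lost during parsing; they only disappear when the node is turned into plain text.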

Answer 1:

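I think you're getting the right data but printing it out the wrong way: text() (and toString(), which calls it) returns only the concatenated text of the node and its descendants. Can you try using StreamingMarkupBuilder to convert the node back into a piece of XML?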

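import groovy.xml.StreamingMarkupBuilder
// Groovy 3+; on Groovy 2.x XmlSlurper is groovy.util.XmlSlurper and is auto-imported
import groovy.xml.XmlSlurper

// stripMargin() removes the leading whitespace up to and including each '|'
def xml = '''<html>
            |  <body>
            |    <textarea><html><body>This has html code for some reason</body></html></textarea>
            |  </body>
            |</html>'''.stripMargin()

// The textarea's nested tags are still child elements of this node
def ta = new XmlSlurper().parseText( xml ).body.textarea

// Serialize the textarea's child elements back into a markup string
String content = new StreamingMarkupBuilder().bind {
  mkp.yield ta.children()
}

assert content == '<html><body>This has html code for some reason</body></html>'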
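If an exact string is not required (for example, when the value is only logged or inspected), groovy.xml.XmlUtil.serialize is another way to turn a slurped node back into markup. This is a sketch assuming Groovy 3 or later and a textarea with a single child element, as in the question. Unlike the StreamingMarkupBuilder result, this output includes an XML declaration and indentation, so it would not pass the exact equality assert above:

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

def page = '<html><body><textarea><html><body>This has html code for some reason</body></html></textarea></body></html>'
def textarea = new XmlSlurper().parseText(page).body.textarea

// Take the first (here, the only) child element of the textarea
def innerRoot = textarea.children()[0]

// XmlUtil.serialize(GPathResult) writes the node back out as markup,
// prefixed with an XML declaration and pretty-printed
String markup = XmlUtil.serialize(innerRoot)

assert markup.contains('<?xml')
assert markup.contains('<body>This has html code for some reason</body>')
```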

Source: https://stackoverflow.com/questions/9710164/how-keep-groovy-xmlslurper-from-stripping-html-tags-from-a-node

Tags

grails

groovy

html-parsing

xmlslurper
