How to keep Groovy's XmlSlurper from stripping HTML tags from a node?

Submitted by 北城余情 on 2019-12-08 06:42:52

Question


I'm reading an HTML file from a POST response and parsing it with XmlSlurper. The textarea node on the page contains raw HTML (not URL-encoded, and not my choice), and when I read that value, Groovy strips all the tags.

Example:

<html>
    <body>
        <textarea><html><body>This has html code for some reason</body></html></textarea>
    </body>
</html>

When I parse the above and then find(...) the "textarea" node, it returns:

This has html code for some reason

and none of the tags. How do I keep the tags?


Answer 1:


I think you're getting the right data but printing it the wrong way: converting the node to a string returns only its text content. Try using StreamingMarkupBuilder to convert the node back into XML:

def xml = '''<html>
            |  <body>
            |    <textarea><html><body>This has html code for some reason</body></html></textarea>
            |  </body>
            |</html>'''.stripMargin()

def ta = new XmlSlurper().parseText( xml ).body.textarea

String content = new groovy.xml.StreamingMarkupBuilder().bind {
  mkp.yield ta.children()
}

assert content == '<html><body>This has html code for some reason</body></html>'
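
To illustrate why the tags only seem to disappear, here is a minimal sketch comparing the two ways of reading the same node. It assumes Groovy 3 or later, where XmlSlurper lives in the groovy.xml package (on Groovy 2.x, drop that import); the variable names plain, childNames and markup are illustrative and not part of the original answer.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper   // assumption: Groovy 3+; on 2.x remove this line

// Same sample document as in the question, on one line.
def xml = '<html><body><textarea><html><body>This has html code for some reason</body></html></textarea></body></html>'

def ta = new XmlSlurper().parseText(xml).body.textarea

// text() concatenates the text of all descendant nodes, so no tags appear.
def plain = ta.text()
println plain       // This has html code for some reason

// The nested elements were parsed as child nodes and are still there.
def childNames = ta.children().collect { it.name() }
println childNames  // [html]

// Serializing the children back to markup recovers the tags.
def markup = new StreamingMarkupBuilder().bind { mkp.yield ta.children() }.toString()
println markup      // <html><body>This has html code for some reason</body></html>
```

In other words, the parser keeps the nested markup as elements; whether the tags show up depends on whether the node is read as text or serialized again.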


Source: https://stackoverflow.com/questions/9710164/how-keep-groovy-xmlslurper-from-stripping-html-tags-from-a-node
