Converting escaped XML entities back into UTF-8

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-21 05:14:18

Question


So I've got this UTF-8 string in an XML file:

Horrible place. ☠☠☠

And when I feed it to an external application, the funny characters come back escaped as XML entities:

Horrible place. &#9760;&#9760;&#9760;

In Ruby, how do I convert that string back to UTF-8? There's probably a really easy solution for this, but I'm unable to find anything in the standard libraries; e.g. CGI.unescapeHTML (which works nicely for things like &gt;) seems to ignore numeric entities completely.

ree-1.8.7-2010.02 > CGI.unescapeHTML('&gt;')
 => ">" 
ree-1.8.7-2010.02 > CGI.unescapeHTML('&#9760;')
 => "&#9760;" 

Answer 1:


Well, since it's XML encoded I'd go for an XML parser:

require 'nokogiri'

frag = 'Horrible place. &#9760;&#9760;&#9760;'
doc = Nokogiri::XML.fragment(frag)
puts doc.text
# >> Horrible place. ☠☠☠
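
If pulling in Nokogiri is not an option, numeric character references can probably also be decoded with the standard library alone. The following is only a minimal sketch, assuming that nothing but decimal (`&#9760;`) and hexadecimal (`&#x2620;`) numeric references needs handling; `decode_numeric_entities` is a hypothetical helper name introduced here, not part of any library, and named entities such as `&gt;` are deliberately left untouched.

```ruby
# Hypothetical helper (not from the original answer): decode numeric
# character references such as &#9760; or &#x2620; into UTF-8 characters.
def decode_numeric_entities(str)
  str.gsub(/&#(?:x([0-9a-fA-F]+)|(\d+));/) do
    # $1 is set for hexadecimal references, $2 for decimal ones.
    codepoint = $1 ? $1.to_i(16) : $2.to_i
    # pack('U') encodes a single code point as a UTF-8 string.
    [codepoint].pack('U')
  end
end

decoded = decode_numeric_entities('Horrible place. &#9760;&#9760;&#9760;')
puts decoded
```

A real XML parser remains the more robust choice, since this sketch does no validation of the code points it is given.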



Answer 2:


CGI.unescapeHTML works just fine; the console you are using is probably unable to display the Unicode character.

Try this and it should work fine:

require 'cgi'
File.open("d:\\11.txt", 'w') {|f| f.write(CGI.unescapeHTML('&#9760;')) } # 11.txt should now contain ☠
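
To tell a display problem apart from an entity that was never decoded, the result can be inspected without printing the character at all. This is a small sketch rather than a definitive test: the outcome may depend on the Ruby version and on the string's encoding (or on `$KCODE` in Ruby 1.8), so it is worth running in the same environment that shows the problem.

```ruby
require 'cgi'

decoded = CGI.unescapeHTML('&#9760;')

# Inspect the result without relying on the console's font or encoding.
puts decoded.encoding     # encoding label attached to the result
p decoded.unpack('U*')    # code points; [9760] means the entity was decoded
p decoded == '&#9760;'    # true would mean the entity was left untouched
```

If the last line prints `true`, the console is not the culprit and one of the parsing approaches from the other answer is likely the better fit.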


Source: https://stackoverflow.com/questions/4559104/converting-escaped-xml-entities-back-into-utf-8
