Question
So I've got this UTF-8 string in an XML file:
Horrible place. ☠☠☠
And when I feed it to an external application, the funny characters come back escaped as XML entities:
Horrible place. &#9760;&#9760;&#9760;
In Ruby, how do I convert that string back to UTF-8? There's probably a really easy solution for this, but I can't find anything in the standard libraries; e.g. CGI.unescapeHTML (which works nicely for things like &gt;) seems to ignore them completely.
ree-1.8.7-2010.02 > CGI.unescapeHTML('&gt;')
=> ">"
ree-1.8.7-2010.02 > CGI.unescapeHTML('&#9760;')
=> "&#9760;"
Answer 1:
Well, since it's XML encoded I'd go for an XML parser:
require 'nokogiri'
frag = 'Horrible place. &#9760;&#9760;&#9760;'
doc = Nokogiri::XML.fragment(frag)
puts doc.text
# >> Horrible place. ☠☠☠
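If pulling in Nokogiri is not an option, the numeric references can also be decoded with nothing but core Ruby. The following is only a minimal sketch: `decode_numeric_entities` is a hypothetical helper name, not something from the standard library or from the answer above, and it deliberately leaves named entities such as &gt; untouched.

```ruby
# Hypothetical helper (not part of any library): decodes decimal (&#9760;)
# and hexadecimal (&#x2620;) numeric character references into UTF-8.
# Named entities such as &gt; are left as they are.
def decode_numeric_entities(str)
  str.gsub(/&#(?:x([0-9a-fA-F]+)|(\d+));/) do
    # $1 is set for the hex form, $2 for the decimal form.
    codepoint = $1 ? $1.hex : $2.to_i
    # pack('U') turns a Unicode code point into its UTF-8 encoding.
    [codepoint].pack('U')
  end
end

puts decode_numeric_entities('Horrible place. &#9760;&#9760;&#9760;')
# >> Horrible place. ☠☠☠
```

An XML parser is still the more robust choice when the input may contain named entities or markup; this sketch only covers the numeric case from the question.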
Answer 2:
CGI.unescapeHTML works just fine; the console you are using is probably unable to display the Unicode character.
Try this and it should work fine:
require 'cgi'
File.open("d:\\11.txt", 'w') { |f| f.write(CGI.unescapeHTML('&#9760;')) } # the file should now contain ☠
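To tell a display problem apart from a decoding problem without writing a file, one option is to inspect the code points of the result instead of printing it. This is a sketch that assumes Ruby 1.9 or later and a UTF-8 source string. As far as I know, Ruby 1.8's CGI.unescapeHTML only decodes references above 255 when `$KCODE` is set to `'u'`, which may explain the difference between the question's output and this answer.

```ruby
require 'cgi'

decoded = CGI.unescapeHTML('&#9760;')

# unpack('U*') lists the Unicode code points in the string, so the check
# does not depend on what the console is able to render.
codepoints = decoded.unpack('U*')
p codepoints

if codepoints == [0x2620]
  puts 'decoded to U+2620 (SKULL AND CROSSBONES)'
else
  puts 'reference was left escaped'
end
```

A single code point 9760 (0x2620) means the reference was decoded and only the console output is at fault; a longer list starting with 38 (the `&` character) means the string came back still escaped.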
Source: https://stackoverflow.com/questions/4559104/converting-escaped-xml-entities-back-into-utf-8