CGI.escapeHTML is pretty bad, but CGI.unescapeHTML is completely borked. For example:
require 'cgi'
CGI.unescapeHTML('&hellip;') # => "&hellip;"
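To make the limitation concrete, here is a small sketch using only the standard library. It assumes a reasonably recent Ruby; the variable names are just for illustration. CGI decodes the handful of markup-related named entities, and leaves every other named entity as-is:

```ruby
require 'cgi'

# The few named entities CGI knows about are decoded...
basic = CGI.unescapeHTML('&lt;b&gt; &amp; &quot;hi&quot;')  # => '<b> & "hi"'

# ...but any other named entity comes back untouched.
other = CGI.unescapeHTML('&hellip; &cent;')                 # => '&hellip; &cent;'

puts basic
puts other
```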
The htmlentities gem should do the trick:
require 'rubygems'
require 'htmlentities'
coder = HTMLEntities.new
coder.decode('&hellip;') # => "…"
coder.decode('&#8230;') # => "…"
coder.decode('&cent;') # => "¢"
coder.decode('&#162;') # => "¢"
coder.encode("…", :named) # => "&hellip;"
coder.encode("…", :decimal) # => "&#8230;"
Hpricot can also decode entities:
require 'rubygems'
require 'hpricot'
Hpricot('&hellip;', :xhtml_strict => true).to_plain_text
Though you might have to fiddle around with the character encoding.
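What that fiddling looks like depends on the Ruby version and on what the library hands back. As a rough sketch for Ruby 2.0 or later, assuming the decoded text arrives as UTF-8 bytes carrying the wrong encoding label (the `raw` value below is made up for illustration, not actual Hpricot output):

```ruby
# Hypothetical input: the UTF-8 bytes for "…" labeled as binary.
raw = "\xE2\x80\xA6".b

# Relabel the bytes as UTF-8 without changing them.
text = raw.dup.force_encoding(Encoding::UTF_8)
puts text.valid_encoding?  # true when the bytes really are UTF-8

# Transcode to another encoding, replacing anything it cannot represent.
latin1 = text.encode(Encoding::ISO_8859_1, undef: :replace, replace: "...")
puts latin1
```

`force_encoding` only changes the label, while `encode` actually converts the bytes, so which one applies depends on whether the bytes or the label are wrong.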