I would like to add things like bullet points ("•") to HTML using the XML Builder in Nokogiri, but everything is being escaped. How do I prevent it from being escaped?
When you're setting the text of an element, you really are setting text, not HTML source. < and & don't have any special meaning in plain text.
So just type a bullet: '•'. Of course, your source code and your XML file will have to use the same encoding for that to come out right. If your XML file is UTF-8 but your source code isn't, you'd probably have to say "\xe2\x80\xa2" (in double quotes, so that Ruby interprets the escapes), which is the UTF-8 byte sequence for the bullet character as a string literal.
(In general non-ASCII characters in Ruby 1.8 are tricky. The byte-based interfaces don't mesh too well with XML's world of all-text-is-Unicode.)
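To check the byte sequence quoted above, here is a small sketch using only core Ruby. It assumes Ruby 1.9 or later, where strings carry an encoding; these String methods are not available in the Ruby 1.8 setup the answer describes.

```ruby
bullet = "\u2022" # BULLET, U+2022

# Hex dump of the character's UTF-8 bytes
utf8_hex = bullet.encode("UTF-8").bytes.map { |b| format("%02x", b) }.join(" ")
puts utf8_hex # e2 80 a2

# The escaped byte literal yields the same character once tagged as UTF-8
from_bytes = "\xe2\x80\xa2".force_encoding("UTF-8")
puts from_bytes == bullet # true

# Decimal code point, the form used in a numeric character reference (&#...;)
puts bullet.ord # 8226
```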
If you define
    class Nokogiri::XML::Builder
      # Parse a numeric character reference and insert the resulting node
      def entity(code)
        doc = Nokogiri::XML("<?xml version='1.0'?><root>&##{code};</root>")
        insert(doc.root.children.first)
      end
    end
then this
    builder = Nokogiri::XML::Builder.new do |xml|
      xml.span {
        xml.text "I can has "
        xml.entity 8226   # decimal code point of the bullet, U+2022
        xml.text " entity?"
      }
    end
    puts builder.to_xml
yields
    <?xml version="1.0"?>
    <span>I can has • entity?</span>
PS: this is a workaround only. For a clean solution, please refer to the libxml2 documentation (Nokogiri is built on libxml2). However, even these folks admit that handling entities can be quite, err, cumbersome sometimes.