Question
I'm trying to build a Sanitize transformer that accepts potentially malformed HTML input containing text that sits outside of any tags at all, such as in this example:
out of a tag<p>in a tag</p>out again!
I want the transformer to wrap any untagged text in <p> tags so that the above transforms into:
<p>out of a tag</p><p>in a tag</p><p>out again!</p>
Unfortunately, I can't figure out how to select the untagged element because it's not a node. I'm sure I'm missing something here. Can someone give me a nudge in the right direction?
Answer 1:
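require 'nokogiri'
html = 'out of a tag<p>in a tag</p>out again!'
# Walk every direct child of <body> (text nodes and elements alike)
# and re-wrap its text in <p>. Note that x.text drops any nested markup.
Nokogiri::HTML(html).at_css('body').children.
map {|x| '<p>' + x.text + '</p>' }.join('')
#=> "<p>out of a tag</p><p>in a tag</p><p>out again!</p>"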
Text is stored in text nodes. Because CSS cannot select text nodes, you will have to use other methods to get them, such as Nokogiri::XML::Node#children.
Source: https://stackoverflow.com/questions/3167809/how-can-i-use-rubys-sanitize-nokogiri-to-access-untagged-elements