Nokogiri text node contents

感动是毒 2021-01-05 00:45

Is there any clean way to get the contents of text nodes with Nokogiri? Right now I'm using

some_node.at_xpath( "//whatever" ).first.content
2 Answers
  • 2021-01-05 00:53

    You want only the text?

    doc.search('//text()').map(&:text)
    

    Maybe you don't want all the whitespace and noise. If you want only the text nodes containing a word character,

    doc.search('//text()').map(&:text).delete_if{|x| x !~ /\w/}
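
    As a rough sketch of what that filter does, here is a small XML document made up for illustration (its element names are not from the question). The same idea can also be pushed into the XPath with normalize-space(), though the two are not identical: /\w/ demands a word character, while normalize-space() keeps any node that has non-whitespace content.

    ```ruby
    require 'nokogiri'

    # Hypothetical sample; the newlines and indentation become whitespace-only text nodes.
    doc = Nokogiri::XML("<root>\n  <a>one</a>\n  <b> two </b>\n</root>")

    all_text = doc.search('//text()').map(&:text)

    # Ruby-side filter: keep strings that contain at least one word character.
    wordy = all_text.grep(/\w/)

    # XPath-side filter: keep text nodes that are non-empty after whitespace
    # normalization, then strip each one.
    non_blank = doc.search('//text()[normalize-space()]').map { |t| t.text.strip }

    p wordy      # ["one", " two "]
    p non_blank  # ["one", "two"]
    ```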
    

    Edit: It appears you only wanted the text content of a single node:

    some_node.at_xpath( "//whatever" ).text
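
    Two details may matter when calling this on a node rather than on the document. The sketch below uses made-up markup (the element names are assumptions, not from the question) to show that a path starting with // searches from the document root even when called on a nested node, and that text on an element concatenates the text of all its descendants.

    ```ruby
    require 'nokogiri'

    # Hypothetical document with the same element name in two branches.
    doc = Nokogiri::XML('<root><a><x>in a</x></a><b><x>in b</x></b></root>')
    b = doc.at_xpath('//b')

    from_root = b.at_xpath('//x').text    # absolute path: searches the whole document
    relative  = b.at_xpath('.//x').text   # leading dot: searches below b only

    p from_root  # "in a"
    p relative   # "in b"

    # text returns the concatenated text of all descendants;
    # the XPath text() step selects only the element's own text nodes.
    para = Nokogiri::XML('<p>Hello <em>big</em> world</p>').at_xpath('/p')
    whole = para.text
    own   = para.xpath('text()').map(&:text)

    p whole  # "Hello big world"
    p own    # ["Hello ", " world"]
    ```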
    
  • 2021-01-05 00:58

    Just look for text nodes:

    require 'nokogiri'
    
    doc = Nokogiri::HTML(<<EOT)
    <html>
    <body>
    <p>This is a text node </p>
    <p> This is another text node</p>
    </body>
    </html>
    EOT
    
    doc.search('//text()').each do |t|
      t.replace(t.content.strip)
    end
    
    puts doc.to_html
    

    Which outputs:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>
    <p>This is a text node</p>
    <p>This is another text node</p>
    </body></html>
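
    If changing the document is not wanted, a possible alternative is to strip the strings on the way out instead of replacing the nodes. This is only a sketch that reuses the markup from above on a single line:

    ```ruby
    require 'nokogiri'

    html = '<html><body><p>This is a text node </p><p> This is another text node</p></body></html>'
    doc = Nokogiri::HTML(html)

    # Collect the stripped text of each paragraph without touching the DOM.
    texts = doc.search('//p').map { |para| para.text.strip }

    p texts  # ["This is a text node", "This is another text node"]

    # The document is unchanged: the first paragraph still has its trailing space.
    p doc.at('p').text  # "This is a text node "
    ```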
    

    BTW, your code example doesn't work. The first in at_xpath( "//whatever" ).first is redundant and will fail: at_xpath finds only the first occurrence and returns it as a single Node, not a NodeSet, so there is no list of matches left for first to pick from.
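
    A minimal sketch of the difference between the two return types, using a one-element XML document chosen for illustration. It also shows that at_xpath returns nil when nothing matches, which is worth guarding against before calling text:

    ```ruby
    require 'nokogiri'

    doc = Nokogiri::XML('<data><foo>bar</foo></data>')

    single = doc.at_xpath('//foo')  # one node, or nil when nothing matches
    set    = doc.xpath('//foo')     # a NodeSet, possibly empty

    puts single.class    # Nokogiri::XML::Element
    puts set.class       # Nokogiri::XML::NodeSet
    puts single.text     # bar
    puts set.first.text  # bar

    # "nope" is a hypothetical element name that does not exist in the document.
    missing = doc.at_xpath('//nope')
    p missing        # nil
    p missing&.text  # nil; safe navigation avoids a NoMethodError
    ```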


    I have <data><foo>bar</foo></data>, how do I get at the "bar" text without doing doc.at_xpath( "//data/foo" ).children.first.content?

    Assuming doc contains the parsed DOM:

    doc.to_xml # => "<?xml version=\"1.0\"?>\n<data>\n  <foo>bar</foo>\n</data>\n"
    

    Get the first occurrence:

    doc.at('foo').text       # => "bar"
    doc.at('//foo').text     # => "bar"
    doc.at('/data/foo').text # => "bar"
    

    Get all occurrences and take the first one:

    doc.search('foo').first.text      # => "bar"
    doc.search('//foo').first.text    # => "bar"
    doc.search('data foo').first.text # => "bar"
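
    at and search accept either CSS or XPath selectors, which is why both styles appear above. When the selector type should be explicit, the dedicated methods can be used instead. The sketch below assumes a document with a second foo element (added here for illustration) so that "first" and "all" give visibly different results:

    ```ruby
    require 'nokogiri'

    # Hypothetical variant of the document with two <foo> elements.
    doc = Nokogiri::XML('<data><foo>bar</foo><foo>baz</foo></data>')

    # First match only, with the selector language spelled out.
    first_css   = doc.at_css('data > foo').text
    first_xpath = doc.at_xpath('/data/foo').text

    # All matches, as arrays of strings.
    all_css   = doc.css('foo').map(&:text)
    all_xpath = doc.xpath('//foo/text()').map(&:text)

    p first_css    # "bar"
    p first_xpath  # "bar"
    p all_css      # ["bar", "baz"]
    p all_xpath    # ["bar", "baz"]
    ```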
    