Parsing div elements with Nokogiri

Question

The following code (generously provided by Uri Agassi) successfully extracts the tid and term data:

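require 'nokogiri'
require 'open-uri'

(1..10).each do |i|
  doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
  # IDs of every div whose class list contains the token "thing"
  tids = doc.xpath("//div[contains(concat(' ', @class, ' '),' thing ')]").collect { |node| node['data-thing-id'] }
  # Stripped text of every div whose class list contains the token "col_a"
  terms = doc.xpath("//div[contains(concat(' ', @class, ' '),' col_a ')]").collect { |node| node.text.strip }

  # Pair each ID with the term at the same position
  tids.zip(terms).each do |tid, term|
    puts "#{tid} #{term}"
  end
end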

from the following sample HTML:

<div class="thing text-text" data-thing-id="29966403">
  <div class="thinguser"><i class="ico ico-water ico-blue"></i>
    <div class="status">in 7 days
    </div>
  </div>
  <div class="ignore-ui pull-right"><input type="check box" >
  </div>
  <div class="col_a col text">
    <div class="text">foobar
    </div>
  </div>
  <div class="col_b col text">
    <div class="text">foobar desc
    </div>
  </div>
</div>

If I wanted to pull the status info (the "in 7 days" string) in the same fashion, what's the best way to do that? I can't seem to figure it out.

Would someone be kind enough to explain in detail what the tids and terms assignment lines are actually doing? I don't get it and the Nokogiri documentation doesn't seem to cover this.

Big thanks in advance.

~Chris

Answer 1:

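I prefer using CSS selectors in Nokogiri. Something like this should work (inside the loop from the question, where i is defined):

doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
# '.status' selects by class; a bare 'status' would look for a <status> element.
# at_css returns the first match; use doc.css('.status') to get every match.
seven_days = doc.at_css('.status').content.strip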

Source: https://stackoverflow.com/questions/23356121/parsing-div-elements-with-nokogiri

Tags

html

ruby-on-rails

ruby

xpath

nokogiri