Question
The following code (generously provided by Uri Agassi) successfully extracts the tid and term data:
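require 'nokogiri'
require 'open-uri'

(1..10).each do |i|
  doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
  # One data-thing-id per div whose class list contains "thing"
  tids = doc.xpath("//div[contains(concat(' ', @class, ' '),' thing ')]").collect { |node| node['data-thing-id'] }
  # One stripped text value per div whose class list contains "col_a"
  terms = doc.xpath("//div[contains(concat(' ', @class, ' '),' col_a ')]").collect { |node| node.text.strip }
  tids.zip(terms).each do |tid, term|
    puts "#{tid} #{term}"
  end
end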
from the following sample HTML:
<div class="thing text-text" data-thing-id="29966403">
  <div class="thinguser"><i class="ico ico-water ico-blue"></i>
    <div class="status">in 7 days
    </div>
  </div>
  <div class="ignore-ui pull-right"><input type="checkbox">
  </div>
  <div class="col_a col text">
    <div class="text">foobar
    </div>
  </div>
  <div class="col_b col text">
    <div class="text">foobar desc
    </div>
  </div>
</div>
If I wanted to pull the status (the "in 7 days" string) in the same fashion, what's the best way to do that? I can't seem to figure it out.
Would someone be kind enough to explain in detail what the tids and terms assignment lines are actually doing? I don't get it and the Nokogiri documentation doesn't seem to cover this.
Big thanks in advance.
~Chris
Answer 1:
I'm all about using CSS selectors in Nokogiri. Something like this should work:
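doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
# '.status' matches by class; a bare 'status' would look for <status> elements
statuses = doc.css('.status').map { |node| node.text.strip }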
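As for what the tids and terms lines in the question are doing: `//div` selects every div in the document, and the predicate `contains(concat(' ', @class, ' '), ' thing ')` pads the class attribute with spaces so that only a whole class name matches (otherwise "thinguser" would also match "thing"). `collect` then turns each matched node into a single value, and `zip` pairs the resulting arrays by position. Here is a rough plain-Ruby sketch of that logic with no Nokogiri involved; `class_token?` is a hypothetical helper made up for illustration, and the arrays are hard-coded from the sample HTML.

```ruby
# Plain-Ruby sketch of the XPath class test; class_token? is a hypothetical helper.
def class_token?(class_attr, name)
  # Same idea as contains(concat(' ', @class, ' '), ' name '):
  # pad both sides with spaces so only whole class names match.
  " #{class_attr} ".include?(" #{name} ")
end

class_token?("thing text-text", "thing")  # => true
class_token?("thinguser", "thing")        # => false, although "thinguser" contains "thing"
class_token?("col_a col text", "col_a")   # => true

# collect (an alias of map) yields one value per matched node, so tids, terms
# and statuses are parallel arrays; zip pairs them up by position.
tids     = ["29966403"]
terms    = ["foobar"]
statuses = ["in 7 days"]

rows = tids.zip(terms, statuses)
rows.each { |tid, term, status| puts "#{tid} #{term} #{status}" }
```

Keep in mind that pairing by position only lines up if every "thing" div has exactly one status and one col_a; if some are missing, the arrays will drift out of step.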
Source: https://stackoverflow.com/questions/23356121/parsing-div-elements-with-nokogiri