Parsing div elements with Nokogiri

Submitted by 本小妞迷上赌 on 2020-01-06 11:48:10

Question


The following code successfully extracts tid and term data:

(answered generously by Uri Agassi)

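require 'nokogiri'
require 'open-uri'

(1..10).each do |i|
  doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
  # data-thing-id of every div whose class list contains the token "thing"
  tids = doc.xpath("//div[contains(concat(' ', @class, ' '),' thing ')]").collect { |node| node['data-thing-id'] }
  # stripped text of every div whose class list contains the token "col_a"
  terms = doc.xpath("//div[contains(concat(' ', @class, ' '),' col_a ')]").collect { |node| node.text.strip }

  tids.zip(terms).each do |tid, term|
    puts "#{tid} #{term}"
  end
end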

from the following sample html:

<div class="thing text-text" data-thing-id="29966403">
  <div class="thinguser"><i class="ico ico-water ico-blue"></i>
    <div class="status">in 7 days
    </div>
  </div>
  <div class="ignore-ui pull-right"><input type="check box" >
  </div>
  <div class="col_a col text">
    <div class="text">foobar
    </div>
  </div>
  <div class="col_b col text">
    <div class="text">foobar desc
    </div>
  </div>
</div>

If I wanted to pull the status (the "in 7 days" string) in the same fashion, what's the best way to do that? I can't seem to figure it out.

Would someone be kind enough to explain in detail what the tids and terms assignment lines are actually doing? I don't get it and the Nokogiri documentation doesn't seem to cover this.

Big thanks in advance.

~Chris


Answer 1:


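I'm all about using CSS selectors in Nokogiri. Something like this should work inside your loop. Note that a class selector needs a leading dot, and at_css returns the first matching node rather than a node set:

doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
seven_days = doc.at_css('.status').text.strip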


Source: https://stackoverflow.com/questions/23356121/parsing-div-elements-with-nokogiri
