I'm trying to use the nokogiri gem to extract all the URLs on a page, as well as their link text, and store the link text and URL in a hash.
Here's a one-liner:
Hash[doc.xpath('//a[@href]').map {|link| [link.text.strip, link["href"]]}]
#=> {"Foo"=>"#foo", "Bar"=>"#bar"}
Split up a bit to be arguably more readable:
h = {}
doc.xpath('//a[@href]').each do |link|
  h[link.text.strip] = link['href']
end
puts h
#=> {"Foo"=>"#foo", "Bar"=>"#bar"}
Another way:
h = doc.css('a[href]').each_with_object({}) { |n, h| h[n.text.strip] = n['href'] }
# yields {"Foo"=>"#foo", "Bar"=>"#bar"}
And if you're worried that the same text might link to different things, you can collect the hrefs in arrays:
h = doc.css('a[href]').each_with_object(Hash.new { |h,k| h[k] = [ ]}) { |n, h| h[n.text.strip] << n['href'] }
# yields {"Foo"=>["#foo"], "Bar"=>["#bar"]}
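To see why the array version matters, here is a small sketch that uses plain `[text, href]` pairs in place of Nokogiri nodes (the `pairs` data and variable names are made up for illustration). When two links share the same text, the plain hash keeps only the last href, while the `Hash.new` block version keeps all of them:

```ruby
# Hypothetical [text, href] pairs standing in for parsed <a> nodes.
pairs = [['Foo', '#foo'], ['Bar', '#bar'], ['Foo', '#foo2']]

# Plain hash: a later duplicate key overwrites the earlier value.
last_wins = pairs.each_with_object({}) { |(text, href), acc| acc[text] = href }

# The default block stores an empty array on first access to a key,
# so every href for the same text accumulates instead of being replaced.
grouped = pairs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(text, href), acc|
  acc[text] << href
end

p last_wins
p grouped
```

Here `last_wins['Foo']` ends up as the second href only, whereas `grouped['Foo']` holds both.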