Get link and href text from html doc with Nokogiri & Ruby?

广开言路 2020-12-28 11:02

I'm trying to use the Nokogiri gem to extract all the URLs on the page as well as their link text, and store the link text and URL in a hash.




        
2 Answers
  • 2020-12-28 11:30

    Here's a one-liner:

    Hash[doc.xpath('//a[@href]').map {|link| [link.text.strip, link["href"]]}]
    
    #=> {"Foo"=>"#foo", "Bar"=>"#bar"}
    

    Split up a bit to be arguably more readable:

    h = {}
    doc.xpath('//a[@href]').each do |link|
      h[link.text.strip] = link['href']
    end
    puts h
    
    #=> {"Foo"=>"#foo", "Bar"=>"#bar"}
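
Both versions above come down to turning a list of [text, href] pairs into a hash. Here is a minimal sketch of just that step in plain Ruby, without Nokogiri. The `pairs` array is hypothetical sample data standing in for what the `map` call would return, and `Array#to_h` (available since Ruby 2.1) is shown as an alternative spelling, not something the answer itself used:

```ruby
# Hypothetical [link text, href] pairs, standing in for the result of
# doc.xpath('//a[@href]').map { |link| [link.text.strip, link['href']] }
pairs = [["Foo", "#foo"], ["Bar", "#bar"]]

# Hash[] builds a hash from an array of two-element arrays.
from_brackets = Hash[pairs]

# Array#to_h (Ruby 2.1+) performs the same conversion.
from_to_h = pairs.to_h

puts from_brackets == from_to_h # prints true
```

Either form should work here; `to_h` simply reads left to right when chained onto the end of the `map` call.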
    
  • 2020-12-28 11:48

    Another way:

    h = doc.css('a[href]').each_with_object({}) { |n, h| h[n.text.strip] = n['href'] }
    # yields {"Foo"=>"#foo", "Bar"=>"#bar"}
    

    And if you're worried that the same text might link to different things, you can collect the hrefs in arrays:

    h = doc.css('a[href]').each_with_object(Hash.new { |h,k| h[k] = [ ]}) { |n, h| h[n.text.strip] << n['href'] }
    # yields {"Foo"=>["#foo"], "Bar"=>["#bar"]}
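
To see why the array version matters, here is a small sketch in plain Ruby that compares the two approaches on duplicate link text. The `pairs` data is hypothetical (it is not from the question's HTML) and stands in for the text and href of each matched node:

```ruby
# Hypothetical pairs in which the text "Foo" links to two different targets.
pairs = [["Foo", "#foo"], ["Bar", "#bar"], ["Foo", "#other"]]

# Plain hash: a later pair with the same text overwrites the earlier one.
last_wins = pairs.each_with_object({}) { |(text, href), h| h[text] = href }

# Hash with a default block: each new key starts as an empty array,
# so every href is kept.
grouped = pairs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(text, href), h|
  h[text] << href
end

puts last_wins["Foo"]          # prints #other
puts grouped["Foo"].join(", ") # prints #foo, #other
```

With the plain hash the first "Foo" link is silently lost, which is the case the default-block version is meant to guard against.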
    