How to manipulate DOM with Ruby on Rails

逝去的感伤 2021-02-03 14:40

As the title says, I have some DOM manipulation tasks. For example, I want to:
- find all H1 elements that are blue.
- find all text that has a font size of 12px.
- etc.

3 Answers
  • 2021-02-03 15:23

    To reliably determine the color of an arbitrary element on a web page, you would essentially have to reimplement a browser (to accurately account for stylesheets, markup hacks, broken tags, images, etc.).

    A far easier approach would be to embed an existing browser engine such as Gecko in a custom application of your own.

    As your spider crawls pages, it would pass each one to your embedded instance of Gecko, where you could call getComputedStyle to read the color an individual element actually ends up with.

    You originally mentioned wanting to use Ruby on Rails for this project. Rails is a framework for writing presentational applications, and it is really a bad fit for a project like this.

    As a starting point, I'd recommend you check out RubyGnome, and in particular RubyGnome's Gtk::MozEmbed functionality.

  • 2021-02-03 15:33

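    If what you're trying to do is manipulate HTML documents inside a Rails application, you should take a look at Nokogiri.

    It lets you search a document with XPath (CSS selectors are supported as well). With the following, you would find every link directly inside an h1 whose class attribute is exactly "blue". Note that this matches on the class name only, not on the color the element is actually rendered in.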

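    require 'nokogiri'
    require 'open-uri'

    # Fetch and parse the page. On Ruby 3.0+ Kernel#open no longer opens URLs,
    # so call URI.open (provided by open-uri) explicitly.
    doc = Nokogiri::HTML(URI.open('http://www.stackoverflow.com'))
    # Select each <a> that is a direct child of an <h1> and has class="blue".
    doc.xpath('//h1/a[@class="blue"]').each do |link|
      puts link.content
    end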
    

    If, on the other hand, what you are trying to do is work with the DOM of the page currently displayed in the browser, you should take a look at JavaScript and jQuery. Rails runs on the server and can't do that.

  • 2021-02-03 15:35

    http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
