Method to parse HTML document in Ruby?

抹茶落季 2020-11-29 09:03

Like the DOMDocument class in PHP, is there a class in Ruby (ideally in core Ruby) that can parse an HTML document and read the values of its node elements?

4 answers
  • 2020-11-29 09:43

    You should check out Hpricot. It's exceedingly good. It isn't part of core Ruby, but it's a commonly used gem.

  • 2020-11-29 09:52

    There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.

    Meta-answer: For common needs like these, I'd recommend checking out the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers.

  • 2020-11-29 09:52

    You can also try Oga by Yorick Peterse.

    It is an XML/HTML parser written in Ruby that does not require system libraries such as libxml. You can find it here: https://github.com/YorickPeterse/oga

  • 2020-11-29 09:57

    Ruby Cheerio is a jQuery-style HTML parser in Ruby: a much-simplified version of Nokogiri aimed at crawlers. It is the Ruby counterpart of the popular Node.js package cheerio.

    Follow the link for a simple crawler example.

    gem install ruby-cheerio

    require 'ruby-cheerio'
    
    jQuery = RubyCheerio.new("<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>")
    
    jQuery.find('h1').each do |head_one|
        p head_one.text
    end
    
    # getting attribute values like jQuery.
    p jQuery.find('h1.one')[0].prop('h1','class')
    
    # function chaining similar to jQuery.
    p jQuery.find('body').find('h1').first.text
    