Can Nokogiri interpret JavaScript? - Web Scraping

旧巷少年郎 2021-01-14 00:14

We are trying to scrape the availabilities on this page: http://www.equityapartments.com/new-york/new-york-city-apartments/midtown-west/mantena-apartments.aspx

I nee

2 Answers
  • 2021-01-14 00:26

    Nokogiri is just a parser: it does not execute JavaScript. It also lets you search the parsed content.
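
    A minimal sketch of what "just a parser" means in practice. The HTML below is a made-up stand-in (the `availability` id and the script are assumptions, not taken from the page in the question): the text is written by a script at load time, so Nokogiri, which never runs the script, finds an empty element.

    ```ruby
    require 'nokogiri'

    # Hypothetical page: the availability text is filled in by a script at load time.
    html = <<~HTML
      <html><body>
        <div id="availability"></div>
        <script>
          document.getElementById('availability').textContent = '3 units available';
        </script>
      </body></html>
    HTML

    doc = Nokogiri::HTML(html)

    # Nokogiri sees the markup as delivered; the script is never run,
    # so the div is still empty.
    availability_text = doc.at_css('#availability').text

    # The script itself is only available as plain text.
    script_source = doc.at_css('script').text

    puts availability_text.empty?
    puts script_source.include?('getElementById')
    ```

    If the data you want only appears after the page's scripts have run, parsing the raw response this way will not find it.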

    To interact with web pages you need to use something else, e.g. Watir and PhantomJS.

    Combining them all together:

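    require 'watir'
    require 'nokogiri'

    # Drive a headless browser so the page's JavaScript actually runs.
    browser = Watir::Browser.new(:phantomjs)

    browser.goto(your_url_above)
    browser.link(text: 'All floorplans').click

    # Hand the rendered HTML to Nokogiri for parsing and searching.
    document = Nokogiri::HTML(browser.html)
    document.search(...)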
    
  • 2021-01-14 00:44

    Yes, you can do it if the floor plan elements have an id or class. You can find those by inspecting your page.

    You will need FirePath to help you get the XPath of the elements, and then you can iterate over them with it. For example, I recently worked on webpagescraper to scrape HTML from fundly.com.
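
    A small sketch of iterating over elements by XPath, assuming made-up markup (the `floorplan` and `title` class names are hypothetical, not taken from any real page). It also shows the CSS form of the same query, since Nokogiri accepts both.

    ```ruby
    require 'nokogiri'

    # Hypothetical markup standing in for a results page.
    html = <<~HTML
      <ul>
        <li class="floorplan featured"><h4 class="title">Studio</h4></li>
        <li class="floorplan"><h4 class="title">1 Bedroom</h4></li>
        <li class="other"><h4 class="title">Parking</h4></li>
      </ul>
    HTML

    doc = Nokogiri::HTML(html)

    # XPath has no class selector; this idiom matches one token in a class list,
    # so "floorplan featured" is matched as well as plain "floorplan".
    xpath = "//li[contains(concat(' ', normalize-space(@class), ' '), ' floorplan ')]/h4"
    titles_via_xpath = doc.xpath(xpath).map(&:text)

    # The same query written as a CSS selector.
    titles_via_css = doc.css('li.floorplan > h4').map(&:text)

    puts titles_via_xpath.inspect
    puts titles_via_css.inspect
    ```

    An XPath copied from a browser tool is often an absolute path tied to the page's current layout, so a query based on an id or class like the ones above tends to survive markup changes better.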

    Because all the title elements in the HTML had the same class, I was able to get every title on https://fundly.com/search/%60 using a selector with that class name, like this:

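    require 'nokogiri'
    require 'open-uri'

    # Assumption: doc is the parsed search page; the original snippet did not show this step.
    doc = Nokogiri::HTML(URI.open('https://fundly.com/search/%60'))

    @campaign_titles = []
    # 'h4.f-width-100' is a CSS selector: every <h4> with class f-width-100.
    doc.search('h4.f-width-100').each do |title|
      @campaign_titles << title.text
    end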
    

    Please refer to the project mentioned above if you need more help grabbing values from a website.
