Scraping an AngularJS application

后悔当初 2021-01-07 07:50

I'm scraping some HTML pages with Rails, using Nokogiri.

I had some problems when I tried to scrape an AngularJS page because the gem is opening the HTML before it

2 Answers
  • 2021-01-07 08:23

    If you're trying to scrape AngularJS pages in a fully generic fashion, then you're likely going to need something like what @tadman mentioned in the comments (PhantomJS) -- some type of headless browser that fully processes the AngularJS JavaScript and opens the DOM up to inspection afterwards.

    If you have a specific site or sites that you want to scrape, the path of least resistance is likely to avoid the AngularJS frontend entirely and directly query the API from which the Angular code pulls its content. The standard scenario for many/most AngularJS sites is that they pull down the static JS and HTML code/templates, and then make AJAX calls back to a server (either their own or some third-party API) to get the content that will be rendered. If you take a look at their code, you can likely query directly whatever AngularJS is calling (e.g. via $http, ngResource, or Restangular). The returned data is typically JSON and is much easier to gather than by truly scraping the post-rendered HTML.
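
As a rough illustration of the direct-API approach, here is a minimal sketch using only Ruby's standard library. The endpoint `https://example.com/api/items` and the `items`, `id`, and `name` keys are hypothetical placeholders; the real URL and response shape have to be read from the site's own `$http`/ngResource calls or from the browser's network tab. The helper names `fetch_json` and `extract_items` are also made up for this example.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical endpoint: replace with the URL the AngularJS app actually calls.
API_URL = 'https://example.com/api/items'

# Fetch the raw JSON body from the backend the AngularJS app talks to.
def fetch_json(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/json'
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end

# Pull the fields of interest out of the payload.
# The "items", "id" and "name" keys are assumptions about the response shape.
def extract_items(json_body)
  data = JSON.parse(json_body)
  data.fetch('items', []).map { |item| { id: item['id'], name: item['name'] } }
end

# Usage (performs a real HTTP request, so it is left commented out):
# items = extract_items(fetch_json(API_URL))
```

Keeping the parsing step separate from the HTTP call makes it possible to check the extraction against a saved response before pointing it at the live API.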

  • 2021-01-07 08:27

    You can use:

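    require 'nokogiri'
    require 'phantomjs'
    require 'watir'

    url = 'http://example.com/' # placeholder: the AngularJS page to scrape

    # Drive a headless PhantomJS browser so the page's JavaScript runs first
    b = Watir::Browser.new(:phantomjs)
    b.goto url

    # b.html returns the DOM as rendered by the browser, not the raw source
    doc = Nokogiri::HTML(b.html)
    b.close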

    Download PhantomJS from http://phantomjs.org/download.html and move the binary to /usr/bin.
