Scraping an AngularJS application

后悔当初 2021-01-07 07:50

I'm scraping some HTML pages with Rails, using Nokogiri.

I had some problems when I tried to scrape an AngularJS page, because the gem fetches the HTML before the JavaScript has rendered the content.

2 answers
  • 2021-01-07 08:23

    If you're trying to scrape AngularJS pages in a fully generic fashion, then you're likely going to need something like what @tadman mentioned in the comments (PhantomJS) -- some type of headless browser that fully processes the AngularJS JavaScript and opens the DOM up to inspection afterwards.

    If you have a specific site or sites to scrape, the path of least resistance is likely to avoid the AngularJS frontend entirely and query the API from which the Angular code is pulling content. The standard scenario for most AngularJS sites is that they pull down static JS and HTML templates, then make AJAX calls back to a server (either their own or some third-party API) to fetch the content that gets rendered. If you look at their code, you can likely query whatever Angular is calling directly (e.g. via $http, ngResource, or Restangular). The return data is typically JSON, which is much easier to gather than scraping the post-rendered HTML.
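
    If you do go the API route, here is a minimal sketch using only Ruby's standard library; the endpoint URL and the "title" field are assumptions, and the real ones can be found by watching the XHR requests in the browser's network tab while the Angular page loads:

    require 'net/http'
    require 'json'
    require 'uri'

    # Hypothetical endpoint: replace with whatever the Angular app actually calls.
    uri = URI('https://example.com/api/v1/articles?page=1')

    response = Net::HTTP.get_response(uri)
    raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    # The API returns JSON, so no HTML parsing is needed at all.
    articles = JSON.parse(response.body)
    articles.each { |article| puts article['title'] }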

  • 2021-01-07 08:27

    You can use:

    require 'phantomjs'
    require 'watir'
    require 'nokogiri'

    # Drive a headless PhantomJS browser so the AngularJS code runs first.
    b = Watir::Browser.new(:phantomjs)
    b.goto 'http://example.com'  # placeholder: the page you want to scrape

    # Hand the fully rendered HTML over to Nokogiri.
    doc = Nokogiri::HTML(b.html)
    b.close
    

    Download PhantomJS from http://phantomjs.org/download.html and move the binary to /usr/bin.
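
    Once the rendered DOM is in Nokogiri, query it as usual. A brief sketch, assuming a hypothetical CSS selector:

    # Adjust the selector to the markup of the page you are scraping.
    doc.css('.product .title').each { |node| puts node.text.strip }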
