Fastest service for crawling web pages or invoking APIs (iTunes in particular)?

Submitted by 让人想犯罪 __ on 2019-12-11 13:53:20

Question


We need to download metadata for all iOS apps on a daily basis. We plan on extracting the information by crawling the iTunes website and by using the iTunes search API. Since there are 700K+ apps, we need an efficient way to do this.

One approach is to set up a bunch of scripts on EC2 and run them in parallel. Before we embark down this path, are there services like 80legs that people have used to accomplish a similar task? Essentially, we want something to help us crawl hundreds of thousands of pages (or make a bunch of API calls) very fast.
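For reference, the "scripts in parallel" approach could look roughly like the sketch below. It assumes the iTunes lookup endpoint accepts a comma-separated list of IDs in its `id` parameter; the batch size, worker count, and the helper names `chunked`, `fetch_batch`, and `crawl` are illustrative choices, not anything prescribed by Apple. The API may also throttle requests, so real numbers would need tuning.

```python
from concurrent.futures import ThreadPoolExecutor
import json
import urllib.request

# Assumed endpoint shape for the iTunes lookup API.
LOOKUP_URL = "https://itunes.apple.com/lookup?id={ids}"


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_batch(app_ids):
    """Fetch metadata for one batch of app IDs (one HTTP request)."""
    url = LOOKUP_URL.format(ids=",".join(str(i) for i in app_ids))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("results", [])


def crawl(app_ids, fetch=fetch_batch, batch_size=100, workers=8):
    """Fetch all batches concurrently and return the combined records.

    `fetch` is injectable so the fan-out logic can be exercised
    without touching the network.
    """
    batches = chunked(list(app_ids), batch_size)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `batches`.
        for batch_result in pool.map(fetch, batches):
            results.extend(batch_result)
    return results
```

Batching IDs per request cuts the request count considerably, which matters more than raw thread count when there are 700K+ apps to cover.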


Answer 1:


You might want to look into Apple's Enterprise Partner Feed (EPF). It will probably be much cheaper than renting a bunch of EC2 machines or building your own crawling infrastructure to scrape the data. From the EPF description itself:

The Enterprise Partner Feed is a data feed of the complete set of metadata from iTunes and the App Store. It is available for affiliate partners to fully incorporate aspects of the iTunes and App Store catalogs into a web site or app.

EPF has two feed modes

iTunes generates the EPF data in two modes:

full mode
incremental mode

The full export is generated weekly and contains a complete snapshot of iTunes metadata as of the day of generation. The incremental export is generated daily and contains records that have been added or modified since the last full export. The incremental exports are located relative to the full export on which they are based.

You'd use the full export to populate your data initially, then apply the incremental exports for the daily updates.
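A minimal sketch of that full-then-incremental flow, assuming each record has already been parsed into a dict with a unique `id` key. The field name and the helpers `load_full` and `apply_incremental` are hypothetical, and the actual EPF file format is not shown here:

```python
def load_full(records):
    """Build a fresh catalog from a full export, keyed by record ID."""
    return {rec["id"]: rec for rec in records}


def apply_incremental(catalog, records):
    """Upsert added or modified records from an incremental export.

    Upserts are idempotent, so re-applying the same incremental
    export leaves the catalog unchanged.
    """
    for rec in records:
        catalog[rec["id"]] = rec
    return catalog


# Weekly: rebuild from the full export. Daily: apply the incremental one.
catalog = load_full([{"id": 1, "name": "A"}, {"id": 2, "name": "B"}])
apply_incremental(catalog, [{"id": 2, "name": "B2"}, {"id": 3, "name": "C"}])
```

The description quoted above only mentions added or modified records, so how removals are signalled is left out of this sketch.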

Good luck.



Source: https://stackoverflow.com/questions/14988664/fastest-service-for-crawling-web-pages-or-invoking-apis-itunes-in-particular
