I have recently been learning Python and am trying my hand at building a web scraper. It's nothing fancy at all; its only purpose is to get the data off of a betting we
How can Scrapy be used to scrape this dynamic data so that I can use it?
I wonder why no one has posted the solution using Scrapy only.
Check out the blog post from the Scrapy team, "Scraping Infinite Scrolling Pages". The example scrapes the http://spidyquotes.herokuapp.com/scroll website, which uses infinite scrolling.
The idea is to open your browser's Developer Tools, watch the AJAX requests the page makes as you scroll, and then use that information to build the equivalent requests in Scrapy.
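```python
import json

import scrapy


class SpidyQuotesSpider(scrapy.Spider):
    name = 'spidyquotes'
    # The JSON endpoint spotted in the browser's Developer Tools (Network tab)
    quotes_base_url = 'http://spidyquotes.herokuapp.com/api/quotes?page=%s'
    start_urls = [quotes_base_url % 1]
    download_delay = 1.5  # wait between requests to be polite to the server

    def parse(self, response):
        # The endpoint returns JSON, so there is no HTML to parse
        data = json.loads(response.body)
        for item in data.get('quotes', []):
            yield {
                'text': item.get('text'),
                'author': item.get('author', {}).get('name'),
                'tags': item.get('tags'),
            }
        # Request the next page, just as the page itself does while scrolling.
        # With no callback given, Scrapy sends the response to parse() again.
        if data['has_next']:
            next_page = data['page'] + 1
            yield scrapy.Request(self.quotes_base_url % next_page)
```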
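Because the endpoint returns plain JSON, the parsing and pagination logic can be checked without running a crawl at all. Below is a minimal sketch using only the standard library. `parse_payload` is a hypothetical helper that mirrors what `parse` does, and the sample body is made up to match only the fields the spider reads (`quotes`, `has_next`, `page`); the real API response may contain more.

```python
import json
from typing import List, Optional, Tuple

# Same URL template as the spider; %s is replaced by the page number.
QUOTES_BASE_URL = 'http://spidyquotes.herokuapp.com/api/quotes?page=%s'


def parse_payload(body: bytes) -> Tuple[List[dict], Optional[str]]:
    """Return the items in one API response and the next page URL (or None).

    Assumes the JSON shape the spider relies on: a 'quotes' list,
    a boolean 'has_next' and the current 'page' number.
    """
    data = json.loads(body)
    items = [
        {
            'text': q.get('text'),
            'author': q.get('author', {}).get('name'),
            'tags': q.get('tags'),
        }
        for q in data.get('quotes', [])
    ]
    next_url = None
    if data.get('has_next'):
        next_url = QUOTES_BASE_URL % (data['page'] + 1)
    return items, next_url


# Hypothetical sample body, invented to mimic the assumed API shape.
sample = json.dumps({
    'quotes': [
        {'text': 'Example quote', 'author': {'name': 'Someone'}, 'tags': ['demo']},
    ],
    'has_next': True,
    'page': 1,
}).encode('utf-8')

items, next_url = parse_payload(sample)
print(items)     # [{'text': 'Example quote', 'author': 'Someone', 'tags': ['demo']}]
print(next_url)  # http://spidyquotes.herokuapp.com/api/quotes?page=2
```

In the spider, `parse` does the same thing but yields each item and, when there is a next page, a `scrapy.Request` for it instead of returning them.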