I have recently been learning Python and am dipping my hand into building a web-scraper. It's nothing fancy at all; its only purpose is to get the data off of a betting we
Yes, Scrapy can scrape dynamic websites, that is, websites rendered through JavaScript.
There are two approaches to scraping these kinds of websites.
First, you can use Splash to render the JavaScript code and then parse the rendered HTML. You can find the docs and project here: Scrapy Splash, git.
Second, as everyone is stating, you can monitor the network calls to find the API call that fetches the data. Mocking that call in your Scrapy spider may get you the desired data.
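For the first approach, here is a minimal sketch of how a page can be handed to Splash for rendering through its HTTP API. It assumes a Splash instance is already running locally on port 8050; the helper name `splash_render_url` and the target page are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is listening locally on port 8050.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def splash_render_url(page_url, wait=0.5, endpoint=SPLASH_ENDPOINT):
    """Build a URL that asks Splash to load page_url, run its JavaScript,
    wait `wait` seconds, and return the rendered HTML."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    # Hypothetical target page; substitute the site you are scraping.
    print(splash_render_url("http://example.com/odds"))
```

A spider could request this URL instead of the original page and parse the response as ordinary HTML. The scrapy-splash plugin wraps the same idea in its `SplashRequest` class, which is probably the more convenient route inside a real project.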
How can Scrapy be used to scrape this dynamic data so that I can use it?
I wonder why no one has posted a solution using Scrapy only.
Check out the blog post from the Scrapy team, SCRAPING INFINITE SCROLLING PAGES. The example scrapes the http://spidyquotes.herokuapp.com/scroll website, which uses infinite scrolling.
The idea is to open your browser's Developer Tools, watch the AJAX requests the page makes, and then build the equivalent requests in Scrapy based on that information.
import json

import scrapy


class SpidyQuotesSpider(scrapy.Spider):
    name = 'spidyquotes'
    quotes_base_url = 'http://spidyquotes.herokuapp.com/api/quotes?page=%s'
    start_urls = [quotes_base_url % 1]
    download_delay = 1.5

    def parse(self, response):
        # The endpoint returns JSON, so decode the body instead of using selectors.
        data = json.loads(response.body)
        for item in data.get('quotes', []):
            yield {
                'text': item.get('text'),
                'author': item.get('author', {}).get('name'),
                'tags': item.get('tags'),
            }
        # Keep requesting pages until the API reports there are no more.
        if data['has_next']:
            next_page = data['page'] + 1
            yield scrapy.Request(self.quotes_base_url % next_page)