Can scrapy be used to scrape dynamic content from websites that are using AJAX?


I have recently been learning Python and am trying my hand at building a web scraper. It's nothing fancy at all; its only purpose is to get the data off of a betting we


8 answers

[愿得一人]

2020-11-21 18:31

Yes, Scrapy can scrape dynamic websites, that is, websites whose content is rendered through JavaScript.

There are two approaches to scraping these kinds of websites.

First,

you can use Splash to render the JavaScript and then parse the rendered HTML. See the Scrapy Splash documentation and its Git repository for details.
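As a rough illustration of the Splash approach, the sketch below builds a URL for Splash's `render.html` HTTP endpoint, which returns the page's HTML after JavaScript has run. It assumes a Splash instance is running locally on port 8050; the helper name `splash_render_url` and the example target URL are made up for this sketch and are not part of Scrapy or Splash.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is listening locally on port 8050.
SPLASH_RENDER_ENDPOINT = "http://localhost:8050/render.html"


def splash_render_url(target_url, wait=0.5, endpoint=SPLASH_RENDER_ENDPOINT):
    """Return a URL asking Splash to render target_url and return its HTML.

    `wait` is the number of seconds Splash waits after the page loads,
    which gives AJAX calls time to finish. This helper is hypothetical.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    # The resulting URL can be requested like any other page; the response
    # body is the rendered HTML, ready for the usual selectors.
    print(splash_render_url("http://example.com/page"))
```

A spider could put such a URL in `start_urls` and parse the response as ordinary HTML. The scrapy-splash package wraps the same idea in a `SplashRequest` class, which is probably the more convenient route in a real project.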

Second,

As others have noted, you can monitor the network calls to find the API call that fetches the data, and then reproduce that call in your Scrapy spider to get the desired data.


野性不改

2020-11-21 18:43

How can Scrapy be used to scrape this dynamic data so that I can use it?

I wonder why no one has posted the solution using Scrapy only.

Check out the blog post from the Scrapy team, "Scraping Infinite Scrolling Pages". The example scrapes the http://spidyquotes.herokuapp.com/scroll website, which uses infinite scrolling.

The idea is to use your browser's Developer Tools to spot the AJAX requests, then build the Scrapy requests from that information.

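import json
import scrapy


class SpidyQuotesSpider(scrapy.Spider):
    name = 'spidyquotes'
    # JSON endpoint that the page calls via AJAX while scrolling
    quotes_base_url = 'http://spidyquotes.herokuapp.com/api/quotes?page=%s'
    start_urls = [quotes_base_url % 1]
    # Wait between requests to go easy on the server
    download_delay = 1.5

    def parse(self, response):
        # The response body is JSON, not HTML, so no selectors are needed
        data = json.loads(response.text)
        for item in data.get('quotes', []):
            yield {
                'text': item.get('text'),
                'author': item.get('author', {}).get('name'),
                'tags': item.get('tags'),
            }
        # Follow the pagination until the API reports no further pages
        if data.get('has_next'):
            next_page = data['page'] + 1
            # With no explicit callback, Scrapy calls parse() again
            yield scrapy.Request(self.quotes_base_url % next_page)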
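If you want to check the JSON handling without hitting the network, one option is to pull the logic of `parse` out into plain functions, as in the sketch below. The function names `extract_quotes` and `next_page_number` and the sample payload are invented for illustration; the payload only mimics the fields the spider above reads, and the real API response may contain more.

```python
import json


def extract_quotes(payload):
    """Turn one decoded API page into a list of flat item dicts."""
    return [
        {
            "text": item.get("text"),
            "author": item.get("author", {}).get("name"),
            "tags": item.get("tags"),
        }
        for item in payload.get("quotes", [])
    ]


def next_page_number(payload):
    """Return the next page number, or None when there are no more pages."""
    if payload.get("has_next"):
        return payload["page"] + 1
    return None


# Hypothetical sample payload, shaped after the fields the spider reads.
SAMPLE = json.loads("""
{"page": 1, "has_next": true,
 "quotes": [{"text": "A quote", "author": {"name": "Someone"}, "tags": ["demo"]}]}
""")

if __name__ == "__main__":
    print(extract_quotes(SAMPLE))
    print(next_page_number(SAMPLE))
```

The spider's `parse` method could then call these two helpers, which keeps the Scrapy-specific part down to yielding items and the next request.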
