Can scrapy be used to scrape dynamic content from websites that are using AJAX?

前端未结

关注

 8  811

星月不相逢 2020-11-21 17:48

I have recently been learning Python and am dipping my hand into building a web-scraper. It\'s nothing fancy at all; its only purpose is to get the data off of a betting we

8条回答

醉话见心 (楼主)

2020-11-21 18:21
Here is a simple example of scrapy with an AJAX request. Let see the site rubin-kazan.ru.

All messages are loaded with an AJAX request. My goal is to fetch these messages with all their attributes (author, date, ...):

When I analyze the source code of the page I can't see all these messages because the web page uses AJAX technology. But I can with Firebug from Mozilla Firefox (or an equivalent tool in other browsers) to analyze the HTTP request that generate the messages on the web page:

It doesn't reload the whole page but only the parts of the page that contain messages. For this purpose I click an arbitrary number of page on the bottom:

And I observe the HTTP request that is responsible for message body:

After finish, I analyze the headers of the request (I must quote that this URL I'll extract from source page from var section, see the code below):

And the form data content of the request (the HTTP method is "Post"):

And the content of response, which is a JSON file:

Which presents all the information I'm looking for.

From now, I must implement all this knowledge in scrapy. Let's define the spider for this purpose:
```
class spider(BaseSpider):
    name = 'RubiGuesst'
    start_urls = ['http://www.rubin-kazan.ru/guestbook.html']

    def parse(self, response):
        url_list_gb_messages = re.search(r'url_list_gb_messages="(.*)"', response.body).group(1)
        yield FormRequest('http://www.rubin-kazan.ru' + url_list_gb_messages, callback=self.RubiGuessItem,
                          formdata={'page': str(page + 1), 'uid': ''})

    def RubiGuessItem(self, response):
        json_file = response.body
```
In parse function I have the response for first request. In RubiGuessItem I have the JSON file with all information.
0 讨论(0)

查看其它8个回答
发布评论:

提交评论
- 加载中...