Scraping Ajax based Review Page with Scrapy

Question

Hi there. I am trying to scrape a website. Everything is working fine, except that I cannot figure out how to scrape the Ajax content. The website I am scraping loads its review pages via Ajax using a POST request. Here is what Chrome DevTools shows:

[Screenshot: Chrome DevTools]

I researched a lot, but I cannot understand how to scrape Ajax content. I know about form data and POST or GET requests, but I cannot work out how to use them here. Moreover, I don't know how to extract the content I need from the response; I guess it cannot be scraped using XPath or selectors. Also, if you check the URL, there is a "read more" button in the review section. Is it possible to scrape that using the same strategy as for the Ajax content?
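To make the "form data and POST request" part concrete, here is a minimal sketch of how the form fields for one page of reviews might be assembled. The field names (`product_id`, `page`, `per_page`) and the use of the number from the product URL as an ID are assumptions for illustration only; the real names and values have to be copied from the Form Data section of the request in the browser's dev tools.

```python
from urllib.parse import urlencode


def build_review_form(product_id, page, per_page=10):
    """Build the form fields for one page of reviews.

    All field names here are hypothetical placeholders; the real ones
    must be copied from the request's Form Data section in dev tools.
    """
    # Convert every value to str: Scrapy's FormRequest expects string
    # (or bytes) values in its formdata argument.
    return {
        "product_id": str(product_id),
        "page": str(page),
        "per_page": str(per_page),
    }


if __name__ == "__main__":
    # 6619437 is taken from the product URL purely as sample input.
    form = build_review_form(6619437, page=2)
    # Show the body roughly as it would be sent in an
    # application/x-www-form-urlencoded POST request.
    print(urlencode(form))
```

In a spider, a dict like this could be passed as `scrapy.FormRequest(url, formdata=form, callback=...)`, where `url` is the request URL shown in dev tools rather than the product page URL.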

I was able to scrape the first page, but I am stuck at next_page. The spider gets the URL for the next page and requests it, but nothing happens and the spider terminates (see the output log of the spider). Here is the code:

import scrapy
from scrapy.http import Request, FormRequest
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from quo.items import QuoItem


class MySpider(scrapy.Spider):
    name = 'quotes'

    def start_requests(self):
        yield scrapy.Request('https://www.daraz.pk/infinix-s2-pro-32gb-3gb-4g-lte-black-6619437.html', self.parse)

    def parse(self, response):
        for href in response.xpath('//div[@class="reviews"]'):
            item = QuoItem()

            # The rating is read from the element's inline style attribute.
            Rating = response.xpath('//*[@id="ratingReviews"]/section[3]/div[2]/article/div[2]/div[1]/div/div/@style').extract()
            if Rating:
                item['Rating'] = Rating

            # Review text.
            ReviewT = response.xpath('//*[@id="ratingReviews"]/section[3]/div[2]/article/div[2]/div[2]/text()').extract()
            if ReviewT:
                item['ReviewT'] = ReviewT

            yield item

            # XPath for the "next" button, which contains the URL of the next page.
            next_page = response.xpath('(//ul[@class="osh-pagination -horizontal"]/li[@class="item"]/a[@title]/@href)[last()]').extract()
            if next_page:
                yield scrapy.Request(response.urljoin(next_page[0]), callback=self.parse)

Update requested in comments:
I tried to use it, but I guess I didn't use it correctly; it isn't doing anything. Here is the addition to the code:

import json  # needed at the top of the module for json.loads

next_page = response.xpath('(//ul[@class="osh-pagination -horizontal"]/li[@class="item"]/a[@title]/@href)[last()]').extract()
if next_page:
    yield scrapy.Request(response.urljoin(next_page[0]), callback=self.parse_jsonloads)

def parse_jsonloads(self, response):
    data = json.loads(response.body)

    for item in data.get('reviews', []):
        ReviewT = item.get('author')

    yield json.loads(response.text)
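For comparison, the loop over `data.get('reviews', [])` above could be written so that each review actually produces a result. This is only a sketch under the assumption that the endpoint returns JSON shaped like `{"reviews": [{"author": ...}]}`; the `rating` and `text` keys and the helper name `extract_reviews` are hypothetical, and if the response turns out to be an HTML fragment it would need selectors instead.

```python
import json


def extract_reviews(body):
    """Return one dict per review found in a JSON response body.

    Assumes a payload shaped like {"reviews": [{"author": ..., ...}]};
    the real key names must be checked against the actual response.
    """
    data = json.loads(body)  # accepts str or bytes
    rows = []
    for review in data.get("reviews", []):
        rows.append({
            "author": review.get("author"),
            "rating": review.get("rating"),  # hypothetical key
            "text": review.get("text"),      # hypothetical key
        })
    return rows


if __name__ == "__main__":
    # Made-up sample payload, not real site data.
    sample = '{"reviews": [{"author": "A", "rating": 4, "text": "Good phone"}]}'
    for row in extract_reviews(sample):
        print(row)
```

Inside a Scrapy callback, the same helper could be used as `for row in extract_reviews(response.text): yield row`, so that every review becomes its own item.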

Source: https://stackoverflow.com/questions/44856174/scraping-ajax-based-review-page-with-scrapy

Tags

python

ajax

scrapy

scrapy-spider