Scraping Ajax based Review Page with Scrapy

Posted by 左心房为你撑大大i on 2019-12-13 00:32:22

Question


Hi there. I am trying to scrape a website. Everything works except that I cannot figure out how to scrape the Ajax content. The site loads its review pages through Ajax, using POST requests. Here is what Chrome DevTools shows:

[Screenshot: Chrome DevTools]

I have researched this a lot, but I still cannot understand how to scrape Ajax content. I know about form data and POST and GET requests, but I do not know how to use them here, and I do not know how to extract the content I need from the response. I assume it cannot be scraped with XPath or CSS selectors. Also, if you check the URL, there is a "read more" button in the review section. Is it possible to scrape that with the same strategy as the Ajax content?
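For reference, "form data" in a POST request is just a URL-encoded body sent along with a few headers. The sketch below builds such a request with the standard library only and does not send it. The endpoint URL and the field names (`sku`, `page`) are hypothetical placeholders; the real values would have to be copied from the request shown in the browser's network panel. In Scrapy, the same fields would typically be passed as `scrapy.FormRequest(url, formdata={...}, callback=...)`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ajax_post(url, form_data):
    """Build (but do not send) a POST request shaped like a browser Ajax call."""
    # Form fields are URL-encoded and sent as the request body.
    body = urlencode(form_data).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
            # Many Ajax endpoints expect this header; whether this site does is an assumption.
            "X-Requested-With": "XMLHttpRequest",
        },
        method="POST",
    )


# Hypothetical endpoint and fields, for illustration only.
request = build_ajax_post(
    "https://example.com/ajax/reviews",
    {"sku": "ABC123", "page": 2},
)
print(request.get_method(), request.full_url)
print(request.data)
```

Building the request separately from sending it makes it easy to compare the encoded body against what the browser sends before wiring it into a spider.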

I was able to scrape the first page, but I am stuck at next_page. This is how the spider terminates: it gets the URL for the next page and requests it, but nothing happens (see the output log of the spider). Here is the code:

import scrapy
from scrapy.http import Request, FormRequest
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from quo.items import QuoItem


class MySpider(scrapy.Spider):
    name = 'quotes'

    def start_requests(self):
        yield scrapy.Request('https://www.daraz.pk/infinix-s2-pro-32gb-3gb-4g-lte-black-6619437.html', self.parse)

    def parse(self, response):
        for href in response.xpath('//div[@class="reviews"]'):
            item = QuoItem()

            # The rating is read from the inline style attribute of the rating element.
            Rating = response.xpath('//*[@id="ratingReviews"]/section[3]/div[2]/article/div[2]/div[1]/div/div/@style').extract()
            if Rating:
                item['Rating'] = Rating

            # The review text sits next to the rating inside each article.
            ReviewT = response.xpath('//*[@id="ratingReviews"]/section[3]/div[2]/article/div[2]/div[2]/text()').extract()
            if ReviewT:
                item['ReviewT'] = ReviewT

            yield item

            # XPath for the "next" button, which contains the URL of the next page.
            next_page = response.xpath('(//ul[@class="osh-pagination -horizontal"]/li[@class="item"]/a[@title]/@href)[last()]').extract()
            if next_page:
                yield scrapy.Request(response.urljoin(next_page[0]), callback=self.parse)

Update requested in the comments:
I have tried to use it, but I guess I did not use it correctly. It isn't doing anything. Here is the addition to the code:

import json

# Inside parse(): follow the next-page link, but send the response to a JSON callback.
next_page = response.xpath('(//ul[@class="osh-pagination -horizontal"]/li[@class="item"]/a[@title]/@href)[last()]').extract()
if next_page:
    yield scrapy.Request(response.urljoin(next_page[0]), callback=self.parse_jsonloads)

# New method on the spider: decode the response body as JSON.
def parse_jsonloads(self, response):
    data = json.loads(response.text)

    for item in data.get('reviews', []):
        ReviewT = item.get('author')

    yield data
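If the endpoint does return JSON, the extraction step can be isolated in a small helper and tried on a sample string before it goes into a spider callback. This is only a sketch: the `reviews` and `author` keys are taken from the attempt above, while the `text` key and the sample payload are hypothetical. If the server actually returns an HTML fragment rather than JSON, `json.loads` will raise an error and the body would need selectors instead.

```python
import json


def extract_reviews(payload):
    """Return one dict per review found in a JSON payload string."""
    data = json.loads(payload)
    reviews = []
    for review in data.get("reviews", []):
        reviews.append({
            "author": review.get("author"),
            "text": review.get("text"),  # hypothetical key; None when absent
        })
    return reviews


# Hypothetical payload, for illustration only.
sample = '{"reviews": [{"author": "A", "text": "Good phone"}, {"author": "B"}]}'
for row in extract_reviews(sample):
    print(row)
```

Inside a Scrapy callback, the helper could be called as `extract_reviews(response.text)`, yielding each returned dict as an item.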

Source: https://stackoverflow.com/questions/44856174/scraping-ajax-based-review-page-with-scrapy
