Question
I am trying to scrape a website with Scrapy. Everything else works; the problem is that I cannot figure out how to scrape the AJAX content. The site loads its review pages through AJAX using a POST request. Here is what Chrome DevTools shows:
[Screenshot: Chrome DevTools]
I have researched this a lot, but I still don't understand how to scrape AJAX content. I know about form data and POST/GET requests, but I can't work out how to use them here. I also don't know how to extract the content I need; I suspect it can't be reached with XPath or CSS selectors. Finally, the review section at that URL has a "Read more" button: can it be scraped with the same strategy as the AJAX content?
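For the AJAX part, the usual approach is to skip the rendered page and replay the request that DevTools shows, sending the same form fields yourself. The sketch below only builds such a POST with the standard library and does not send it. The endpoint URL, the `sku`/`sort`/`page` field names and their values are hypothetical placeholders; the real ones have to be copied from the XHR entry in the DevTools Network tab.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholders: copy the real endpoint and form fields
# from the XHR request shown in Chrome DevTools (Network tab).
AJAX_URL = "https://www.example.com/ajax/reviews"
BASE_FORM = {"sku": "PLACEHOLDER-SKU", "sort": "newest"}


def build_review_request(page: int) -> Request:
    """Build (but do not send) a POST that mimics the AJAX call for one review page."""
    form = dict(BASE_FORM, page=str(page))  # hypothetical "page" field
    body = urlencode(form).encode("ascii")  # application/x-www-form-urlencoded body
    return Request(
        AJAX_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "X-Requested-With": "XMLHttpRequest",  # some AJAX endpoints check this
        },
        method="POST",
    )


for page in (1, 2):
    req = build_review_request(page)
    print(req.get_method(), req.data)
# POST b'sku=PLACEHOLDER-SKU&sort=newest&page=1'
# POST b'sku=PLACEHOLDER-SKU&sort=newest&page=2'
```

Inside a spider, `scrapy.FormRequest(url, formdata=form, callback=...)` (already imported in the code below) builds and sends the same kind of form-encoded POST. One way to paginate, assuming the endpoint takes a page number, is to yield the request for the next page from the callback and stop once a page comes back with no reviews.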
I was able to scrape the first page, but I am stuck at `next_page`. This is how the spider terminates: it gets the URL for the next page and requests it, but nothing happens. [Output log of spider] Here is the code:
```python
import scrapy
from scrapy.http import Request, FormRequest
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from quo.items import QuoItem


class MySpider(scrapy.Spider):
    name = 'quotes'

    def start_requests(self):
        yield scrapy.Request('https://www.daraz.pk/infinix-s2-pro-32gb-3gb-4g-lte-black-6619437.html', self.parse)

    def parse(self, response):
        for href in response.xpath('//div[@class="reviews"]'):
            item = QuoItem()
            Rating = response.xpath('//*[@id="ratingReviews"]/section[3]/div[2]/article/div[2]/div[1]/div/div/@style').extract()
            if Rating:
                item['Rating'] = Rating
            ReviewT = response.xpath('//*[@id="ratingReviews"]/section[3]/div[2]/article/div[2]/div[2]/text()').extract()
            if ReviewT:
                item['ReviewT'] = ReviewT
            yield item
        # XPath for the next button, which contains the URL
        next_page = response.xpath('(//ul[@class="osh-pagination -horizontal"]/li[@class="item"]/a[@title]/@href)[last()]').extract()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page[0]), callback=self.parse)
```
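Separately from the pagination problem, the loop in `parse` iterates over `href` but runs every XPath on `response`, and those XPaths start with `//`, so they search the whole document. Every item therefore receives the data of all reviews on the page rather than one review. In Scrapy the fix is to query the loop variable with a relative path, e.g. `href.xpath('.//...')` with a leading dot; it also probably means looping over the individual review elements (the `article` in the original XPaths suggests that) rather than the single `div[@class="reviews"]` container. The sketch below demonstrates the same pitfall with the standard library's ElementTree, which supports only a subset of XPath, on made-up markup that is not Daraz's real HTML.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup with two reviews (not the site's real HTML).
PAGE = """
<html><body>
  <div class="reviews">
    <article><div class="stars">5</div><div class="text">Great phone</div></article>
    <article><div class="stars">2</div><div class="text">Battery is weak</div></article>
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Pitfall: querying from the document root inside the loop
# returns every review's text on every iteration.
for article in root.iter("article"):
    print([d.text for d in root.findall(".//div[@class='text']")])

# Fix: query relative to the element the loop is currently on.
items = []
for article in root.iter("article"):
    items.append({
        "Rating": article.find("div[@class='stars']").text,
        "ReviewT": article.find("div[@class='text']").text,
    })
print(items)
# [{'Rating': '5', 'ReviewT': 'Great phone'}, {'Rating': '2', 'ReviewT': 'Battery is weak'}]
```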
Update (requested in comments):
I have tried to use it, but I probably didn't use it correctly; it isn't doing anything. Here is the addition to the code:
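```python
import json  # needed by json.loads in parse_jsonloads


class MySpider(scrapy.Spider):
    # name, start_requests and the review loop in parse() stay as in the code above

    def parse(self, response):
        next_page = response.xpath('(//ul[@class="osh-pagination -horizontal"]/li[@class="item"]/a[@title]/@href)[last()]').extract()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page[0]), callback=self.parse_jsonloads)

    def parse_jsonloads(self, response):
        # json.loads raises if this URL returns an HTML page instead of JSON
        data = json.loads(response.text)
        for review in data.get('reviews', []):
            yield {'ReviewT': review.get('author')}
```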
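A likely reason the update "isn't doing anything": the `next_page` link is an ordinary page URL, so its response is presumably HTML, and `json.loads` raises `JSONDecodeError` on it. Scrapy logs that as a spider error and the callback yields nothing. Parsing JSON only helps if the request goes to the AJAX endpoint seen in DevTools. The helper below is a sketch of a defensive parser; the `reviews` and `author` keys are taken from the question's code and are assumptions about the real response shape.

```python
import json


def parse_reviews_body(body: str) -> list:
    """Return review dicts from an AJAX response body.

    Assumes JSON shaped like {"reviews": [{"author": ...}, ...]} (key names
    from the question's code; check the real response in DevTools).
    Returns [] when the body is not a JSON object, e.g. an HTML page.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return []  # HTML or empty body: nothing to extract as JSON
    if not isinstance(data, dict):
        return []
    return [{"ReviewT": review.get("author")} for review in data.get("reviews", [])]


print(parse_reviews_body('{"reviews": [{"author": "Ali"}, {"author": "Sara"}]}'))
# [{'ReviewT': 'Ali'}, {'ReviewT': 'Sara'}]
print(parse_reviews_body("<html><body>Not JSON</body></html>"))
# []
```

In a Scrapy callback this would be called as `parse_reviews_body(response.text)`, yielding each returned dict. If it keeps returning `[]` for pages that should contain reviews, the request is most likely not hitting the JSON endpoint.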
Source: https://stackoverflow.com/questions/44856174/scraping-ajax-based-review-page-with-scrapy