I have a problem with my spider. I use Splash with Scrapy to get the link to the "Next page", which is generated by JavaScript. After downloading the information from the first p
At a quick glance, you're not sending the requests in start_requests through Splash. For example, you should be using SplashRequest:
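    # At the top of the spider module:
    from scrapy_splash import SplashRequest

    # Inside the spider class:
    def start_requests(self):
        for url in self.start_urls:
            # Route every start URL through Splash so the JavaScript is rendered
            yield SplashRequest(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5}
                }
            })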
This assumes you have Splash set up appropriately: that is, in settings you have enabled the necessary middlewares, pointed them at the correct Splash URL, and configured the HTTP cache correctly. No, I have not run your code, but it should be good to go now.
So... unless there is some other reason you're using Splash, I see no reason to use it. A simple for loop in the initial parsing of the articles request would do, like:
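For reference, here is a sketch of what that setup usually looks like in settings.py, following the scrapy-splash README. The Splash address is an assumption (a local instance on port 8050); adjust it to wherever your Splash is actually running.

```python
# settings.py (sketch, assuming Splash runs locally on port 8050)
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares that scrapy-splash asks you to enable
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Spider middleware that deduplicates Splash arguments
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter and HTTP cache storage
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

If any of these are missing, the requests may go straight to the site without being rendered by Splash.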
    for next in response.css("a.control-nav-next::attr(href)").extract():
        yield scrapy.Request(response.urljoin(next), callback=self.parse...
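In case it is unclear why the loop calls response.urljoin: the href of a "next" link is often relative, and it has to be resolved against the URL of the current page before it can be requested. The sketch below shows that resolution with the standard library only. The function name resolve_next_links and the example.com URLs are made up for illustration; they are not from your spider.

```python
from urllib.parse import urljoin


def resolve_next_links(page_url, hrefs):
    """Resolve each (possibly relative) href against the current page URL."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical page URL and hrefs, standing in for what the CSS selector returns
page_url = "https://example.com/articles/?page=1"
hrefs = ["?page=2", "/articles/page/2/", "https://example.com/other/"]

for link in resolve_next_links(page_url, hrefs):
    print(link)
```

Relative hrefs come out as full URLs, and an href that is already absolute is left unchanged.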