CrawlSpider with Splash

独厮守ぢ 2021-01-13 14:28

I have a problem with my spider. I use Splash with Scrapy to get the link to "Next page", which is generated by JavaScript. After downloading the information from the first p

1 Answer
  • 2021-01-13 14:43

    At a quick glance, you're not issuing your start requests through Splash... For example, you should be using SplashRequest in start_requests.

    from scrapy_splash import SplashRequest

    def start_requests(self):
        for url in self.start_urls:
            # Route each start URL through Splash so the JS runs before parsing
            yield SplashRequest(url, self.parse,
                                endpoint='render.html',
                                args={'wait': 0.5})
    

    Given that you have Splash set up appropriately — that is, in settings you have enabled the necessary middlewares, pointed to the correct Splash URL, and enabled the dupe filter and HTTP cache correctly... Note I have not run your code, but it should be good to go now.
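    For reference, the settings wiring that sentence refers to looks like this, following the scrapy-splash README; the SPLASH_URL value is an assumption about where your Splash instance listens:

    ```python
    # settings.py — scrapy-splash wiring, per the scrapy-splash README.
    # SPLASH_URL is an assumption; point it at your own Splash instance.
    SPLASH_URL = 'http://localhost:8050'

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }
    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    # Splash-aware dupe filter and HTTP cache storage:
    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
    ```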

    EDIT: BTW... the next-page link is not JS-generated.

    So... unless there is some other reason you're using Splash, I see no reason to use it here; a simple loop in the initial parse of the article requests will do, like...

    for next_page in response.css("a.control-nav-next::attr(href)").extract():
        yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
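    If it helps, you can check that selector logic outside the spider with parsel (the selector library Scrapy is built on); the HTML fragment and base URL below are made-up examples, not taken from the actual site:

    ```python
    from urllib.parse import urljoin

    from parsel import Selector  # parsel ships as a Scrapy dependency

    # Hypothetical page fragment; the real page has more markup around it
    html = '<a class="control-nav-next" href="/page/2">Next</a>'
    base_url = "https://example.com/articles"  # made-up base URL

    sel = Selector(text=html)
    next_links = [
        # equivalent to response.urljoin(href) inside a spider callback
        urljoin(base_url, href)
        for href in sel.css("a.control-nav-next::attr(href)").getall()
    ]
    print(next_links)  # absolute next-page URLs
    ```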
    