CrawlSpider with Splash

Submitted by 旧巷老猫 on 2019-12-01 06:34:58

At a quick glance, you're not issuing your start_requests through Splash. You should be using SplashRequest, for example:

from scrapy_splash import SplashRequest

def start_requests(self):
    for url in self.start_urls:
        # render the page through Splash before it reaches the parse callback
        yield SplashRequest(
            url,
            self.parse,
            endpoint='render.html',
            args={'wait': 0.5},
        )

Given that you have Splash set up appropriately, that is, in settings you have enabled the necessary middlewares, pointed SPLASH_URL to the correct URL, and enabled the dupe filter and HTTP cache storage correctly... No, I have not run your code, but it should be good to go now.
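For reference, the relevant scrapy-splash settings look roughly like this (the Splash URL below assumes a local Splash instance on its default port, so adjust it to wherever yours is running):

# settings.py -- standard scrapy-splash wiring
SPLASH_URL = 'http://localhost:8050'  # assumes Splash runs locally on the default port

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'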

EDIT: BTW... the next page is not JS-generated.

So... unless there is some other reason you're using Splash, I see no reason to use it; a simple for loop in the initial parse of the article requests will do, like...

for next in response.css("a.control-nav-next::attr(href)").extract():
    yield scrapy.Request(response.urljoin(next), callback=self.parse)
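For context, here is a minimal sketch of how that loop could sit inside a plain Scrapy spider with no Splash involved (the spider name and start URL are placeholders, not from the original question):

import scrapy

class ArticlesSpider(scrapy.Spider):
    name = 'articles'  # placeholder name
    start_urls = ['https://example.com/articles']  # placeholder start URL

    def parse(self, response):
        # ... extract article data from the current page here ...

        # follow the "next page" link with a regular request, no Splash needed
        for next in response.css("a.control-nav-next::attr(href)").extract():
            yield scrapy.Request(response.urljoin(next), callback=self.parse)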