CrawlSpider with Splash

独厮守ぢ 2021-01-13 14:28

I have a problem with my spider. I use Splash with Scrapy to get the link to "Next page", which is generated by JavaScript. After downloading the information from the first p

1 Answer
  • 2021-01-13 14:43

    At a quick glance, you're not issuing your start requests through Splash... For example, you should be using SplashRequest in start_requests.

    from scrapy_splash import SplashRequest

    def start_requests(self):
        for url in self.start_urls:
            # Route each start URL through Splash so the JS runs before parsing
            yield SplashRequest(url, self.parse,
                                endpoint='render.html',
                                args={'wait': 0.5})
    

    Given that you have Splash set up appropriately — that is, in settings you have enabled the necessary middlewares, pointed to the correct Splash URL, and enabled the dupe filter and HTTP cache correctly... Note I have not run your code, but it should be good to go now.
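    For reference, the settings wiring that sentence refers to looks like this, following the scrapy-splash README; the SPLASH_URL value is an assumption about where your Splash instance listens:

    ```python
    # settings.py — scrapy-splash wiring, per the scrapy-splash README.
    # SPLASH_URL is an assumption; point it at your own Splash instance.
    SPLASH_URL = 'http://localhost:8050'

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }
    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    # Splash-aware dupe filter and HTTP cache storage:
    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
    ```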

    EDIT: BTW... the next-page link is not JS-generated.

    So... unless there is some other reason you're using Splash, I see no reason to use it here; a simple loop in the initial parse of the article requests will do, like...

    for next_page in response.css("a.control-nav-next::attr(href)").extract():
        yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
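    If it helps, you can check that selector logic outside the spider with parsel (the selector library Scrapy is built on); the HTML fragment and base URL below are made-up examples, not taken from the actual site:

    ```python
    from urllib.parse import urljoin

    from parsel import Selector  # parsel ships as a Scrapy dependency

    # Hypothetical page fragment; the real page has more markup around it
    html = '<a class="control-nav-next" href="/page/2">Next</a>'
    base_url = "https://example.com/articles"  # made-up base URL

    sel = Selector(text=html)
    next_links = [
        # equivalent to response.urljoin(href) inside a spider callback
        urljoin(base_url, href)
        for href in sel.css("a.control-nav-next::attr(href)").getall()
    ]
    print(next_links)  # absolute next-page URLs
    ```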
    