Scrapy: how to catch a download error and retry the download

孤街浪徒 · 2021-02-09 05:33

During my crawl, some pages fail because of an unexpected redirection and no response is returned. How can I catch this kind of error and re-schedule a request with the original URL, not the redirected one?

2 answers
  • 2021-02-09 06:02

    You could pass a lambda as an errback:

    request = Request(url, dont_filter=True, callback=self.parse, errback=lambda x: self.download_errback(x, url))
    

    That way you'll have access to the URL inside the errback function:

    def download_errback(self, e, url):
        # e is the Failure passed by Scrapy; url is the original request URL
        print(url)
    
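    If you also want to re-schedule the failed URL rather than just log it, the errback can yield a new Request. Below is a minimal sketch assuming a standard Scrapy spider; the spider name, start URL, retry limit, and the "errback_retries" meta key are illustrative, not part of the original answer:

    import scrapy


    class RetrySpider(scrapy.Spider):
        name = "retry_example"                    # illustrative name
        start_urls = ["https://example.com/"]     # illustrative URL

        def start_requests(self):
            for url in self.start_urls:
                # dont_filter=True so the re-scheduled duplicate is not dropped
                yield scrapy.Request(
                    url,
                    dont_filter=True,
                    callback=self.parse,
                    errback=lambda failure, u=url: self.download_errback(failure, u),
                )

        def parse(self, response):
            self.logger.info("Downloaded %s", response.url)

        def download_errback(self, failure, url):
            # Re-schedule the original URL, giving up after a few attempts
            retries = failure.request.meta.get("errback_retries", 0)
            if retries < 3:
                request = failure.request.replace(dont_filter=True)
                request.meta["errback_retries"] = retries + 1
                yield request
            else:
                self.logger.error("Giving up on %s", url)

    Binding the URL as a default argument (u=url) avoids the late-binding pitfall of referencing the loop variable directly inside the lambda.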
  • 2021-02-09 06:09

    You can override RETRY_HTTP_CODES in settings.py.

    These are the settings I use for proxy errors:

    RETRY_HTTP_CODES = [500, 502, 503, 504, 400, 403, 404, 408] 
    
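    These codes are consumed by Scrapy's built-in RetryMiddleware, which also respects RETRY_ENABLED and RETRY_TIMES. A minimal settings.py sketch (the RETRY_TIMES value is only an illustration):

    # settings.py
    RETRY_ENABLED = True     # the RetryMiddleware is enabled by default
    RETRY_TIMES = 3          # illustrative: retries in addition to the first attempt
    RETRY_HTTP_CODES = [500, 502, 503, 504, 400, 403, 404, 408]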