Scrapy: how to catch a download error and retry the download

孤街浪徒 · 2021-02-09 05:33

During my crawl, some pages fail because of an unexpected redirection and no response is returned. How can I catch this kind of error and re-schedule a request with the original URL, not the redirected one?

2 answers
  • 2021-02-09 06:02

    You could pass a lambda as an errback:

    request = Request(url, dont_filter=True, callback=self.parse, errback=lambda x: self.download_errback(x, url))
    

    That way you'll have access to the URL inside the errback function:

    def download_errback(self, e, url):
        # e is the Failure passed by Scrapy; url is the original request URL
        print(url)
    
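    If you also want to re-schedule the failed URL rather than just log it, the errback can yield a new Request. Below is a minimal sketch assuming a standard Scrapy spider; the spider name, start URL, retry limit, and the "errback_retries" meta key are illustrative, not part of the original answer:

    import scrapy


    class RetrySpider(scrapy.Spider):
        name = "retry_example"                    # illustrative name
        start_urls = ["https://example.com/"]     # illustrative URL

        def start_requests(self):
            for url in self.start_urls:
                # dont_filter=True so the re-scheduled duplicate is not dropped
                yield scrapy.Request(
                    url,
                    dont_filter=True,
                    callback=self.parse,
                    errback=lambda failure, u=url: self.download_errback(failure, u),
                )

        def parse(self, response):
            self.logger.info("Downloaded %s", response.url)

        def download_errback(self, failure, url):
            # Re-schedule the original URL, giving up after a few attempts
            retries = failure.request.meta.get("errback_retries", 0)
            if retries < 3:
                request = failure.request.replace(dont_filter=True)
                request.meta["errback_retries"] = retries + 1
                yield request
            else:
                self.logger.error("Giving up on %s", url)

    Binding the URL as a default argument (u=url) avoids the late-binding pitfall of referencing the loop variable directly inside the lambda.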
  • 2021-02-09 06:09

    You can override RETRY_HTTP_CODES in settings.py.

    These are the settings I use for proxy errors:

    RETRY_HTTP_CODES = [500, 502, 503, 504, 400, 403, 404, 408] 
    
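    These codes are consumed by Scrapy's built-in RetryMiddleware, which also respects RETRY_ENABLED and RETRY_TIMES. A minimal settings.py sketch (the RETRY_TIMES value is only an illustration):

    # settings.py
    RETRY_ENABLED = True     # the RetryMiddleware is enabled by default
    RETRY_TIMES = 3          # illustrative: retries in addition to the first attempt
    RETRY_HTTP_CODES = [500, 502, 503, 504, 400, 403, 404, 408]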