How can I make scrapy crawl break and exit when encountering the first exception?

伪装坚强ぢ 2020-12-14 02:20

For development purposes, I would like to stop all Scrapy crawling activity as soon as the first exception (in a spider or a pipeline) occurs.

Any advice?

3 Answers
  • 2020-12-14 02:48

    Since 0.11, there is CLOSESPIDER_ERRORCOUNT:

    An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won’t be closed by number of errors.

    If it is set to 1, the spider will be closed on the first exception.
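As a minimal sketch of the setting described above, assuming a standard project layout in which `settings.py` is the project's settings module, the fragment below enables the stop-on-first-error behaviour. The spider name `myspider` in the comment is hypothetical.

```python
# settings.py (fragment)
# Close the spider as soon as one error has been counted.
# For a single run, the same value can be passed on the command line instead:
#   scrapy crawl myspider -s CLOSESPIDER_ERRORCOUNT=1
CLOSESPIDER_ERRORCOUNT = 1
```

Closing is graceful, so requests that are already in flight may still finish before the spider stops.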

  • 2020-12-14 02:49

    It depends purely on your business logic, but this should work for you:

    crawler.engine.close_spider(self, 'log message')
    

    Suggested Reading


    The worst solution is:

    import sys
    
    sys.exit("SHUT DOWN EVERYTHING!")
    
  • 2020-12-14 02:52

    In a spider, you can simply raise the CloseSpider exception.

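    from scrapy.exceptions import CloseSpider

    def parse_page(self, response):
        # response.text is the decoded body; response.body is bytes on Python 3
        if 'Bandwidth exceeded' in response.text:
            raise CloseSpider('bandwidth_exceeded')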
    

    For other components (middlewares, pipelines, etc.), you can call close_spider manually, as akhter mentioned.
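A rough sketch of what calling close_spider from a pipeline might look like. It assumes a Scrapy version in which `process_item` receives the spider and the spider exposes `crawler.engine.close_spider(spider, reason)`. The class name `StopOnErrorPipeline`, the `handle` helper, and the `'pipeline_error'` reason string are all hypothetical, chosen only for illustration.

```python
class StopOnErrorPipeline:
    """Hypothetical pipeline that closes the spider on the first processing error."""

    def process_item(self, item, spider):
        try:
            return self.handle(item)
        except Exception:
            # Ask the engine to shut the spider down gracefully, then re-raise
            # so the original error still shows up in the log.
            spider.crawler.engine.close_spider(spider, reason="pipeline_error")
            raise

    def handle(self, item):
        # Placeholder for real item processing (an assumption for this sketch).
        if "title" not in item:
            raise ValueError("item is missing a title")
        return item
```

The pipeline would still need to be enabled through the `ITEM_PIPELINES` setting. Re-raising rather than swallowing the exception keeps the traceback visible, which suits the development use case in the question.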
