For development purposes, I would like to stop all Scrapy crawling activity as soon as the first exception (in a spider or a pipeline) occurs.
Any advice?
Since 0.11, there is CLOSESPIDER_ERRORCOUNT:
An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or non set), spiders won’t be closed by number of errors.
If it is set to 1, the spider will be closed on the first exception.
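As a minimal sketch, assuming the default project layout with a settings.py module, the setting could be enabled like this:

```python
# settings.py (assumed to be the project's settings module)
# Close the spider once the first error has been counted.
CLOSESPIDER_ERRORCOUNT = 1
```

For a one-off development run, the same value can be passed on the command line with the -s option, e.g. scrapy crawl <spider_name> -s CLOSESPIDER_ERRORCOUNT=1.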
It depends purely on your business logic, but this will work for you:
crawler.engine.close_spider(self, 'log message')
The worst solution is to exit the whole process:
import sys
sys.exit("SHUT DOWN EVERYTHING!")
In a spider, you can just raise the CloseSpider exception:
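from scrapy.exceptions import CloseSpider

def parse_page(self, response):
    # response.text is the decoded body; response.body is bytes on Python 3
    if 'Bandwidth exceeded' in response.text:
        raise CloseSpider('bandwidth_exceeded')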
For other components (middlewares, pipelines, etc.), you can call close_spider manually, as akhter mentioned.
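A rough sketch of what that could look like in an item pipeline follows. The class name, the "title" check, and the 'pipeline_error' reason are made up for illustration. It assumes the classic process_item(item, spider) signature and the crawler.engine.close_spider(spider, reason) call shown above, which may differ between Scrapy versions.

```python
class StopOnFirstErrorPipeline:
    """Hypothetical pipeline that asks the engine to stop on the first failure."""

    def __init__(self, crawler):
        # Keep the crawler so the engine is reachable from process_item.
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline and hands over the crawler.
        return cls(crawler)

    def process_item(self, item, spider):
        try:
            # Illustrative validation only; replace with the real processing.
            if "title" not in item:
                raise ValueError("item has no title")
            return item
        except Exception:
            # Ask the engine to close the spider, then re-raise so the
            # error still shows up in the log.
            self.crawler.engine.close_spider(spider, "pipeline_error")
            raise
```

Closing through the engine lets in-flight requests finish and lets the close handlers run, which sys.exit would skip.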