We have a system written with Scrapy to crawl a few websites. There are several spiders, and a few cascaded pipelines that all items pass through.
You can raise a CloseSpider exception to close down a spider. However, I don't think this will work from a pipeline.
EDIT: avaleske notes in the comments to this answer that he was able to raise a CloseSpider exception from a pipeline. The wisest approach would be to do just that.
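For reference, a raise from process_item would look roughly like the sketch below. MyPipeline and some_condition are placeholder names, and whether the exception actually halts the crawl when raised from a pipeline may depend on your Scrapy version, so treat this as what avaleske described rather than guaranteed behaviour:

    from scrapy.exceptions import CloseSpider

    class MyPipeline:
        def process_item(self, item, spider):
            # placeholder check; substitute your real stopping condition
            if some_condition(item):
                raise CloseSpider(reason='API usage exceeded')
            return item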
A similar situation has been described on the Scrapy Users group, in this thread.
I quote:
To close a spider from any part of your code you should use the
engine.close_spider
method. See this extension for a usage example: https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/closespider.py#L61
You could write your own extension, using closespider.py as an example, that shuts down a spider once a certain condition is met.
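A rough sketch of such an extension follows. The class name ShutdownOnCondition, the MAX_ITEMS threshold, and the item-count trigger are made up for illustration; the crawler.engine.close_spider call is the same one the built-in closespider.py extension uses. You would also need to enable the extension through the EXTENSIONS setting in your settings.py.

    from scrapy import signals

    class ShutdownOnCondition:
        """Hypothetical extension that closes the spider after MAX_ITEMS items."""

        MAX_ITEMS = 100  # made-up threshold; replace with your own condition

        def __init__(self, crawler):
            self.crawler = crawler
            self.items_seen = 0

        @classmethod
        def from_crawler(cls, crawler):
            ext = cls(crawler)
            crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
            return ext

        def item_scraped(self, item, response, spider):
            self.items_seen += 1
            if self.items_seen >= self.MAX_ITEMS:
                # ask the engine to shut the spider down, as closespider.py does
                self.crawler.engine.close_spider(spider, 'condition_met')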
Another "hack" would be to set a flag on the spider in the pipeline. For example:
pipeline:

    def process_item(self, item, spider):
        if some_flag:
            # signal the spider to shut itself down
            spider.close_down = True
        return item
spider:

    from scrapy.exceptions import CloseSpider

    def parse(self, response):
        # the flag may not have been set yet, so default to False
        if getattr(self, 'close_down', False):
            raise CloseSpider(reason='API usage exceeded')
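One thing to keep in mind with the flag approach: the spider only notices close_down the next time parse() is called, so requests that are already scheduled or in flight will still be downloaded and processed before the spider actually closes.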