
How can I make scrapy crawl break and exit when encountering the first exception?

伪装坚强ぢ

For development purposes, I would like to stop all Scrapy crawling activity as soon as the first exception (in a spider or a pipeline) occurs.

Any advice?

3 Answers

盖世英雄少女心

2020-12-14 02:48

Since Scrapy 0.11, there is the `CLOSESPIDER_ERRORCOUNT` setting:

An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or non set), spiders won’t be closed by number of errors.

If it is set to 1, the spider will be closed on the first exception.
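A minimal sketch of enabling this for development, assuming a standard Scrapy project with a `settings.py` module. The same key can also go in a spider's `custom_settings` dict, or be passed for a single run with `scrapy crawl <spider> -s CLOSESPIDER_ERRORCOUNT=1`. Two caveats, as far as I can tell from the CloseSpider extension: the counter appears to be driven by the `spider_error` signal, so exceptions raised in item pipelines may not be counted; and closing is graceful, so requests already in flight can still finish before the crawl stops.

```python
# settings.py (development only)
# Close the spider as soon as the first spider error is counted.
CLOSESPIDER_ERRORCOUNT = 1
```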

一整个雨季

2020-12-14 02:49
It depends purely on your business logic, but this will work for you:
```
# In a spider method; from a pipeline or middleware, use spider.crawler instead
self.crawler.engine.close_spider(self, 'reason_for_closing')
```
And the worst solution is:
```
import sys

sys.exit("SHUT DOWN EVERYTHING!")
```
情书的邮戳

2020-12-14 02:52
In a spider, you can simply raise the `CloseSpider` exception:
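```
from scrapy.exceptions import CloseSpider

def parse_page(self, response):
    # response.body is bytes, so compare against a bytes literal
    if b'Bandwidth exceeded' in response.body:
        raise CloseSpider('bandwidth_exceeded')
```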
For other components (middlewares, pipelines, etc.), you can call `close_spider` manually, as akhter mentioned in the previous answer.
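As a rough sketch of that approach for a pipeline: the class `StopOnFirstErrorPipeline` and its `process` helper are hypothetical names, and the "missing price" check is only a stand-in for real item handling. It relies on `spider.crawler.engine.close_spider(spider, reason)` and would still need to be enabled through `ITEM_PIPELINES` in your settings.

```python
class StopOnFirstErrorPipeline:
    """Hypothetical pipeline that closes the spider on the first processing error."""

    def process_item(self, item, spider):
        try:
            return self.process(item)
        except Exception:
            # Ask the engine to shut the spider down gracefully, then re-raise
            # so Scrapy still logs the original traceback for this item.
            spider.crawler.engine.close_spider(spider, 'pipeline_error')
            raise

    def process(self, item):
        # Hypothetical placeholder for your real item handling.
        if 'price' not in item:
            raise ValueError('item has no price')
        return item
```

Re-raising keeps the failure visible in the log instead of silently swallowing it.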