I am using a Scrapy CrawlSpider and have defined a Twisted reactor to control my crawler. During testing I crawled a news site, collecting several GB of data.
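For context, a minimal sketch of that kind of setup, assuming a CrawlerRunner driven from the Twisted reactor; the spider name, domain, and callback are placeholders rather than details from the question:

```python
# Sketch: a CrawlSpider controlled from a Twisted reactor via CrawlerRunner.
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.utils.log import configure_logging


class NewsSpider(CrawlSpider):
    name = "news"                      # hypothetical spider name
    allowed_domains = ["example.com"]  # hypothetical domain
    start_urls = ["http://example.com"]
    rules = (Rule(LinkExtractor(), callback="parse_item", follow=True),)

    def parse_item(self, response):
        yield {"url": response.url}


configure_logging()
runner = CrawlerRunner()
d = runner.crawl(NewsSpider)
d.addBoth(lambda _: reactor.stop())  # stop the reactor when the crawl ends
reactor.run()                        # blocks here until the spider closes
```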
Scrapy provides the class scrapy.extensions.closespider.CloseSpider for this. You can set the CLOSESPIDER_TIMEOUT, CLOSESPIDER_ITEMCOUNT, CLOSESPIDER_PAGECOUNT, and CLOSESPIDER_ERRORCOUNT settings, and the spider closes automatically as soon as one of these conditions is met: http://doc.scrapy.org/en/latest/topics/extensions.html#module-scrapy.extensions.closespider
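For example, a minimal sketch that wires those limits into the spider through custom_settings; the spider name, domain, and threshold values are illustrative, not taken from your project:

```python
# Sketch: CloseSpider limits set per-spider via custom_settings.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class NewsSpider(CrawlSpider):
    name = "news"                      # hypothetical spider name
    allowed_domains = ["example.com"]  # hypothetical domain
    start_urls = ["http://example.com"]

    # The CloseSpider extension shuts the spider down as soon as any
    # one of these thresholds is reached (values are examples only).
    custom_settings = {
        "CLOSESPIDER_TIMEOUT": 3600,     # seconds the spider may run
        "CLOSESPIDER_ITEMCOUNT": 10000,  # scraped items
        "CLOSESPIDER_PAGECOUNT": 5000,   # downloaded responses
        "CLOSESPIDER_ERRORCOUNT": 10,    # errors raised
    }

    rules = (Rule(LinkExtractor(), callback="parse_item", follow=True),)

    def parse_item(self, response):
        yield {"url": response.url}
```

When a threshold is hit, the spider is closed with the matching reason (closespider_timeout, closespider_itemcount, and so on), which also fires the deferred returned by runner.crawl() if you are running under your own reactor, so your reactor-stopping callback still works.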