Running more than one spider, one by one

Submitted by 馋奶兔 on 2019-12-14 03:12:41

Question


I am using the Scrapy framework to make spiders crawl through some webpages. Basically, what I want is to scrape web pages and save them to a database. I have one spider per webpage. But I am having trouble running those spiders so that one spider starts to crawl exactly after another spider finishes crawling. How can that be achieved? Is scrapyd the solution?


Answer 1:


scrapyd is indeed a good way to go. The max_proc or max_proc_per_cpu configuration options can be used to restrict the number of parallel spiders; you then schedule spiders using the scrapyd REST API, like:

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider

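For example, a minimal scrapyd.conf sketch (the exact values are assumptions, not taken from the original answer) that limits scrapyd to a single crawl process, so every scheduled job waits in the queue until the previous one finishes:

[scrapyd]
# Run at most one spider process at a time; queued jobs execute sequentially
max_proc = 1

With that in place, scheduling several spiders simply queues them, e.g. (project and spider names below are placeholders):

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider1
$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2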

来源:https://stackoverflow.com/questions/21694386/running-more-than-one-spiders-one-by-one
