scrapyd

Scrapyd Usage Guide

Posted by 一个人想着一个人 on 2019-11-29 08:57:14
Scrapyd Usage Guide

Contents: Preface; Usage; Installation; Startup; Deploying a project; API usage (check daemon status, deploy a project version, schedule a spider, cancel a job, list uploaded projects, list a project's versions, list a project's spiders, list jobs (Scrapyd 0.15 and above), delete a project version, delete a project)

Preface

Scrapyd normally runs as a daemon. It listens for requests to run spiders and spawns a process for each request, which essentially executes: scrapy crawl [myspider]. Scrapyd also runs multiple processes in parallel, allocating them to a fixed number of slots given by the max_proc and max_proc_per_cpu options, and starting as many processes as the load requires. Besides scheduling and managing processes, Scrapyd provides a JSON web service for uploading new project versions (as eggs) and scheduling spiders.

Official Scrapyd documentation: https://scrapyd.readthedocs.io/en/latest/index.html

Key point: jobs are dispatched to multiple processes through the API, and the web console lets you watch running jobs, create new crawl jobs, and cancel running ones.

Usage

Installation

pip install scrapyd

Required libraries and minimum versions: Python 2.7 or above, Twisted 8.0 or above, Scrapy 1.0 or above, six

Startup

In the project directory, simply type scrapyd to start the service.
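A minimal sketch of driving a few of the JSON API endpoints listed above with python-requests, assuming Scrapyd is running on the default port 6800 and that the names myproject and myspider stand in for your own deployed project and spider:

import requests

SCRAPYD = "http://localhost:6800"

# daemonstatus.json reports the counts of pending, running and finished jobs.
print(requests.get(SCRAPYD + "/daemonstatus.json").json())

# schedule.json queues a crawl; a successful response carries the job id.
resp = requests.post(SCRAPYD + "/schedule.json",
                     data={"project": "myproject", "spider": "myspider"})
job_id = resp.json()["jobid"]

# listjobs.json shows the project's pending/running/finished jobs;
# cancel.json terminates the job we just scheduled.
print(requests.get(SCRAPYD + "/listjobs.json",
                   params={"project": "myproject"}).json())
requests.post(SCRAPYD + "/cancel.json",
              data={"project": "myproject", "job": job_id})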

Scrapy's Scrapyd too slow with scheduling spiders

Posted by 有些话、适合烂在心里 on 2019-11-28 19:01:33
I am running Scrapyd and encounter a weird issue when launching 4 spiders at the same time.

2012-02-06 15:27:17+0100 [HTTPChannel,0,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,2,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100
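A commonly cited cause of this pattern is Scrapyd's queue poller: the daemon checks the job queue at a fixed interval (the poll_interval option, 5 seconds by default), so a burst of schedule.json calls is accepted immediately while the jobs themselves start roughly one poll apart. A sketch of the tuning, assuming a scrapyd.conf file is in use:

# scrapyd.conf -- poll the job queue more often so queued spiders start sooner
[scrapyd]
poll_interval = 0.5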

Run multiple scrapy spiders at once using scrapyd

Posted by 回眸只為那壹抹淺笑 on 2019-11-27 17:45:39
I'm using Scrapy for a project where I want to scrape a number of sites - possibly hundreds - and I have to write a specific spider for each site. I can schedule one spider in a project deployed to Scrapyd using:

curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2

But how do I schedule all spiders in a project at once? All help much appreciated!

Answer (dru): My solution for running 200+ spiders at once has been to create a custom command for the project. See http://doc.scrapy.org/en/latest/topics/commands.html#custom-project-commands for more information about implementing custom project commands.
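A minimal sketch of such a custom command, not the answerer's original code: the module path, the allcrawl name, and the hard-coded project name are illustrative, and it assumes Scrapyd's JSON API on the default port 6800:

# myproject/commands/allcrawl.py
# Register the package in settings.py: COMMANDS_MODULE = "myproject.commands"
import requests
from scrapy.commands import ScrapyCommand

class Command(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "Schedule every spider in the project on Scrapyd"

    def run(self, args, opts):
        # spider_loader.list() yields the name of every spider in the project.
        for name in self.crawler_process.spider_loader.list():
            requests.post("http://localhost:6800/schedule.json",
                          data={"project": "myproject", "spider": name})

Running scrapy allcrawl from the project directory would then queue one Scrapyd job per spider.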
