Running multiple spiders in Scrapy

滥情空心 2021-01-05 04:27
  In Scrapy, for example, if I have two URLs that contain different HTML, I want to write two individual spiders, one for each, and run both spiders at once.

4 Answers
  • 2021-01-05 05:10

    You should use scrapyd to handle multiple crawlers: http://doc.scrapy.org/en/latest/topics/scrapyd.html
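
    For example, once your project is deployed to a running Scrapyd instance, each spider can be started through the schedule.json endpoint. A minimal sketch, assuming Scrapyd listens on the default localhost:6800 and the project was deployed as "myproject" with spiders named "spider1" and "spider2" (all of these names are placeholders):

    import urllib.parse
    import urllib.request

    SCRAPYD_URL = "http://localhost:6800/schedule.json"  # default Scrapyd address (assumed)

    def schedule(spider_name):
        # POST project and spider names to Scrapyd's schedule.json endpoint
        data = urllib.parse.urlencode({"project": "myproject", "spider": spider_name}).encode()
        with urllib.request.urlopen(SCRAPYD_URL, data=data) as response:
            print(response.read().decode())

    for name in ("spider1", "spider2"):
        schedule(name)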

  • 2021-01-05 05:23

    Here is code that lets you run multiple spiders in Scrapy. Save it in the same directory as scrapy.cfg (my Scrapy version is 1.3.3 and it works):

    from scrapy.utils.project import get_project_settings
    from scrapy.crawler import CrawlerProcess

    setting = get_project_settings()
    process = CrawlerProcess(setting)

    # process.spiders works on Scrapy 1.3.x; it is deprecated on newer
    # versions, where process.spider_loader.list() does the same thing.
    for spider_name in process.spiders.list():
        print("Running spider %s" % spider_name)
        # query="dvh" is a custom argument passed through to each spider
        process.crawl(spider_name, query="dvh")

    process.start()
    

    You can then schedule this Python program to run from a cron job.
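
    For reference, spider arguments passed to process.crawl() arrive as keyword arguments in the spider's __init__. A minimal sketch of a spider that uses the query argument (the spider name, URL, and parse logic below are made up for illustration):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"  # hypothetical spider name

        def __init__(self, query=None, *args, **kwargs):
            super(ExampleSpider, self).__init__(*args, **kwargs)
            self.query = query  # receives query="dvh" from process.crawl()

        def start_requests(self):
            # build the start URL from the custom argument (placeholder URL)
            url = "http://example.com/search?q=%s" % self.query
            yield scrapy.Request(url, callback=self.parse)

        def parse(self, response):
            yield {"title": response.css("title::text").extract_first()}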

  • 2021-01-05 05:27

    It would probably be easiest to just run the two Scrapy spiders at once from the OS level. They should both be able to save to the same database. Create a shell script that calls both spiders so they run at the same time:

    scrapy runspider foo &
    scrapy runspider bar
    

    Be sure to make this script executable with chmod +x script_name

    To schedule a cronjob every 6 hours, type crontab -e into your terminal, and edit the file as follows:

    0 */6 * * * path/to/shell/script_name >> path/to/file.log
    

    The first field is minutes, then hours, and so on, and an asterisk is a wildcard. So 0 */6 says run the script at minute 0 of any hour divisible by 6, i.e. every six hours.

  • 2021-01-05 05:27

    You can try using CrawlerProcess

    from scrapy.utils.project import get_project_settings
    from scrapy.crawler import CrawlerProcess
    
    from myproject.spiders import spider1, spider2
    
    process = CrawlerProcess(get_project_settings())
    # Pass your spider classes to crawl(). Names starting with a digit
    # (1Spider, 2Spider) are not valid Python identifiers, so the classes
    # are assumed here to be called Spider1 and Spider2.
    process.crawl(spider1.Spider1)
    process.crawl(spider2.Spider2)
    process.start()
    

    If you want to see the full log of the crawl, set LOG_FILE in your settings.py.

    LOG_FILE = "logs/mylog.log"
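
    If you want the spiders to run one after another instead of concurrently, Scrapy's CrawlerRunner can chain the crawls. A sketch under the same assumptions about the spider modules and class names (spider1.Spider1 and spider2.Spider2 are placeholders):

    from twisted.internet import reactor, defer
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings

    from myproject.spiders import spider1, spider2

    configure_logging()
    runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def crawl():
        # each yield waits for the previous crawl to finish
        yield runner.crawl(spider1.Spider1)
        yield runner.crawl(spider2.Spider2)
        reactor.stop()

    crawl()
    reactor.run()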
    