In Scrapy, for example, suppose I had two URLs that contain different HTML. Now I want to write two individual spiders, one for each, and want to run both spiders at once.
You should use scrapyd to handle multiple crawlers: http://doc.scrapy.org/en/latest/topics/scrapyd.html
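For example, once scrapyd is running and the project has been deployed to it, each spider can be queued through the schedule.json endpoint. Below is a minimal sketch, assuming scrapyd is listening on its default port 6800, the project was deployed as myproject, and the spiders are named spider1 and spider2 (all of these names are placeholders):

import requests

SCRAPYD_URL = "http://localhost:6800/schedule.json"

# Ask scrapyd to queue one job per spider; it replies with a JSON status and a job id.
for spider in ("spider1", "spider2"):
    response = requests.post(SCRAPYD_URL, data={"project": "myproject", "spider": spider})
    print(response.json())

scrapyd then runs the jobs and keeps their logs, so you do not have to manage the processes yourself.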
Here is code that allows you to run multiple spiders in Scrapy. Save it in the same directory as scrapy.cfg (my Scrapy version is 1.3.3 and it works):
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

settings = get_project_settings()
process = CrawlerProcess(settings)

# Queue every spider registered in the project.
for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    process.crawl(spider_name, query="dvh")  # "query" is a custom argument my spiders accept

# Start the reactor; this blocks until all queued spiders have finished.
process.start()
You can then schedule this Python script to run with a cron job.
It would probably be easiest to just run two Scrapy scripts at once from the OS level. They should both be able to save to the same database. Create a shell script that calls both Scrapy commands so they run at the same time:
#!/bin/bash
scrapy runspider foo &
scrapy runspider bar
Be sure to make this script executable with chmod +x script_name.
To schedule a cron job every 6 hours, type crontab -e in your terminal and edit the file as follows:
0 */6 * * * path/to/shell/script_name >> path/to/file.log
The first field is minutes, then hours, and so on, and an asterisk is a wildcard. With 0 in the minutes field and */6 in the hours field, the script runs at minute 0 of every hour divisible by 6, i.e. every six hours.
You can try using CrawlerProcess:
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

from myproject.spiders import spider1, spider2

process = CrawlerProcess(get_project_settings())
# Pass the spider classes; CrawlerProcess instantiates them itself.
process.crawl(spider1.Spider1)
process.crawl(spider2.Spider2)
process.start()
If you want to see the full log of the crawl, set LOG_FILE in your settings.py.
LOG_FILE = "logs/mylog.log"