I want to be able to run the Scrapy web crawling framework from within Django. Scrapy itself only provides a command line tool scrapy
to execute its commands, i.e.
I think you're really looking for Guideline 10 of the POSIX argument syntax conventions:
The argument -- should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the '-' character. The -- argument should not be used as an option or as an operand.
Python's optparse
module behaves this way, even under windows.
I put the scrapy project settings module in the argument list, so I can create separate scrapy projects in independent apps:
# /management/commands/scrapy.py
from __future__ import absolute_import
import os
from django.core.management.base import BaseCommand
class Command(BaseCommand):
def handle(self, *args, **options):
os.environ['SCRAPY_SETTINGS_MODULE'] = args[0]
from scrapy.cmdline import execute
# scrapy ignores args[0], requires a mutable seq
execute(list(args))
Invoked as follows:
python manage.py scrapy myapp.scrapyproj.settings crawl domain.com -- -o scraped_data.json -t json
Tested with scrapy 0.12 and django 1.3.1