Django custom management command running Scrapy: How to include Scrapy's options?

时光取名叫无心 2021-02-08 03:41

I want to be able to run the Scrapy web crawling framework from within Django. Scrapy itself only provides a command line tool scrapy to execute its commands, i.e.

2 Answers
  • 2021-02-08 04:10

    Okay, I have found a solution to my problem. It's a bit ugly but it works. Since the Django project's manage.py command does not accept Scrapy's command line options, I split the options string into two arguments which are accepted by manage.py. After successful parsing, I rejoin the two arguments and pass them to Scrapy.

    That is, instead of writing

    python manage.py scrapy crawl domain.com -o scraped_data.json -t json
    

    I put a space between each dash and its option letter, like this:

    python manage.py scrapy crawl domain.com - o scraped_data.json - t json
    

    My handle function looks like this:

    def handle(self, *args, **options):
        # self._argv is assumed to have been saved by run_from_argv()
        raw = self._argv[1:]
        arguments = []
        i = 0
        while i < len(raw):
            # Rejoin a lone '-' or '--' with the option name that follows it
            if raw[i] in ('-', '--') and i + 1 < len(raw):
                arguments.append(raw[i] + raw[i + 1])
                i += 2
            else:
                arguments.append(raw[i])
                i += 1

        from scrapy.cmdline import execute
        execute(arguments)
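
    The rejoining step can also be tried in isolation, without Django or Scrapy installed. The following is only a sketch: the function name rejoin_split_options is made up for illustration and is not part of either framework.

```python
def rejoin_split_options(argv):
    """Merge a lone '-' or '--' with the token that follows it.

    Hypothetical helper: ['-', 'o'] becomes ['-o'].
    """
    result = []
    i = 0
    while i < len(argv):
        if argv[i] in ("-", "--") and i + 1 < len(argv):
            # Glue the dash(es) back onto the option name
            result.append(argv[i] + argv[i + 1])
            i += 2
        else:
            result.append(argv[i])
            i += 1
    return result


split = ["scrapy", "crawl", "domain.com",
         "-", "o", "scraped_data.json", "-", "t", "json"]
rejoined = rejoin_split_options(split)
print(rejoined)
```

    Building a new list avoids modifying the argument list while iterating over it.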
    

    Meanwhile, Mikhail Korobov has provided the optimal solution. See here:

    # -*- coding: utf-8 -*- 
    # myapp/management/commands/scrapy.py 
    
    from __future__ import absolute_import
    from django.core.management.base import BaseCommand
    
    class Command(BaseCommand):
    
        def run_from_argv(self, argv):
            self._argv = argv
            self.execute()
    
        def handle(self, *args, **options):
            from scrapy.cmdline import execute
            execute(self._argv[1:])
    
  • 2021-02-08 04:35

    I think you're really looking for Guideline 10 of the POSIX argument syntax conventions:

    The argument -- should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the '-' character. The -- argument should not be used as an option or as an operand.

    Python's optparse module behaves this way, even under Windows.
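
    As a small illustration of that behaviour, here is a sketch using optparse on its own. The -v/--verbosity option is only a stand-in for whatever options the outer command defines; it is an assumption for this example, not a copy of manage.py's real parser.

```python
from optparse import OptionParser

parser = OptionParser()
# Stand-in for an option owned by the outer command (assumed for illustration)
parser.add_option("-v", "--verbosity", dest="verbosity", default="1")

argv = ["-v", "2", "crawl", "domain.com",
        "--", "-o", "scraped_data.json", "-t", "json"]
options, args = parser.parse_args(argv)

# Everything after '--' is left untouched as positional arguments,
# and the '--' itself is dropped.
print(options.verbosity)
print(args)
```

    Without the -- delimiter, optparse would reject -o as an unknown option, because this parser does not define it.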

    I put the scrapy project settings module in the argument list, so I can create separate scrapy projects in independent apps:

    # <app>/management/commands/scrapy.py
    from __future__ import absolute_import
    import os
    
    from django.core.management.base import BaseCommand
    
    class Command(BaseCommand):
        def handle(self, *args, **options):
            os.environ['SCRAPY_SETTINGS_MODULE'] = args[0]
            from scrapy.cmdline import execute
            # scrapy ignores args[0], requires a mutable seq
            execute(list(args))
    

    Invoked as follows:

    python manage.py scrapy myapp.scrapyproj.settings crawl domain.com -- -o scraped_data.json -t json
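
    To make the data flow explicit: once manage.py has consumed the --, the positional arguments of that invocation arrive in handle() as a tuple whose first element is the settings module. A rough sketch of what the command then does with them follows; the helper name prepare_scrapy_call and its environ parameter are hypothetical and exist only so the logic can run without Django or Scrapy.

```python
import os


def prepare_scrapy_call(args, environ=None):
    """Set SCRAPY_SETTINGS_MODULE from args[0] and return a mutable argv.

    Hypothetical helper; 'environ' defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    if not args:
        raise ValueError("expected the Scrapy settings module as first argument")
    environ["SCRAPY_SETTINGS_MODULE"] = args[0]
    # The first element occupies the program-name slot, which is ignored
    return list(args)


env = {}
argv = prepare_scrapy_call(
    ("myapp.scrapyproj.settings", "crawl", "domain.com",
     "-o", "scraped_data.json", "-t", "json"),
    environ=env,
)
print(env["SCRAPY_SETTINGS_MODULE"])
print(argv[1:])
```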
    

    Tested with Scrapy 0.12 and Django 1.3.1.
