scrapy passing custom_settings to spider from script using CrawlerProcess.crawl()

予麋鹿 2021-01-05 09:19

I am trying to programmatically call a spider through a script. I am unable to override the settings through the constructor using CrawlerProcess. Let me illustrate this with

4 answers
  •  迷失自我
    2021-01-05 09:55

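    I think you can't override the custom_settings variable of a Spider class by passing arguments to the spider's constructor (for example through process.crawl()) when running it from a script. The reason is that the crawler loads the settings, including the class-level custom_settings, before the spider is instantiated.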

    Now, I don't really see the point of changing the custom_settings variable specifically: it is only a way to override your default settings, and CrawlerProcess lets you do exactly that too. This works as expected:

    import scrapy
    from scrapy.crawler import CrawlerProcess
    
    
    class MySpider(scrapy.Spider):
        name = 'simple'
        start_urls = ['http://httpbin.org/headers']
    
        def parse(self, response):
            for k, v in self.settings.items():
                print('{}: {}'.format(k, v))
            yield {
                'headers': response.body
            }
    
    # Settings passed here override Scrapy's defaults for this run
    process = CrawlerProcess({
        'USER_AGENT': 'my custom user agent',
        'ANYKEY': 'any value',
    })
    
    process.crawl(MySpider)
    process.start()
    
