I am trying to programmatically call a spider through a script. I am unable to override the settings through the constructor using CrawlerProcess. Let me illustrate this with
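I think you can't override the `custom_settings` variable of a Spider class when calling it from a script. `custom_settings` is read from the spider *class* when the crawler builds its settings, and that happens before the spider is instantiated, so a value passed to the constructor arrives too late.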
Now, I don't really see the point in changing the `custom_settings` variable specifically: it is only a way to override your default settings, and that is exactly what CrawlerProcess offers too. This works as expected:
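```python
import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    name = 'simple'
    start_urls = ['http://httpbin.org/headers']

    def parse(self, response):
        # Dump every setting the spider ended up with, to confirm the overrides
        for k, v in self.settings.items():
            print('{}: {}'.format(k, v))
        # httpbin echoes back the request headers, including our User-Agent
        yield {
            'headers': response.body
        }


# Settings passed here override Scrapy's defaults for every spider in this process
process = CrawlerProcess({
    'USER_AGENT': 'my custom user agent',
    'ANYKEY': 'any value',
})
process.crawl(MySpider)
process.start()  # blocks until the crawl finishes
```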