I want to enable an HTTP proxy for some spiders and disable it for others.
Can I do something like this?
# settings.py
proxy_spiders = [
You can set settings.overrides in the spider's .py file. An example that works:
from scrapy.conf import settings
settings.overrides['DOWNLOAD_TIMEOUT'] = 300
For your case, something like this should also work:
from scrapy.conf import settings
settings.overrides['DOWNLOADER_MIDDLEWARES'] = {
    'myproject.middlewares.RandomUserAgentMiddleware': 400,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}
There is a new and easier way to do this.
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    # Per-spider overrides; these take precedence over the project's settings.py.
    custom_settings = {
        'SOME_SETTING': 'some value',
    }
I use Scrapy 1.3.1.
Why not use two projects rather than only one?
Let's call these two projects proj1 and proj2. In proj1's settings.py, put these settings:
HTTP_PROXY = 'http://127.0.0.1:8123'
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RandomUserAgentMiddleware': 400,
    'myproject.middlewares.ProxyMiddleware': 410,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}
In proj2's settings.py, put these settings:
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RandomUserAgentMiddleware': 400,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}
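The settings above refer to myproject.middlewares.ProxyMiddleware without showing it. As far as I know, HTTP_PROXY is not a built-in Scrapy setting, so the custom middleware presumably reads it itself. A minimal sketch of what such a middleware might look like follows; the class body is an assumption, not the original poster's code. It relies only on the from_crawler hook, crawler.settings.get and request.meta['proxy'], and uses stand-in objects at the bottom so it runs without Scrapy installed.

```python
from types import SimpleNamespace


class ProxyMiddleware:
    """Hypothetical downloader middleware that routes requests through HTTP_PROXY."""

    def __init__(self, proxy_url=None):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # HTTP_PROXY is a project-defined setting here, not a Scrapy built-in.
        return cls(crawler.settings.get('HTTP_PROXY'))

    def process_request(self, request, spider):
        # Scrapy uses request.meta['proxy'] as the proxy for this request.
        if self.proxy_url:
            request.meta['proxy'] = self.proxy_url
        # Returning None lets the request continue through the middleware chain.
        return None


# Stand-in objects (not real Scrapy classes) to exercise the sketch.
crawler = SimpleNamespace(settings={'HTTP_PROXY': 'http://127.0.0.1:8123'})
request = SimpleNamespace(meta={})
ProxyMiddleware.from_crawler(crawler).process_request(request, spider=None)
```

In proj2, where HTTP_PROXY is not set and the middleware is not listed, requests are left untouched.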
You can define your own proxy middleware, something straightforward like this:
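# Legacy import path; on Scrapy 1.0+ the module is scrapy.downloadermiddlewares.httpproxy.
from scrapy.contrib.downloadermiddleware.httpproxy import HttpProxyMiddleware


class ConditionalProxyMiddleware(HttpProxyMiddleware):
    def process_request(self, request, spider):
        # Apply proxy handling only for spiders that opt in with use_proxy = True.
        if getattr(spider, 'use_proxy', None):
            return super(ConditionalProxyMiddleware, self).process_request(request, spider)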
Then define the attribute use_proxy = True in the spiders where you want the proxy enabled. Don't forget to disable the default proxy middleware and enable your modified one.
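The last step could look roughly like this in settings.py. This is a sketch under assumptions: myproject.middlewares is a hypothetical module path for ConditionalProxyMiddleware, the built-in middleware is referenced by the legacy scrapy.contrib path used in this thread (on Scrapy 1.0+ it is scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware), and 750 is the order the built-in proxy middleware has in Scrapy's default middleware list.

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    # Setting a middleware to None disables it.
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': None,
    # Hypothetical path; put the subclass in the slot the built-in one used.
    'myproject.middlewares.ConditionalProxyMiddleware': 750,
}
```

Spiders without a use_proxy attribute, or with use_proxy = False, then skip the proxy handling entirely.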
A bit late, but since release 1.0.0 Scrapy has a feature that lets you override settings per spider, like this:
import scrapy


class MySpider(scrapy.Spider):
    name = "my_spider"
    # This spider goes through the proxy.
    custom_settings = {
        "HTTP_PROXY": 'http://127.0.0.1:8123',
        "DOWNLOADER_MIDDLEWARES": {
            'myproject.middlewares.RandomUserAgentMiddleware': 400,
            'myproject.middlewares.ProxyMiddleware': 410,
            'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
        },
    }


class MySpider2(scrapy.Spider):
    name = "my_spider2"
    # This spider connects directly, without the proxy middleware.
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            'myproject.middlewares.RandomUserAgentMiddleware': 400,
            'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
        },
    }
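With more than a couple of spiders, repeating the same middleware dict gets error-prone. One possible way to cut the duplication is to build the two variants once and reuse them. The names BASE_MIDDLEWARES, PROXY_SETTINGS and DIRECT_SETTINGS below are made up for this sketch; the middleware paths and proxy URL are the ones from this answer.

```python
# Middleware entries shared by every spider.
BASE_MIDDLEWARES = {
    'myproject.middlewares.RandomUserAgentMiddleware': 400,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}

# Settings for spiders that should go through the proxy.
PROXY_SETTINGS = {
    'HTTP_PROXY': 'http://127.0.0.1:8123',
    'DOWNLOADER_MIDDLEWARES': {
        **BASE_MIDDLEWARES,
        'myproject.middlewares.ProxyMiddleware': 410,
    },
}

# Settings for spiders that should connect directly.
DIRECT_SETTINGS = {
    'DOWNLOADER_MIDDLEWARES': dict(BASE_MIDDLEWARES),
}
```

A spider would then declare custom_settings = PROXY_SETTINGS or custom_settings = DIRECT_SETTINGS as a class attribute. It has to be a class attribute because Scrapy reads custom_settings before the spider is instantiated.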