How to set different Scrapy settings for different spiders?

逝去的感伤 2020-12-05 14:48

I want to enable an HTTP proxy for some spiders and disable it for others.

Can I do something like this?

# settings.py
proxy_spiders = [         


        
5 answers
  • 2020-12-05 15:03

    You can set settings.overrides within the spider.py file. An example that works (it relies on the legacy scrapy.conf module, which later Scrapy releases deprecated):

    from scrapy.conf import settings
    
    settings.overrides['DOWNLOAD_TIMEOUT'] = 300 
    

    For your case, something like this should also work:

    from scrapy.conf import settings
    
    settings.overrides['DOWNLOADER_MIDDLEWARES'] = {
         'myproject.middlewares.RandomUserAgentMiddleware': 400,
         'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
    }
    
  • 2020-12-05 15:05

    There is a new and easier way to do this.

    import scrapy
    
    class MySpider(scrapy.Spider):
        name = 'myspider'
    
        # Applied only to this spider; overrides the project-wide settings.py
        custom_settings = {
            'SOME_SETTING': 'some value',
        }
    

    I use Scrapy 1.3.1.

  • 2020-12-05 15:07

    Why not use two projects rather than only one?

    Let's call the two projects proj1 and proj2. In proj1's settings.py, put these settings:

    HTTP_PROXY = 'http://127.0.0.1:8123'
    DOWNLOADER_MIDDLEWARES = {
         'myproject.middlewares.RandomUserAgentMiddleware': 400,
         'myproject.middlewares.ProxyMiddleware': 410,
         'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
    }
    

    In proj2's settings.py, put these settings:

    DOWNLOADER_MIDDLEWARES = {
         'myproject.middlewares.RandomUserAgentMiddleware': 400,
         'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
    }
    
  • 2020-12-05 15:19

    You can define your own proxy middleware, something straightforward like this:

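    # Scrapy 1.0+ path; older releases used scrapy.contrib.downloadermiddleware.httpproxy
    from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware
    
    class ConditionalProxyMiddleware(HttpProxyMiddleware):
        def process_request(self, request, spider):
            # Run the proxy logic only for spiders that set use_proxy = True
            if getattr(spider, 'use_proxy', False):
                return super(ConditionalProxyMiddleware, self).process_request(request, spider)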
    

    Then define the attribute use_proxy = True in the spiders that should use the proxy. Don't forget to disable the default proxy middleware in DOWNLOADER_MIDDLEWARES and enable your modified one.
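
The same opt-in idea can also be written without subclassing, since Scrapy documents a per-request `proxy` key in `request.meta`. The following is only a sketch under stated assumptions: the class name `SpiderProxyMiddleware` is hypothetical, it uses the `process_request(request, spider)` signature seen in this thread, and `HTTP_PROXY` is the project-defined setting name used elsewhere in this thread, not a built-in Scrapy setting.

```python
class SpiderProxyMiddleware:
    """Hypothetical downloader middleware: proxy only spiders that opt in."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # HTTP_PROXY is a project-defined setting name (an assumption taken
        # from this thread), e.g. HTTP_PROXY = 'http://127.0.0.1:8123'.
        return cls(crawler.settings.get('HTTP_PROXY'))

    def process_request(self, request, spider):
        # Spiders opt in with a class attribute: use_proxy = True
        if self.proxy_url and getattr(spider, 'use_proxy', False):
            request.meta['proxy'] = self.proxy_url
        # Returning None lets the request continue through the other middlewares.
        return None
```

With this variant the built-in HttpProxyMiddleware can stay enabled; registering the class in DOWNLOADER_MIDDLEWARES with an order below 750 (the built-in middleware's default order) makes it run first.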

  • 2020-12-05 15:20

    A bit late, but since release 1.0.0 Scrapy lets you override settings per spider through the custom_settings class attribute, like this:

    import scrapy
    
    class MySpider(scrapy.Spider):
        name = "my_spider"
        custom_settings = {
            # Not a built-in Scrapy setting; presumably read by the project's ProxyMiddleware
            "HTTP_PROXY": "http://127.0.0.1:8123",
            "DOWNLOADER_MIDDLEWARES": {
                "myproject.middlewares.RandomUserAgentMiddleware": 400,
                "myproject.middlewares.ProxyMiddleware": 410,
                # Scrapy 1.0+ module path (formerly under scrapy.contrib)
                "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
            },
        }
    
    
    class MySpider2(scrapy.Spider):
        name = "my_spider2"
        custom_settings = {
            "DOWNLOADER_MIDDLEWARES": {
                "myproject.middlewares.RandomUserAgentMiddleware": 400,
                "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
            },
        }
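
When many spiders share most of their middleware configuration, the two dictionaries above could be generated instead of copied. This is only a sketch: `build_custom_settings` and the module it would live in are hypothetical and not part of Scrapy, the middleware names and proxy URL are taken from this answer (written with the Scrapy 1.0+ module path), and `HTTP_PROXY` is assumed to be read by the project's own ProxyMiddleware.

```python
# Hypothetical helper module, e.g. myproject/presets.py (not part of Scrapy).

# Middlewares that every spider in the project shares.
BASE_MIDDLEWARES = {
    "myproject.middlewares.RandomUserAgentMiddleware": 400,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}


def build_custom_settings(use_proxy=False, proxy_url="http://127.0.0.1:8123"):
    """Return a dict suitable for a spider's custom_settings attribute."""
    middlewares = dict(BASE_MIDDLEWARES)  # copy so spiders never share one dict
    settings = {"DOWNLOADER_MIDDLEWARES": middlewares}
    if use_proxy:
        # HTTP_PROXY is the project-specific setting from this thread,
        # assumed to be read by myproject's own ProxyMiddleware.
        settings["HTTP_PROXY"] = proxy_url
        middlewares["myproject.middlewares.ProxyMiddleware"] = 410
    return settings


# In a spider module one would then write, for example:
#     class MySpider(scrapy.Spider):
#         name = "my_spider"
#         custom_settings = build_custom_settings(use_proxy=True)
```

Because custom_settings is a class attribute, the helper is called once when the class is defined, so each spider still ends up with a plain dictionary.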
    