How to add an instance variable to a Scrapy CrawlSpider?

Submitted by 落爺英雄遲暮 on 2019-12-25 07:24:47

Question


I am running a CrawlSpider and I want to implement some logic to stop following some of the links in mid-run, by passing a function to process_request.

This function uses the spider's class variables in order to keep track of the current state, and depending on it (and on the referrer URL), links get dropped or continue to be processed:

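from scrapy.exceptions import IgnoreRequest
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BroadCrawlSpider(CrawlSpider):
    name = 'bitsy'
    start_urls = ['http://scrapy.org']
    foo = 5  # class variable used to track the current state

    rules = (
        Rule(LinkExtractor(), callback='parse_item', process_request='filter_requests', follow=True),
    )

    def parse_item(self, response):
        pass  # <some code>

    def filter_requests(self, request):
        # someval is a placeholder for the referrer value to match
        if self.foo == 6 and request.headers.get('Referer', None) == someval:
            raise IgnoreRequest("Ignored request: bla %s" % request)
        return request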

I think that if I were to run several spiders on the same machine, they would all share the same class variables, which is not my intention.

Is there a way to add instance variables to CrawlSpiders? Is only a single instance of the spider created when I run Scrapy?

I could probably work around it with a dictionary with values per process ID, but that will be ugly...


Answer 1:


I think spider arguments would be the solution in your case.

When invoking Scrapy with scrapy crawl some_spider, you can pass arguments with the -a option, e.g. scrapy crawl some_spider -a foo=bar, and the spider receives the values via its constructor:

import scrapy

class SomeSpider(scrapy.Spider):
    name = 'some_spider'

    def __init__(self, foo=None, *args, **kwargs):
        super(SomeSpider, self).__init__(*args, **kwargs)
        self.foo = foo  # stored on this instance; do something with foo

What's more, since scrapy.Spider sets all additional arguments as instance attributes, you don't even need to override the __init__ method explicitly; you can just access the .foo attribute. Keep in mind that values passed with -a arrive as strings. :)
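To see why this gives each spider its own state, here is a minimal plain-Python sketch that runs without Scrapy installed. FakeSpider is a hypothetical stand-in, not the real class; it only assumes that scrapy.Spider copies extra keyword arguments onto the instance, as described above.

```python
class FakeSpider:
    """Hypothetical stand-in for scrapy.Spider (not the real class)."""

    foo = 5  # class attribute: a default visible to every instance

    def __init__(self, **kwargs):
        # Assumption: mimics how scrapy.Spider turns extra keyword
        # arguments into instance attributes.
        self.__dict__.update(kwargs)


# Roughly what "scrapy crawl some_spider -a foo=6" would produce:
# the value arrives as a string and lives on this instance only.
a = FakeSpider(foo="6")

# No argument given: attribute lookup falls back to the class default.
b = FakeSpider()
default_seen_by_b = b.foo

# Assigning through the instance creates an instance attribute on b;
# neither a nor the class attribute is affected.
b.foo = 7

print(a.foo, default_seen_by_b, b.foo, FakeSpider.foo)  # prints: 6 5 7 5
```

So an assignment such as self.foo = 6 inside a spider method changes only that spider object. The class-level foo = 5 is shared only if it is changed on the class itself (or if it is a mutable object modified in place).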



Source: https://stackoverflow.com/questions/39186207/how-to-add-instance-variable-to-scrapy-crawlspider
