Scrapy: non-blocking pause

南方客 2021-01-31 10:31

I have a problem. I need to pause the execution of one function for a while without stopping the parsing as a whole. In other words, I need a non-blocking pause.

It

3 answers
  •  面向向阳花
    2021-01-31 11:07

    The Request object has a callback parameter; try using that for this purpose. That is, create a Deferred that wraps self.second_parse_function and pause.

    Here is a rough, untested example; the changed lines are marked.

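    from scrapy import Request, Spider
    from twisted.internet.defer import Deferred

    # `pause` is not defined in this answer; it is assumed to be a helper
    # that returns a Deferred which fires after `seconds`.

    class ScrapySpider(Spider):
        name = 'live_function'

        def start_requests(self):
            yield Request('some url', callback=self.non_stop_function)

        def non_stop_function(self, response):
            # Chain: parse the response first, then pause for 10 seconds.
            parse_and_pause = Deferred()  # changed
            parse_and_pause.addCallback(self.second_parse_function)  # changed
            parse_and_pause.addCallback(pause, seconds=10)  # changed

            for url in ['url1', 'url2', 'url3', 'more urls']:
                # Untested: Request requires a callable callback, so the
                # Deferred may need to be wrapped in a function.
                yield Request(url, callback=parse_and_pause)  # changed

            yield Request('some url', callback=self.non_stop_function)  # Call itself

        def second_parse_function(self, response):
            pass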
    

    If this approach works for you, you can write a function that builds such a Deferred. It could be implemented like this:

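    from twisted.internet.defer import Deferred

    def get_perform_and_pause_deferred(seconds, fn, *args, **kwargs):
        """Build a Deferred that runs fn on the fired value, then pauses."""
        d = Deferred()
        d.addCallback(fn, *args, **kwargs)  # calls fn(result, *args, **kwargs)
        d.addCallback(pause, seconds=seconds)  # `pause` is assumed to be defined elsewhere
        return d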
    

    And here is a possible usage:

    from scrapy import Request, Spider

    class ScrapySpider(Spider):
        name = 'live_function'

        def start_requests(self):
            yield Request('some url', callback=self.non_stop_function)

        def non_stop_function(self, response):
            for url in ['url1', 'url2', 'url3', 'more urls']:
                # changed
                yield Request(url, callback=get_perform_and_pause_deferred(10, self.second_parse_function))

            yield Request('some url', callback=self.non_stop_function)  # Call itself

        def second_parse_function(self, response):
            pass
    
