Scrapy: non-blocking pause

南方客 2021-01-31 10:31

I have a problem. I need to pause the execution of one function for a while without stopping the parsing as a whole. In other words, I need a non-blocking pause.

It

3 answers
  •  无人共我
    2021-01-31 11:09

    The asker already provides an answer in the question's update, but I want to give a slightly better version so it's reusable for any request.

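    # removed...
    import scrapy
    from twisted.internet import reactor, defer
    
    class MySpider(scrapy.Spider):
        # removed...
    
        def request_with_pause(self, response):
            # Schedule a new Request for the same URL to be delivered after
            # response.meta['time'] seconds; the reactor keeps running meanwhile.
            d = defer.Deferred()
            reactor.callLater(
                response.meta['time'],
                d.callback,
                scrapy.Request(
                    response.url,
                    callback=response.meta['callback'],
                    dont_filter=True,  # this URL was already requested once
                    meta={'dont_proxy': response.meta['dont_proxy']},
                ),
            )
            # Return the Deferred so the Request is only produced once it fires.
            return d
    
        def parse(self, response):
            # removed....
            yield scrapy.Request(
                the_url,
                callback=self.request_with_pause,
                meta={
                    'time': 86400,  # pause length in seconds (one day)
                    'callback': self.the_parse,  # parser to run after the pause
                    'dont_proxy': True,
                },
            )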
    

    To explain: Scrapy uses Twisted to manage requests asynchronously, so the delayed request also has to be scheduled with Twisted's tools (here reactor.callLater and a Deferred) rather than with a blocking sleep.
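
The difference between a blocking and a non-blocking pause can be tried without Scrapy or Twisted installed. The sketch below is only an analogy: it uses the standard library's asyncio event loop in place of the Twisted reactor, and the names delayed_value, main and run_demo are hypothetical, not part of the answer above. The task with the longer pause is started first, yet the other task finishes before it, because the pause hands control back to the event loop.

```python
import asyncio


async def delayed_value(delay, value, log):
    # asyncio.sleep suspends only this coroutine; the event loop is free
    # to run other tasks while the delay elapses.
    await asyncio.sleep(delay)
    log.append(value)
    return value


async def main():
    log = []
    # Start the "paused" work first, with the longer delay.
    slow = asyncio.create_task(delayed_value(0.2, "delayed request", log))
    # Other work started afterwards is not held up by the pause.
    fast = asyncio.create_task(delayed_value(0.05, "other parsing", log))
    await asyncio.gather(slow, fast)
    return log


def run_demo():
    # Returns the values in the order in which the tasks completed.
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

With a blocking time.sleep in place of asyncio.sleep, the second task could not start until the first pause had finished, which is the behaviour the question wants to avoid.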
