Scrapy: if a request fails (e.g. 404, 500), how to make an alternative request?

2021-02-15 18:18

I have a problem with Scrapy. If a request fails (e.g. 404, 500), how can I make an alternative request? For example, price info can be obtained from two links: if one of them fails, request the other.

2 Answers
  • 2021-02-15 19:05

    Use errback in the Request, e.g. errback=self.error_handler, where error_handler is a method of the spider (just like a callback function). In that function, check the error code and make the alternative Request.

    See errback in the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html
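
    A minimal sketch of this approach, assuming a hypothetical primary/backup URL pair (the spider name and URLs are placeholders). The errback receives a Failure; a non-2xx response surfaces as an HttpError whose response carries the status code:

    import scrapy
    from scrapy.spidermiddlewares.httperror import HttpError


    class PriceSpider(scrapy.Spider):
        name = "price_spider"

        def start_requests(self):
            # Try the primary source first; fall back in the errback if it fails
            yield scrapy.Request(
                "http://example.com/primary-price-page",  # placeholder URL
                callback=self.parse,
                errback=self.error_handler,
            )

        def parse(self, response):
            # extract the price info from the successful response here
            pass

        def error_handler(self, failure):
            # Non-2xx responses are raised as HttpError by the default middleware
            if failure.check(HttpError) and failure.value.response.status in (404, 500):
                # Request the alternative link that provides the same price info
                yield scrapy.Request(
                    "http://example.com/backup-price-page",  # placeholder URL
                    callback=self.parse,
                )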

  • 2021-02-15 19:12

    Just set handle_httpstatus_list = [404, 500] and check the status code in the parse method. Here's an example:

    from scrapy import Spider
    from scrapy.http import Request


    class MySpider(Spider):
        handle_httpstatus_list = [404, 500]
        name = "my_crawler"

        start_urls = ["http://github.com/illegal_username"]

        def parse(self, response):
            # 404/500 responses reach parse() because of handle_httpstatus_list,
            # so we can react to them and request the alternative URL
            if response.status in self.handle_httpstatus_list:
                return Request(url="https://github.com/kennethreitz/", callback=self.after_404)

        def after_404(self, response):
            print(response.url)

            # parse the page and extract items
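
    A minimal sketch of the same idea applied to the question's scenario, assuming two hypothetical price URLs where the backup is only requested if the primary returns 404/500 (the URLs and the CSS selector are placeholders):

    import scrapy


    class PriceSpider(scrapy.Spider):
        name = "price_spider"
        handle_httpstatus_list = [404, 500]

        # Hypothetical mapping of primary price page -> backup price page
        backup_urls = {
            "http://example.com/price-source-a": "http://example.com/price-source-b",
        }

        def start_requests(self):
            for primary in self.backup_urls:
                yield scrapy.Request(primary, callback=self.parse)

        def parse(self, response):
            if response.status in self.handle_httpstatus_list:
                backup = self.backup_urls.get(response.url)
                if backup:
                    # Primary source failed; try the alternative link instead
                    yield scrapy.Request(backup, callback=self.parse)
                return
            # Normal case: extract the price from whichever page responded
            yield {"url": response.url, "price": response.css("span.price::text").get()}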
    

    Also see:

    • How to get the scrapy failure URLs?
    • Scrapy and response status code: how to check against it?
    • How to retry for 404 link not found in scrapy?

    Hope that helps.
