I have a problem with Scrapy. If a request fails (e.g. 404, 500), how can I issue an alternative request? For example, two links can both provide the price info; if one fails, request the other.
Use "errback" in the Request, like
errback=self.error_handler
where error_handler is a method of your spider (just like a callback function). It receives a Failure object rather than a response. In this method, check the error code and make the alternative Request.
See errback in the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html
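Just set handle_httpstatus_list = [404, 500] on the spider and check for the status code in the parse method. By default Scrapy drops non-2xx responses before they reach the callback; this attribute lets the listed ones through. Here's an example: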
import scrapy


class MySpider(scrapy.Spider):
    name = "my_crawler"
    # Let 404 and 500 responses reach the callback instead of being dropped.
    handle_httpstatus_list = [404, 500]
    start_urls = ["http://github.com/illegal_username"]

    def parse(self, response):
        if response.status in self.handle_httpstatus_list:
            # The first URL failed, so request the alternative one.
            return scrapy.Request(url="https://github.com/kennethreitz/", callback=self.after_404)

    def after_404(self, response):
        print(response.url)
        # parse the page and extract items
Hope that helps.