Scrapy and response status code: how to check against it?

感情败类 2020-12-08 15:38

I'm using Scrapy to crawl my sitemap and check for 404, 302 and 200 pages, but I can't seem to get the response code. This is my code so far:

fr         


        
2 Answers
  •  醉梦人生
    2020-12-08 16:39

    http://readthedocs.org/docs/scrapy/en/latest/topics/spider-middleware.html#module-scrapy.contrib.spidermiddleware.httperror

    Assuming the default spider middleware is enabled, HttpErrorMiddleware filters out responses whose status code falls outside the 200-300 range, so they never reach your callback. To handle 404s yourself, set the handle_httpstatus_list attribute on your spider; the callback can then read the code from response.status.

    from scrapy.spiders import SitemapSpider

    class TothegoSitemapHomesSpider(SitemapSpider):
        handle_httpstatus_list = [404]
    
