I'm using Scrapy to crawl my sitemap and check for 404, 302 and 200 pages, but I can't seem to get the response code. This is my code so far:
fr
Just to have a complete answer here:
Set handle_httpstatus_list = [302] on the spider.
On the request, set dont_redirect to True in meta.
For example: Request(URL, meta={'dont_redirect': True})
http://readthedocs.org/docs/scrapy/en/latest/topics/spider-middleware.html#module-scrapy.contrib.spidermiddleware.httperror
Assuming default spider middleware is enabled, response codes outside of the 200-300 range are filtered out by HttpErrorMiddleware. You can tell the middleware you want to handle 404s by setting the handle_httpstatus_list attribute on your spider.
class TothegoSitemapHomesSpider(SitemapSpider):
    handle_httpstatus_list = [404]
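To make the filtering rule concrete, here is a simplified model of the decision described above. It is not Scrapy's own code, and the function name is made up. It only illustrates that 2xx responses always pass and anything else passes only when listed in handle_httpstatus_list.

```python
def passes_http_error_filter(status, handle_httpstatus_list=()):
    """Simplified model: True if a response with this status
    would be passed on to the spider callback."""
    # Successful (2xx) responses are always passed through.
    if 200 <= status < 300:
        return True
    # Anything else passes only if the spider opted in to that code.
    return status in handle_httpstatus_list


# Example: a spider that declares handle_httpstatus_list = [404]
for status in (200, 302, 404):
    print(status, passes_http_error_filter(status, handle_httpstatus_list=[404]))
```

With only 404 in the list, a 302 would still be dropped at this stage, which is why the other answer adds 302 to the list as well. In a real crawl a 302 is normally followed by the redirect handling before it ever gets this far, unless dont_redirect is set.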