Scrapy and response status code: how to check against it?

感情败类 2020-12-08 15:38

I'm using Scrapy to crawl my sitemap and check for 404, 302 and 200 pages, but I can't seem to get the response code. This is my code so far:

fr         


        
2 answers
  • 2020-12-08 16:15

    To give a complete answer here, handling a 302 in your own callback takes two steps:

    • Set handle_httpstatus_list = [302] on the spider.

    • On the request, set dont_redirect to True in meta.

    For example: Request(URL, meta={'dont_redirect': True})

  • 2020-12-08 16:39

    http://readthedocs.org/docs/scrapy/en/latest/topics/spider-middleware.html#module-scrapy.contrib.spidermiddleware.httperror

    Assuming the default spider middleware is enabled, responses with status codes outside the 2xx (200-299) range are filtered out by HttpErrorMiddleware before they reach your callback. You can tell the middleware that you want to handle 404s by setting the handle_httpstatus_list attribute on your spider.

    from scrapy.spiders import SitemapSpider

    class TothegoSitemapHomesSpider(SitemapSpider):
        handle_httpstatus_list = [404]
    