Scrapy retry or redirect middleware

执念已碎 2021-02-03 14:27

While crawling a site with Scrapy, I get redirected to a user-blocked page about one time in five. When that happens, I lose the page I was redirected from.

2 answers
  •  [愿得一人]
    2021-02-03 14:56

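    You can handle 302 responses yourself by setting handle_httpstatus_list = [302] as a class attribute on your spider. Scrapy's RedirectMiddleware then stops following those redirects and passes the 302 response to your callback, where response.url is still the page you were redirected from: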

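    import scrapy

    class MySpider(scrapy.Spider):  # in a CrawlSpider, put this check in your rule callback, not parse()
        name = "myspider"
        handle_httpstatus_list = [302]  # deliver 302s to the callback instead of following them
        blocked_urls = []  # pages that redirected to the block page, to revisit later

        def parse(self, response):
            if response.status == 302:
                # response.url is the page you were redirected *from*, not the block page
                self.blocked_urls.append(response.url)
                return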
    
