Scrapy crawl http header data only

故里飘歌 2021-02-11 04:03

(How) can I achieve that Scrapy only downloads the header data of a website (for check purposes etc.)?

I've tried to disable some downloader middlewares, but it doesn't s…

1 answer
  •  孤城傲影
    2021-02-11 05:04

    As @alexce said, you can issue HEAD requests instead of the default GET requests:

    from scrapy import Request
    Request(url, method="HEAD")  # server sends the status line and headers, no body
    

    UPDATE: If you want to use HEAD requests for your start_urls, you will need to override the make_requests_from_url method:

    def make_requests_from_url(self, url):
        # Called once per start URL by the default start_requests() in older Scrapy
        return Request(url, method='HEAD', dont_filter=True)
    
