Requests - get content-type/size without fetching the whole page/content

悲&欢浪女 2021-02-07 12:35

I have a simple website crawler. It works fine, but sometimes it gets stuck on large content such as ISO images, .exe files, and other large downloads. Guessing content-type using

4 Answers
  •  不思量自难忘°
    2021-02-07 13:06

    requests.head() does NOT follow redirects by default, so if a URL is redirected, the headers you get belong to the redirect response itself, and Content-Length may be 0 or missing. Make sure to pass allow_redirects=True.

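    import requests
    # Follow redirects so the headers describe the final resource.
    r = requests.head(url, allow_redirects=True)
    length = r.headers.get('Content-Length')  # None if the header is absent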
    

    See the "Redirection and History" section of the Requests documentation.
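
For the crawler case in the question, the HEAD check above could be wrapped in a small filter that decides whether a URL is worth downloading at all. The following is only a sketch under stated assumptions: the names `MAX_BYTES`, `ALLOWED_TYPES`, `parse_length`, `worth_fetching`, and `probe` are hypothetical, as are the 5 MiB cutoff and the HTML-only allow-list. The `session` argument is assumed to be a `requests.Session` (or anything with a compatible `head()` method).

```python
from typing import Mapping, Optional

MAX_BYTES = 5 * 1024 * 1024  # hypothetical 5 MiB cutoff; tune for your crawler
ALLOWED_TYPES = ("text/html", "application/xhtml+xml")  # hypothetical allow-list


def parse_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if missing or malformed."""
    raw = headers.get("Content-Length")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def worth_fetching(headers: Mapping[str, str], max_bytes: int = MAX_BYTES) -> bool:
    """Decide from response headers alone whether to download the body."""
    # Content-Type may carry parameters, e.g. "text/html; charset=utf-8".
    media_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        return False
    length = parse_length(headers)
    # Servers may omit Content-Length (e.g. chunked responses); this sketch
    # lets those through and leaves any size cap to the download step.
    return length is None or length <= max_bytes


def probe(session, url: str, timeout: float = 10.0) -> bool:
    """HEAD the URL, following redirects, and apply worth_fetching()."""
    resp = session.head(url, allow_redirects=True, timeout=timeout)
    return worth_fetching(resp.headers)
```

With Requests this might be called as `probe(requests.Session(), url)`. Response headers in Requests are case-insensitive, whereas a plain dict passed to `worth_fetching` must use the exact header spelling. Some servers answer HEAD incorrectly or not at all, so a crawler would probably still want a size cap on the actual download.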
