I have a simple website crawler, it works fine, but sometime it stuck because of large content such as ISO images, .exe files and other large stuff. Guessing content-type using
Because requests.head() does NOT auto redirect, so a URL is redirected, requests.head() will get 0 for Content-Length. So make sure allow_redirects=True is added.
r = requests.head(url, allow_redirects=True)
length = r.headers['Content-Length']