I have a simple website crawler. It works fine, but sometimes it gets stuck on large content such as ISO images, .exe files, and other large files. Guessing content-type using
Use requests.head() for this. A HEAD request returns only the response headers, not the message body, so it is the right method when the headers are all you need. Check this link for detail.
import requests

# HEAD asks the server for the headers only; no body is downloaded
h = requests.head(some_link)
header = h.headers
content_type = header.get('content-type')
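For a crawler, the header check above could be wrapped in a small filter that runs before each download. The following is only a sketch under assumptions: the names `is_crawlable` and `should_fetch`, the allowed types, and the 5 MiB limit are hypothetical and not part of the original answer. It also passes `allow_redirects=True`, since `requests.head()` does not follow redirects by default.

```python
import requests

# Hypothetical policy values, chosen only for illustration.
ALLOWED_TYPES = ("text/html", "application/xhtml+xml")
MAX_BYTES = 5 * 1024 * 1024  # 5 MiB


def is_crawlable(headers, allowed_types=ALLOWED_TYPES, max_bytes=MAX_BYTES):
    """Decide from response headers alone whether a page is worth fetching."""
    # Lower-case the keys so a plain dict works as well as requests' headers.
    normalized = {key.lower(): value for key, value in headers.items()}

    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = normalized.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in allowed_types:
        return False

    # Content-Length is optional; only reject when it is present and too large.
    length = normalized.get("content-length")
    if length is not None and length.isdigit() and int(length) > max_bytes:
        return False
    return True


def should_fetch(url, timeout=10):
    """Send a HEAD request and apply is_crawlable to the returned headers."""
    try:
        # requests.head() does not follow redirects unless asked to.
        response = requests.head(url, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return False
    return response.ok and is_crawlable(response.headers)
```

In the crawl loop, a call such as `if should_fetch(link): ...` would then guard the real download. Some servers reject HEAD or omit these headers, so a fallback such as `requests.get(link, stream=True)`, which lets you inspect the headers before reading the body, may still be needed.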