I would like to scrape just the title of a webpage using Python. I need to do this for thousands of sites so it has to be fast. I\'ve seen previous questions like retrieving jus
using urllib you can set the Range header to request a certain range of bytes, but there are some consequences:
(edit) - to avoid situations when title tag splits between two ranged requests, make your ranges overlapped, see 'range_header_overlapped' function in my example code
import urllib
req = urllib.request.Request('http://www.python.org/')
req.headers['Range']='bytes=%s-%s' % (0, 300)
f = urllib.request.urlopen(req)
content_range=f.headers.get('Content-Range')
print(content_range)