I'm trying to download a large file from a server with Python 2:
req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data
I have the same problem.
I found that "Transfer-Encoding: chunked" often appears with "Content-Encoding: gzip".
So maybe we can ask for the compressed content explicitly and decompress it ourselves.
It works for me.
import urllib2
from StringIO import StringIO
import gzip
req = urllib2.Request(url)
req.add_header('Accept-encoding', 'gzip, deflate')
rsp = urllib2.urlopen(req)
if rsp.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(rsp.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
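The decompression step can be tried without a server. The following is only a sketch: `decode_body` is a made-up helper name, the "response" is a gzip payload built in memory, and `io.BytesIO` stands in for `StringIO` so the same lines run on Python 2.7 and Python 3. It handles only `gzip`; any other encoding is passed through unchanged.

```python
import gzip
import io


def decode_body(raw, content_encoding):
    """Return the decompressed payload if the body was gzip-encoded."""
    if content_encoding == 'gzip':
        # GzipFile needs a file-like object, so wrap the raw bytes.
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    # Any other (or missing) encoding is returned untouched.
    return raw


# Simulate a gzip-encoded response body without touching the network.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(b'hello world')
compressed = buf.getvalue()
```

With a real response this would presumably be called as `decode_body(rsp.read(), rsp.info().get('Content-Encoding'))`.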
From the Python documentation on urllib2.urlopen:
One caveat: the read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.
So, read the data in a loop:
req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data = rsp.read(8192)
while data:
    # .. Do Something ..
    data = rsp.read(8192)
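For a large download, the "do something" step is usually writing each chunk to disk so the whole file never sits in memory. A minimal sketch of that loop, assuming a hypothetical helper named `copy_stream` and using in-memory `io.BytesIO` objects in place of the HTTP response and the output file:

```python
import io


def copy_stream(src, dst, chunk_size=8192):
    """Copy src to dst in fixed-size chunks; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the HTTP response and the output file.
src = io.BytesIO(b'x' * 20000)
dst = io.BytesIO()
copied = copy_stream(src, dst)
```

With a real response, `dst` would be a file opened with `open('mylargefile.gz', 'wb')` and `src` would be `rsp`. The standard library's `shutil.copyfileobj(rsp, out)` runs the same kind of loop.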
If I'm not mistaken, the following worked for me a while back:
data = ''
chunk = rsp.read()
while chunk:
    data += chunk
    chunk = rsp.read()
Each read() call returns one chunk, so keep reading until nothing more comes back.
I don't have documentation on hand to support this yet.
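The behaviour described here can be imitated without a network connection. In the sketch below, `ShortReadStream` is a hypothetical stand-in (not part of urllib2) whose `read()` hands back only part of the data per call, and `read_all` is the same loop as above, collecting the pieces in a list and joining them once instead of using repeated `+=`:

```python
class ShortReadStream(object):
    """Hypothetical stream whose read() returns at most `limit` bytes."""

    def __init__(self, payload, limit):
        self._payload = payload
        self._limit = limit
        self._pos = 0

    def read(self, size=-1):
        # Even when size is omitted or negative, return only one
        # partial piece, the way a chunked response might.
        if size is None or size < 0:
            n = self._limit
        else:
            n = min(size, self._limit)
        piece = self._payload[self._pos:self._pos + n]
        self._pos += len(piece)
        return piece


def read_all(stream):
    """Call read() until it returns nothing, then join the pieces."""
    pieces = []
    while True:
        piece = stream.read()
        if not piece:
            break
        pieces.append(piece)
    return b''.join(pieces)


stream = ShortReadStream(b'abcdefghij', limit=4)
first = stream.read()    # a single read() gets only part of the data
rest = read_all(stream)  # looping picks up everything that is left
```

Here a single `read()` returns only the first four bytes, while the loop recovers the remainder.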