Question:
I'd like to stream a big log file over the network using asyncio. I retrieve the data from the database, format it, compress it with Python's zlib, and stream it over the network.
Here is basically the code I use:
import asyncio
from zlib import compressobj

@asyncio.coroutine
def logs(request):
    # ...
    yield from resp.prepare(request)

    # gzip magic number and compression format
    resp.write(b'\x1f\x8b\x08\x00\x00\x00\x00\x00')

    compressor = compressobj()
    for row in rows:
        ip, uid, date, url, answer, volume = row
        NCSA_ROW = '{} {} - [{}] "GET {} HTTP/1.0" {} {}\n'
        row = NCSA_ROW.format(ip, uid, date, url, answer, volume)
        row = row.encode('utf-8')
        data = compressor.compress(row)
        resp.write(data)

    resp.write(compressor.flush())
    return resp
The file that I retrieve cannot be opened with gunzip, and zcat raises the following error:
gzip: out.gz: unexpected end of file
Answer 1:
Your gzip header is wrong (8 bytes instead of 10), and you follow it with a zlib stream, which uses a different header and trailer. Even if you had a correct gzip header and a raw deflate stream instead of a zlib stream, you would still not have written a gzip trailer.
To do this right, do not attempt to write your own gzip header. Instead, request that zlib write a complete gzip stream, which will produce the correct header, the compressed data, and the trailer. You can do this by providing a wbits value of 31 to compressobj().
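For illustration, here is a minimal sketch of that approach, assuming the rows have already been formatted as text lines (the gzip_chunks helper and the sample lines are made up for this example, not from the original post):

import gzip
import zlib

def gzip_chunks(lines):
    # wbits=31 makes zlib emit a complete gzip stream:
    # the 10-byte header, the deflate data, and the CRC32/size trailer.
    compressor = zlib.compressobj(wbits=31)
    for line in lines:
        chunk = compressor.compress(line.encode('utf-8'))
        if chunk:  # compress() may buffer input and return b''
            yield chunk
    # flush() emits any buffered data plus the gzip trailer
    yield compressor.flush()

# Sanity check: the concatenated chunks form a valid gzip stream.
data = b''.join(gzip_chunks(['line one\n', 'line two\n']))
assert gzip.decompress(data) == b'line one\nline two\n'

In the handler above, each chunk would go to resp.write() as it is produced instead of being joined in memory, and the hand-written header bytes would be dropped entirely.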
Source: https://stackoverflow.com/questions/37944801/how-to-stream-a-gzip-built-on-the-fly-in-python