I am using Python's built-in lzma module to decompress chunks of data. Depending on the chunk, I get the following exception:
Compressed data ended before the end-of-stream marker was reached
The data is NOT corrupted. It can be decompressed correctly with other tools, so it must be a bug in the library. There are other people experiencing the same issue:
- http://bugs.python.org/issue21872
- https://github.com/peterjc/backports.lzma/issues/6
- Downloading large file in python error: Compressed file ended before the end-of-stream marker was reached
Unfortunately, none of them seems to have found a solution yet, at least not one that works on Python 3.5.
How can I solve this problem? Is there any work around?
I spent a lot of time trying to understand and solve this problem, so I thought it would be a good idea to share my findings. The problem seems to be caused by a chunk of data that does not end with the end-of-stream (EOF) marker. To decompress a buffer, I used to call lzma.decompress from Python's lzma library. However, this function expects each buffer to contain a complete stream, including the end-of-stream marker; otherwise it raises an LZMAError exception.
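The behaviour described above can be reproduced with a short sketch. It assumes Python 3.3+ (where lzma is part of the standard library); the sample data and the choice to cut the stream in half are made up for illustration. It compresses a buffer, drops the second half so the end-of-stream marker is missing, and compares lzma.decompress with an LZMADecompressor object:

```python
import lzma

# Build a complete .xz stream, then cut off its tail to mimic a chunk
# that stops before the end-of-stream marker (illustrative data only).
original = b"hello lzma " * 1000
compressed = lzma.compress(original)
truncated = compressed[: len(compressed) // 2]

# lzma.decompress() insists on a complete stream and raises LZMAError.
try:
    lzma.decompress(truncated)
    raised = False
except lzma.LZMAError as exc:
    raised = True
    print("lzma.decompress failed:", exc)

# An LZMADecompressor object returns whatever it could decode so far
# and only reports, via .eof, that the end of the stream was not reached.
decomp = lzma.LZMADecompressor()
partial = decomp.decompress(truncated)
print(decomp.eof)                    # False: no end-of-stream marker seen
print(original.startswith(partial))  # True: output is a prefix of the data
```

The decompressor object does not treat running out of input as an error; it simply leaves eof set to False, which is the property the workaround relies on.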
To work around this limitation, we can implement an alternative decompress function that uses an LZMADecompressor object to extract the data from a buffer. For example:
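    from lzma import FORMAT_AUTO, LZMADecompressor, LZMAError

    def decompress_lzma(data):
        """Decompress one or more concatenated LZMA/XZ streams, tolerating a
        final stream that lacks its end-of-stream marker."""
        results = []
        while True:
            decomp = LZMADecompressor(FORMAT_AUTO, None, None)
            try:
                res = decomp.decompress(data)
            except LZMAError:
                if results:
                    break  # Leftover data is not a valid LZMA/XZ stream; ignore it.
                else:
                    raise  # Error on the first iteration; bail out.
            results.append(res)
            # Stop as soon as the whole buffer has been consumed, even if the
            # end-of-stream marker was never seen (lzma.decompress raises here).
            data = decomp.unused_data
            if not data:
                break
            # unused_data stays b"" until eof is True, so in practice a truncated
            # stream leaves through the break above and this check never fires.
            if not decomp.eof:
                raise LZMAError("Compressed data ended before the end-of-stream marker was reached")
        return b"".join(results)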
This function is similar to lzma.decompress from the standard library, with one key difference: the loop breaks as soon as the entire buffer has been processed, before checking whether the end-of-stream marker was reached.
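If the chunks are consecutive pieces of a single stream (an assumption; the question does not say how the chunks are produced), another option is to keep one LZMADecompressor alive and feed it every chunk in turn, since the object keeps its state between calls. This is a different technique from the function above, not the answer's own method, and the helper name decompress_chunks is hypothetical. It handles only the first stream and ignores anything after its end:

```python
import lzma

def decompress_chunks(chunks):
    """Hypothetical helper: decode byte chunks that together form ONE stream.

    Returns (output, complete), where complete is True only if the
    end-of-stream marker was reached.
    """
    decomp = lzma.LZMADecompressor()  # format defaults to FORMAT_AUTO
    out = []
    for chunk in chunks:
        if decomp.eof:
            # Calling decompress() after eof raises EOFError, so stop here;
            # any trailing data is ignored in this sketch.
            break
        out.append(decomp.decompress(chunk))
    return b"".join(out), decomp.eof


# Usage: split a compressed stream into 16-byte pieces (illustrative sizes).
data = b"chunked payload " * 5000
blob = lzma.compress(data)
pieces = [blob[i:i + 16] for i in range(0, len(blob), 16)]

full, complete = decompress_chunks(pieces)
partial, partial_complete = decompress_chunks(pieces[:-1])  # last piece missing
```

Returning the eof flag instead of raising leaves it to the caller to decide whether a stream that stops short is acceptable, which mirrors the trade-off made by decompress_lzma.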
I hope this can be useful to other people.
Source: https://stackoverflow.com/questions/37400583/python-lzma-compressed-data-ended-before-the-end-of-stream-marker-was-reached