Python LZMA : Compressed data ended before the end-of-stream marker was reached

眉间皱痕 submitted on 2019-12-06 09:07:57
Giuseppe Pes

I spent a lot of time trying to understand and solve this problem, so I thought it would be a good idea to share the solution. The problem seems to be caused by a chunk of data that is missing its end-of-stream marker (the "EOF byte"). To decompress a buffer, I used to call lzma.decompress from the Python lzma library. However, this function expects every stream in the buffer to end with an end-of-stream marker; otherwise it raises an LZMAError exception.
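The failure is easy to reproduce. The following is a minimal sketch, assuming the buffer was cut short; the truncation is simulated here by dropping the last few bytes of a valid XZ stream, and the payload and the 10-byte cut are arbitrary choices for illustration:

```python
import lzma

payload = b"example data " * 1000   # arbitrary test payload
complete = lzma.compress(payload)   # a valid, complete XZ stream
truncated = complete[:-10]          # simulate a buffer that was cut short

try:
    lzma.decompress(truncated)
    error_message = None
except lzma.LZMAError as exc:
    error_message = str(exc)

# The complete stream still decompresses normally.
roundtrip = lzma.decompress(complete)

print(error_message)
```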

To work around this limitation, we can implement an alternative decompress function that uses an LZMADecompressor object to extract the data from a buffer. For example:

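from lzma import FORMAT_AUTO, LZMADecompressor, LZMAError


def decompress_lzma(data):
    """Decompress data, tolerating a final stream with no end-of-stream marker."""
    results = []
    while True:
        # A fresh decompressor is needed for each stream in the buffer.
        decomp = LZMADecompressor(FORMAT_AUTO, None, None)
        try:
            res = decomp.decompress(data)
        except LZMAError:
            if results:
                break  # Leftover data is not a valid LZMA/XZ stream; ignore it.
            else:
                raise  # Error on the first iteration; bail out.
        results.append(res)
        # unused_data holds whatever follows the end of the current stream.
        data = decomp.unused_data
        if not data:
            break  # Whole buffer consumed; stop before the EOF check.
        if not decomp.eof:
            raise LZMAError("Compressed data ended before the end-of-stream marker was reached")
    return b"".join(results)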

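This function is similar to the one provided by the standard lzma library, with one key difference: the loop exits as soon as the entire buffer has been processed, before checking whether the end-of-stream marker was reached. A truncated final stream therefore yields whatever data could be decoded instead of raising an exception.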
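A short usage sketch of the difference, under stated assumptions: the payload is arbitrary, truncation is simulated by dropping the last 10 bytes of a valid XZ stream, and decompress_lzma is repeated from above only so that the snippet runs on its own.

```python
import lzma
from lzma import FORMAT_AUTO, LZMADecompressor, LZMAError


def decompress_lzma(data):
    # Same function as above, repeated so this snippet is self-contained.
    results = []
    while True:
        decomp = LZMADecompressor(FORMAT_AUTO, None, None)
        try:
            res = decomp.decompress(data)
        except LZMAError:
            if results:
                break
            raise
        results.append(res)
        data = decomp.unused_data
        if not data:
            break
        if not decomp.eof:
            raise LZMAError("Compressed data ended before the end-of-stream marker was reached")
    return b"".join(results)


payload = b"example data " * 1000   # arbitrary test payload
complete = lzma.compress(payload)   # valid, complete XZ stream
truncated = complete[:-10]          # simulated incomplete buffer

# Complete input, including two concatenated streams, decodes in full.
full = decompress_lzma(complete)
doubled = decompress_lzma(complete + complete)

# Truncated input: lzma.decompress raises, decompress_lzma returns what it could decode.
try:
    lzma.decompress(truncated)
    strict_raised = False
except LZMAError:
    strict_raised = True
recovered = decompress_lzma(truncated)

print(strict_raised, len(recovered), payload.startswith(recovered))
```

Because no error is raised for a truncated stream, the return value alone does not tell the caller whether the data is complete. If that matters, it is probably worth checking the length or a checksum separately.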

I hope this can be useful to other people.
