What is the difference between incremental and one-shot compression?

和自甴很熟 提交于 2019-12-01 08:00:41

问题


I am trying to use the bz2 and/or lzma packages in python. I am trying to compress a database dump in csv format and then put it to a zip file. I got it to work with one-shot compression with both the packages.

Code for which looks like this:

with ZipFile('something.zip', 'w') as zf:
    content = bz2.compress(bytes(csv_string, 'UTF-8'))  # also with lzma
    zf.writestr(
        'something.csv' + '.bz2',
        content,
        compress_type=ZIP_DEFLATED
    )

When I try to use incremental compression then it creates a .zip file which when I try to extract keeps giving some archive file recursively.

Code for which looks like this:

with ZipFile('something.zip', 'w') as zf:
    compressor = bz2.BZ2Compressor()
    content = compressor.compress(bytes(csv_string, 'UTF-8'))  # also with lzma
    zf.writestr(
        'something.csv' + '.bz2',
        content,
        compress_type=ZIP_DEFLATED
    )
    compressor.flush()

I went through the documentation and also look for information about the compression techniques, and there seems to be no comprehensive information about what one-shot and incremental compression are.


回答1:


The difference between one-shot and incremental is that with one-shot mode you need to have the entire data in memory; if you are compressing a 100 gigabyte file, you ought to have loads of RAM.

With the incremental encoder your code can feed the compressor 1 megabyte or 1 kilobyte at a time and write whatever data results, into a file as soon as it is available. Another benefit is that an incremental compressor you can use to stream data - you can start writing compressed data before all uncompressed data is available!


Your second code is incorrect and it will cause you to lose your data. The flush may return more data that you need to save as well. Here I am compressing a string of 1000 'a' characters in Python 3; the result from compress is an empty string; the actual compressed data is returned from flush.

>>> c = bz2.BZ2Compressor()
>>> c.compress(b'a' * 1000)
b''
>>> c.flush()
b'BZh91AY&SYI\xdcOc\x00\x00\x01\x81\x01\xa0\x00\x00\x80\x00\x08 \x00 
\xaamA\x98\xba\x83\xc5\xdc\x91N\x14$\x12w\x13\xd8\xc0'

Thus your second code should be:

compressor = bz2.BZ2Compressor()
content = compressor.compress(bytes(csv_string, 'UTF-8'))  # also with lzma
content += compressor.flush()    

But actually you're still doing the one-shot compression, in a very complicated manner.



来源:https://stackoverflow.com/questions/29258786/what-is-the-difference-between-incremental-and-one-shot-compression

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!