Does any mainstream compression algorithm natively support streaming data?

Question

Does any mainstream compression algorithm, for example snappy, zlib, or bzip2, natively support streaming data across the network? For example, if I have to send a compressed payload, will I have to manually prepend the size of the payload before sending the message? Or does any library provide an API to tell whether a message is complete given x bytes?

Answer 1:

zlib, bzip2, lz4, zstd, brotli, lzma2, and many others all support streaming through the use of an end-of-data marker in the compressed data.

As it happens, one of the ones you mentioned, snappy, is not streamable in the sense you ask, since the format starts with an uncompressed size.
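For a format without an end marker, the framing described in the question has to be done by hand. A minimal sketch, assuming a 4-byte big-endian length prefix (the layout and the names frame_write and frame_complete are illustrative, not part of any library):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_SIZE 4u

/* Write a 4-byte big-endian length followed by the payload into `dst`.
 * Returns the total frame size, or 0 if `dst` is too small. */
static size_t frame_write(uint8_t *dst, size_t dst_cap,
                          const uint8_t *payload, uint32_t payload_len)
{
    if (dst_cap < FRAME_HEADER_SIZE ||
        dst_cap - FRAME_HEADER_SIZE < payload_len)
        return 0;

    dst[0] = (uint8_t)(payload_len >> 24);
    dst[1] = (uint8_t)(payload_len >> 16);
    dst[2] = (uint8_t)(payload_len >> 8);
    dst[3] = (uint8_t)(payload_len);
    if (payload_len > 0)
        memcpy(dst + FRAME_HEADER_SIZE, payload, payload_len);
    return FRAME_HEADER_SIZE + (size_t)payload_len;
}

/* Given the `have` bytes received so far, return the size of the first
 * complete frame (header included), or 0 if more bytes are needed. */
static size_t frame_complete(const uint8_t *buf, size_t have)
{
    if (have < FRAME_HEADER_SIZE)
        return 0;

    uint32_t len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                   ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (have - FRAME_HEADER_SIZE < len)
        return 0;
    return FRAME_HEADER_SIZE + (size_t)len;
}
```

The receiver keeps appending incoming bytes to a buffer and calls frame_complete() after each read. A nonzero result means the compressed payload at buf + 4 can be handed to the decompressor.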

Answer 2:

Zstd does. There is a ZSTD_compressStream()/ZSTD_decompressStream() API.

See https://github.com/facebook/zstd/tree/dev/examples.

Pseudo-code below:

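// Create the compression stream
ZSTD_CStream* const cstream = ZSTD_createCStream();

// Initialize the stream with compression level cLevel
size_t const initResult = ZSTD_initCStream(cstream, cLevel);
size_t read;
// Recommended input chunk size; buffIn must be at least this large
size_t toRead = ZSTD_CStreamInSize();

while ((read = fread(buffIn, 1, toRead, file))) {
    ZSTD_inBuffer input = { buffIn, read, 0 };

    // Process the next chunk
    while (input.pos < input.size) {
        ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };

        // Compress data. The return value is a hint for the next input
        // size, or an error code (test with ZSTD_isError())
        toRead = ZSTD_compressStream(cstream, &output, &input);
        [...]
        fwrite_orDie(buffOut, output.pos, fout);
    }
}

ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };

// End the stream: flushes remaining data and writes the frame epilogue.
// Returns 0 once everything has been flushed into output
ZSTD_endStream(cstream, &output);
[...]
// Free the stream
ZSTD_freeCStream(cstream);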

Answer 3:

There is also SLZ, a stateless DEFLATE (zlib-compatible) library for compression-only streaming to many clients, with reduced state memory per stream: http://www.libslz.org/ "Stateless ZIP library - SLZ":

SLZ is a fast and memory-less stream compressor which produces an output that can be decompressed with zlib or gzip. It does not implement decompression at all, zlib is perfectly fine for this. The purpose is to use SLZ in situations where a zlib-compatible stream is needed and zlib's resource usage would be too high while the compression ratio is not critical. The typical use case is in HTTP servers and gateways which have to compress many streams in parallel with little CPU resources to assign to this task, and without having to throttle the compression ratio due to the memory usage. In such an environment, the server's memory usage can easily be divided by 10 and the CPU usage by 3. In addition its high performance made it fill a gap in network backup applications.

While zlib uses 256 kB of memory per stream in addition to a few tens of bytes for the stream descriptor itself, SLZ only stores a stream descriptor made of 28 bytes. Thus it is particularly suited to environments having to deal with tens to hundreds of thousands of concurrent streams.

The key difference between zlib and SLZ is that SLZ is stateless in that it doesn't consider the data previously compressed as part of its dictionary. It doesn't hurt compression performance when it is fed in large enough chunks (at least a few kB at once).

Source: https://stackoverflow.com/questions/44478254/does-any-mainstream-compression-algorithm-natively-support-streaming-data

Tags: sockets, compression, zlib, bzip2, snappy