LZ4 library decompressed data upper bound size estimation

Submitted by 牧云@^-^@ on 2019-11-30 03:41:23

Question


I'm using the LZ4 library, and I decompress data with

int LZ4_decompress_safe (const char* source, char* dest, int compressedSize, int maxDecompressedSize);

I want to estimate the maximum decompressed data size, but I cannot find a reverse function of

int LZ4_compressBound(int isize);

that would let me determine an upper bound on the decompressed size to pass as the last parameter, maxDecompressedSize, of the decompression function.

Other compression libraries, such as Snappy, provide such a function:

bool GetUncompressedLength(Source* source, uint32* result);

What can I do if I have no way to save the original data size (before compression), and I don't want to be overly pessimistic about the size of the buffer I must allocate?


Answer 1:


The maximum compression ratio of LZ4 is 255, so a guaranteed over-estimate of the decompressed data size is 255 times the input size.

That's obviously too much to be really useful, which is why there is no "reverse LZ4_compressBound()" function available.
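A minimal sketch of that guaranteed over-estimate, in C. The helper name lz4_naive_decompress_bound is hypothetical (it is not part of liblz4). Because LZ4_decompress_safe() takes maxDecompressedSize as an int, the sketch also refuses inputs for which 255 times the compressed size no longer fits in an int, which already happens above 8,421,504 compressed bytes.

```c
#include <limits.h>

/* Hypothetical helper (not a liblz4 function): worst-case decompressed
 * size of an LZ4 block of `compressed_size` bytes, using the 255:1
 * maximum ratio. Returns -1 for non-positive sizes, or when the result
 * would overflow the int that LZ4_decompress_safe() expects. */
int lz4_naive_decompress_bound(int compressed_size)
{
    if (compressed_size <= 0)
        return -1;                       /* reject non-positive sizes */
    if (compressed_size > INT_MAX / 255)
        return -1;                       /* 255 * n would overflow int */
    return 255 * compressed_size;
}
```

For a 16 KB compressed block this already means reserving 4,177,920 bytes (about 4 MB), which shows why the estimate is rarely useful in practice.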

I'm afraid there is no other way than to save, or know, the uncompressed size. The LZ4 "raw" compression format doesn't define a way to save such information, because the optimal choice is application specific. For example, some applications know in advance that no block can be > 16 KB, so they can use maxDecompressedSize = 16 KB when calling LZ4_decompress_safe().
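One way to save the uncompressed size yourself is a tiny custom envelope: prefix each compressed block with its original length as a 4-byte little-endian integer, and on the reading side reject any length above an application-specific cap (the 16 KB from the example above) before allocating. The sketch below handles only the header. The function names, SIZE_HEADER_LEN and APP_MAX_BLOCK are hypothetical, and the LZ4 calls are indicated in comments rather than made, so it compiles without liblz4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application limit: no block is ever larger than 16 KB. */
#define APP_MAX_BLOCK   (16 * 1024)
#define SIZE_HEADER_LEN 4

/* Write `original_size` as a 4-byte little-endian prefix. The compressed
 * payload (e.g. produced by LZ4_compress_default()) would follow these
 * 4 bytes. Returns the number of bytes written, or 0 if dst is too small. */
size_t write_size_header(uint8_t *dst, size_t dst_cap, uint32_t original_size)
{
    if (dst_cap < SIZE_HEADER_LEN)
        return 0;
    dst[0] = (uint8_t)(original_size);
    dst[1] = (uint8_t)(original_size >> 8);
    dst[2] = (uint8_t)(original_size >> 16);
    dst[3] = (uint8_t)(original_size >> 24);
    return SIZE_HEADER_LEN;
}

/* Read the prefix back and check it against the application cap. On
 * success, *original_size is the exact value to allocate and to pass as
 * maxDecompressedSize to LZ4_decompress_safe(), called on src + 4 with
 * compressedSize = src_len - 4. Returns 0 on success, -1 on a truncated
 * or implausibly large header. */
int read_size_header(const uint8_t *src, size_t src_len, uint32_t *original_size)
{
    if (src_len < SIZE_HEADER_LEN)
        return -1;
    uint32_t n = (uint32_t)src[0]
               | ((uint32_t)src[1] << 8)
               | ((uint32_t)src[2] << 16)
               | ((uint32_t)src[3] << 24);
    if (n > APP_MAX_BLOCK)
        return -1;  /* refuse to allocate more than the application allows */
    *original_size = n;
    return 0;
}
```

Since LZ4_decompress_safe() returns the number of bytes it actually decompressed (or a negative value on error), a reader can also compare that return value with the header to detect corrupted input.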

Now, if you are looking for an envelope format that takes on this responsibility, you could either create your own custom one, or use the LZ4 Framing format: http://fastcompression.blogspot.fr/2013/04/lz4-streaming-format-final.html (also present as LZ4_Framing_Format.html within the source package). Alas, the library that can generate and read this format is currently in beta (https://github.com/Cyan4973/lz4/tree/frame).
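To illustrate how a framing format carries the size information, here is a hand-written parser for the start of an LZ4 frame. It assumes the LZ4 Frame Format as later finalized (lz4_Frame_format.md in the lz4 repository), not the 2013 draft linked above, so details may differ from that draft. In the finalized format the frame descriptor may carry an optional 8-byte Content Size and always carries a Block Maximum Size code. The struct and function names are hypothetical, and the sketch skips the header checksum and dictionary ID; with liblz4 itself, LZ4F_getFrameInfo() from lz4frame.h reports the same information.

```c
#include <stddef.h>
#include <stdint.h>

#define LZ4F_MAGIC 0x184D2204u   /* frame magic number, little-endian */

typedef struct {
    int      has_content_size;   /* FLG bit 3 (C.Size) set? */
    uint64_t content_size;       /* total decompressed size, if present */
    uint32_t block_max_size;     /* upper bound for any single block */
} frame_size_info;

/* Parse magic, FLG, BD and the optional Content Size of an LZ4 frame.
 * Returns 0 on success, -1 otherwise. The header checksum (HC) byte is
 * not verified in this sketch. */
int parse_lz4_frame_sizes(const uint8_t *p, size_t len, frame_size_info *info)
{
    if (len < 6)                         /* magic (4) + FLG (1) + BD (1) */
        return -1;
    uint32_t magic = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    if (magic != LZ4F_MAGIC)
        return -1;

    uint8_t flg = p[4], bd = p[5];
    if ((flg >> 6) != 1)                 /* version bits must be 01 */
        return -1;

    /* BD bits 6-4: 4 = 64 KB, 5 = 256 KB, 6 = 1 MB, 7 = 4 MB */
    unsigned code = (bd >> 4) & 0x7;
    if (code < 4)
        return -1;
    info->block_max_size = 1u << (2 * code + 8);

    info->has_content_size = (flg >> 3) & 1;
    info->content_size = 0;
    if (info->has_content_size) {
        if (len < 6 + 8)
            return -1;
        for (int i = 7; i >= 0; i--)     /* 8-byte little-endian value */
            info->content_size = (info->content_size << 8) | p[6 + i];
    }
    return 0;
}
```

Every data block in such a frame decompresses to at most block_max_size bytes, so that value can serve as maxDecompressedSize per block even when the optional content size is absent.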


Answer 2:


Just for reference, n bytes of LZ4-compressed data can represent up to 24 + 255(n - 10) uncompressed bytes, which is the case of a run of that many identical bytes. n must be at least ten to make a valid stream that includes a literal, a match, and then five literals at the end, as the specification requires. Since 24 + 255(n - 10) = 255n - 2526 and 255n = (n << 8) - n, the decompress bound function could be something like (n << 8) - n - 2526.

The maximum compression ratio is then: 255 - 2526 / n, which asymptotically approaches 255 for arbitrarily large n.
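A minimal C sketch of that bound, computed in 64 bits so that large n cannot overflow. The helper name lz4_tight_decompress_bound is hypothetical, and returning -1 for n < 10 is an assumption of this sketch: the answer only states the formula for streams of at least ten bytes, so shorter inputs need separate handling.

```c
#include <stdint.h>

/* Hypothetical helper (not a liblz4 function) for the bound above:
 * n compressed bytes decode to at most 24 + 255 * (n - 10) bytes,
 * i.e. 255n - 2526, written with a shift as (n << 8) - n - 2526.
 * Returns -1 when n < 10 (formula not applicable) or when n is so
 * large that the shift would overflow int64_t. */
int64_t lz4_tight_decompress_bound(int64_t n)
{
    if (n < 10 || n > INT64_MAX / 256)
        return -1;
    return (n << 8) - n - 2526;
}
```

Compared with the 255 times input size estimate, this bound is smaller by a constant 2526 bytes, which is why the ratio 255 - 2526/n approaches 255 as n grows.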

Source: https://stackoverflow.com/questions/25740471/lz4-library-decompressed-data-upper-bound-size-estimation
