Is there a way to store gzip's dictionary from a file?

猫巷女王i 2021-01-17 22:05

I've been doing some research on compression-based text classification and I'm trying to figure out a way of storing a dictionary built by the encoder (on a training file).

1 answer
  • 2021-01-17 22:36

    Deflate encoders, as used in gzip and zlib, do not "build" a dictionary. They simply use the previous 32 KiB (32,768 bytes) of uncompressed data as a source for potential matches to the string of bytes starting at the current position. Those previous 32 KiB are called the "dictionary", but the name is perhaps misleading: it is a sliding window over the input, not a separate table that could be saved.
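    The sliding-window behaviour can be observed with a small experiment. The following is only a sketch using Python's standard `zlib` module; the helper names `random_bytes` and `repeat_cost` and the sizes chosen are made up for illustration. It measures how many extra compressed bytes a second copy of a block costs when the first copy is still inside the window, and when it has slid out of it.

```python
import random
import zlib

WINDOW = 32 * 1024  # deflate's maximum match distance (32,768 bytes)

def random_bytes(rng, n):
    """Return n pseudo-random (essentially incompressible) bytes."""
    return bytes(rng.getrandbits(8) for _ in range(n))

def repeat_cost(gap, seed=0):
    """Extra compressed bytes needed to append a second copy of a
    1000-byte block when `gap` filler bytes separate the two copies."""
    rng = random.Random(seed)
    block = random_bytes(rng, 1000)
    filler = random_bytes(rng, gap)
    without_repeat = len(zlib.compress(block + filler, 9))
    with_repeat = len(zlib.compress(block + filler + block, 9))
    return with_repeat - without_repeat

near = repeat_cost(gap=1000)           # first copy is ~2 KB back: inside the window
far = repeat_cost(gap=WINDOW + 8000)   # first copy is >32 KiB back: out of reach

print(f"repeat inside the window costs  {near} bytes")
print(f"repeat outside the window costs {far} bytes")
```

    The repeat that lies within the window should cost far less than the 1000 bytes it contains, while the one beyond 32 KiB should cost roughly as much as fresh data, because the encoder can no longer refer back to it.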

    You can use zlib to experiment with preset dictionaries. See the deflateSetDictionary() and inflateSetDictionary() functions. In that case, zlib compression is primed with a "dictionary" of up to 32 KiB that effectively precedes the first byte being compressed and serves as a source for matches, but the dictionary itself is not compressed or stored in the output, so the decompressor must be given the same dictionary. The priming can only improve the compression of the first 32 KiB of input. After that, the preset dictionary is too far back to provide matches.
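    As a rough illustration of a preset dictionary, here is a minimal sketch in Python, whose standard `zlib` module exposes the preset dictionary through the `zdict` argument of `zlib.compressobj()` and `zlib.decompressobj()`. The helper names and the toy `training` and `sample` strings are assumptions made up for this example, not part of the original question.

```python
import zlib

def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress `data` with the encoder primed by the preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress `blob`; the same dictionary must be supplied again."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical "training" text acting as the preset dictionary (under 32 KiB).
training = b"the quick brown fox jumps over the lazy dog. " * 20
# Hypothetical sample that shares phrases with the training text.
sample = b"the lazy dog jumps over the quick brown fox."

plain = zlib.compress(sample, 9)
primed = compress_with_dict(sample, training)

# Round trip only works when the decompressor gets the same dictionary.
assert decompress_with_dict(primed, training) == sample

print("without dictionary:", len(plain), "bytes")
print("with dictionary:   ", len(primed), "bytes")
```

    For compression-based classification, one possible approach under these assumptions is to keep the last 32 KiB of each class's training text as that class's dictionary and compare the primed output sizes for a new sample. The "stored dictionary" is then just those raw bytes saved to a file.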

    gzip provides no support for preset dictionaries.
