How to detect codeword length for LZW Decoding

Question

I'm writing a general LZW decoder in C++ and I'm having trouble finding documentation on the length (in bits) of the codewords used. Some articles I've found say that codewords are 12 bits long, others say 16 bits, and still others say that a variable bit length is used. So which is it? Variable length would make sense to me, since it should give the best compression (i.e. start with 9 bits, move to 10 when necessary, then to 11, etc.). But I can't find any "official" documentation on what the industry standard is.

For example, suppose I open Microsoft Paint, create a simple 100x100 pixel all-black image, and save it as a TIFF. The image is stored in the TIFF using LZW compression. In this scenario, when I'm parsing the LZW codewords, should I read 9 bits, 12 bits, or 16 bits for the first codeword? And how would I know which to use?

Thanks for any help you can provide.

Answer 1:

LZW can be done in any of these ways. By far the most common (at least in my experience) is to start with 9-bit codes, then, once the dictionary has more entries than 9 bits can address, move to 10-bit codes, and so on up to some maximum size.
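As a rough illustration of that scheme, the width of the next code can be derived from the number of dictionary entries allocated so far. This is only a sketch: `codeWidth`, its parameters, and the 9/12-bit defaults are hypothetical names and values chosen for the example, and the exact entry count at which the width changes differs slightly between file formats, so check the format you are decoding.

```cpp
#include <cstdint>

// Width (in bits) of the next code to read, given the index of the next free
// dictionary slot. Assumption: the width grows as soon as nextCode no longer
// fits in the current width; some formats switch one entry earlier.
int codeWidth(std::uint32_t nextCode, int minBits = 9, int maxBits = 12) {
    int bits = minBits;
    while (bits < maxBits && nextCode >= (1u << bits)) {
        ++bits;  // current width cannot address nextCode, so widen by one bit
    }
    return bits;
}
```

With these defaults, a decoder would read 9 bits per code while fewer than 512 slots are in use, 10 bits up to 1023, and so on, never going above 12.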

Once the maximum size is reached, you typically have a couple of choices. One is to clear the dictionary and start over. Another is to continue using the current dictionary without adding new entries. In the latter case, you typically track the compression ratio, and if it drops too far, you clear the dictionary and start over.

I'd have to dig through the docs to be sure, but if I'm not mistaken, the specific implementation of LZW used in TIFF starts at 9 bits and goes up to 12 bits (when it was being designed, MS-DOS was a major target, and as I recall the dictionary for 12-bit codes used a large share of the available 640K of RAM). If memory serves, it clears the table as soon as the last 12-bit code has been used. So for your TIFF example, you would read 9 bits for the first codeword.
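To make the TIFF case concrete, here is a minimal decoder sketch. It rests on assumptions that should be checked against the TIFF 6.0 specification before relying on them: codes are packed most significant bit first, code 256 is Clear, code 257 is EndOfInformation, widths run from 9 to 12 bits, and the decoder widens the code one entry "early" (when the next free slot is 511, 1023, or 2047). The function name `lzwDecodeTiff` is made up for this example, and TIFF strip handling and predictors are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes one LZW-compressed buffer under the assumptions listed above.
std::string lzwDecodeTiff(const std::vector<std::uint8_t>& in) {
    constexpr int kClear = 256, kEoi = 257, kFirstFree = 258, kMaxBits = 12;
    std::vector<std::string> table;
    auto resetTable = [&] {
        // Slots 0..255 hold single bytes; 256 and 257 are reserved codes.
        table.assign(kFirstFree, std::string());
        for (int i = 0; i < 256; ++i) table[i] = std::string(1, static_cast<char>(i));
    };
    resetTable();

    std::string out, prev;
    std::size_t bitPos = 0;
    int width = 9;  // every stream (and every Clear) starts with 9-bit codes
    while (bitPos + width <= in.size() * 8) {
        // Read `width` bits, most significant bit first.
        int code = 0;
        for (int i = 0; i < width; ++i, ++bitPos)
            code = (code << 1) | ((in[bitPos / 8] >> (7 - bitPos % 8)) & 1);

        if (code == kEoi) break;
        if (code == kClear) { resetTable(); width = 9; prev.clear(); continue; }

        std::string entry;
        if (code < static_cast<int>(table.size())) {
            entry = table[code];
        } else if (code == static_cast<int>(table.size()) && !prev.empty()) {
            entry = prev + prev[0];  // code not in the table yet (KwKwK case)
        } else {
            break;  // code out of range: treat the stream as corrupt
        }

        out += entry;
        // No entry is added for the first code after a Clear.
        if (!prev.empty() && table.size() < 4096) table.push_back(prev + entry[0]);
        prev = entry;

        // "Early change": widen when the next free slot reaches 2^width - 1.
        if (width < kMaxBits && table.size() + 1 >= (std::size_t{1} << width)) ++width;
    }
    return out;
}
```

For example, the five bytes `0x80 0x10 0x48 0x50 0x10` hold the 9-bit codes 256, 65, 66, 257 and decode to `"AB"`. A decoder for another format (GIF, for instance) would need different reserved codes, bit order, and width-change timing.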

Source: https://stackoverflow.com/questions/35755758/how-to-detect-codeword-length-for-lzw-decoding

Tags

c++

decode

lzw