I'm trying to calculate/validate the CRC32 checksums for compressed bzip2 archives.
.magic:16 = 'BZ' signature/magic number
.version:8
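A minimal sketch of checking those header fields in Python. It uses the standard-library bz2 module only to produce a sample stream. The name parse_stream_header is hypothetical, and the sketch assumes that the byte after .version is the block-size digit '1' to '9' (in units of 100 kB), which the listing above does not show.

```python
import bz2

def parse_stream_header(data: bytes) -> dict:
    """Validate the 4-byte bzip2 stream header and return its fields (sketch)."""
    if len(data) < 4 or data[:2] != b"BZ":  # .magic:16
        raise ValueError("not a bzip2 stream: bad magic")
    if data[2:3] != b"h":                    # .version:8, 'h' in bzip2 1.x streams
        raise ValueError("unsupported version byte")
    level = data[3] - ord("0")               # assumed: block-size digit '1'..'9'
    if not 1 <= level <= 9:
        raise ValueError("bad block-size digit")
    return {"magic": "BZ", "version": "h", "block_size_100k": level}

print(parse_stream_header(bz2.compress(b"hello")))
# {'magic': 'BZ', 'version': 'h', 'block_size_100k': 9}
```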
To add onto the existing answer: there is a final checksum at the end of the stream (the one after eos_magic). It acts as a checksum over all the individual block checksums. It is initialized to zero and updated each time you finish validating a block's CRC, as follows:
```c
uint32_t crc;   /* CRC of the block you just validated */
uint32_t ccrc;  /* combined checksum so far, initialized to 0 */

ccrc = (ccrc << 1) | (ccrc >> 31);  /* rotate left by one bit */
ccrc ^= crc;
```
After the last block, compare ccrc against the 32-bit unsigned combined CRC stored in the compressed file right after the end-of-stream marker.
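The combining step can be sketched in Python as follows. combine_block_crcs and stream_crc_ok are hypothetical helper names, and the sketch assumes you have already extracted each block's stored CRC and the stream CRC as plain integers. The explicit `& 0xFFFFFFFF` stands in for the 32-bit unsigned wraparound that C gets for free.

```python
def combine_block_crcs(block_crcs):
    """Fold per-block CRCs into the stream-level combined CRC (sketch)."""
    ccrc = 0  # the combined checksum starts at zero
    for crc in block_crcs:
        # rotate left by one bit within 32 bits, then XOR in the block CRC
        ccrc = ((ccrc << 1) | (ccrc >> 31)) & 0xFFFFFFFF
        ccrc ^= crc
    return ccrc

def stream_crc_ok(block_crcs, stored_crc):
    """Compare the recomputed combined CRC with the value read from the file."""
    return combine_block_crcs(block_crcs) == stored_crc
```

With a single block, the combined value equals that block's CRC, because rotating the initial zero leaves it zero.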
The following is the CRC algorithm used by bzip2, written in Python:
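```python
# BZ2_crc32Table: the 256-entry table from crctable.c
def bz2_crc32(dataIn):  # dataIn: bytes
    crcVar = 0xffffffff  # Init
    for cha in dataIn:  # iterating over bytes yields ints in Python 3
        crcVar = ((crcVar << 8) & 0xffffffff) ^ BZ2_crc32Table[(crcVar >> 24) ^ cha]
    return format(~crcVar & 0xffffffff, "08X")  # final inversion, zero-padded uppercase hex
```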
(The C definitions can be found on lines 155-172 of bzlib_private.h.)
The BZ2_crc32Table array can be found in crctable.c from the bzip2 source code. This CRC algorithm is, quoting crctable.c: "..vaguely derived from code by Rob Warnock, in Section 51 of the comp.compression FAQ..."
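If you would rather not copy the table out of crctable.c, it can likely be regenerated. The sketch below assumes BZ2_crc32Table is the standard MSB-first (non-reflected) table for the polynomial 0x04C11DB7, and checks that assumption against the published CRC-32/BZIP2 check value for the input "123456789". make_crc32_table and bz2_crc32 are hypothetical names, and this version returns an int rather than a hex string.

```python
POLY = 0x04C11DB7  # CRC-32 polynomial, processed MSB-first (non-reflected)

def make_crc32_table():
    """Build a 256-entry MSB-first table; assumed to match BZ2_crc32Table."""
    table = []
    for i in range(256):
        crc = i << 24  # place the byte in the top 8 bits
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        table.append(crc)
    return table

BZ2_crc32Table = make_crc32_table()

def bz2_crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF  # initial value
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ BZ2_crc32Table[(crc >> 24) ^ byte]
    return ~crc & 0xFFFFFFFF  # final inversion

print(hex(bz2_crc32(b"123456789")))  # 0xfc891918
```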
The checksums are calculated over the uncompressed data.
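To confirm this end to end, the sketch below compresses a short input with Python's bz2 module and compares the stored stream CRC with a CRC computed over the original bytes. To stay self-contained it swaps in a bitwise (table-free) form of the same CRC, and it assumes the input is small enough to fit in one block, so the combined CRC equals the block CRC. Because the trailer is not byte-aligned, stored_stream_crc (a hypothetical helper) looks for the 48-bit end-of-stream magic 0x177245385090 within the last few bits of the file.

```python
import bz2

EOS_MAGIC = 0x177245385090  # 48-bit end-of-stream marker

def crc32_bzip2(data: bytes) -> int:
    """Bitwise (table-free) form of the same MSB-first CRC."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return ~crc & 0xFFFFFFFF

def stored_stream_crc(stream: bytes) -> int:
    """Read the 32-bit combined CRC that follows the end-of-stream magic.

    Assumed trailer layout: 48-bit magic, 32-bit CRC, then 0-7 zero
    padding bits up to the next byte boundary.
    """
    bits = int.from_bytes(stream, "big")
    for pad in range(8):
        if bits & ((1 << pad) - 1):  # padding bits must be zero
            continue
        trailer = bits >> pad        # drop the padding
        if (trailer >> 32) & ((1 << 48) - 1) == EOS_MAGIC:
            return trailer & 0xFFFFFFFF
    raise ValueError("end-of-stream marker not found")

data = b"hello, bzip2"
stream = bz2.compress(data)
# Small input, so a single block: combined CRC == block CRC.
print(stored_stream_crc(stream) == crc32_bzip2(data))  # True
```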
Sources can be downloaded here: http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz