问题
I have a byte stream that is a concatenation of sections, where each section is composed of a header plus a deflated byte stream.
I need to split this byte stream sections but the header only contains information about the data in uncompressed form, no hint about the compressed data length so I can advance properly in the stream and parse the next section.
So far the only way I found to advance past the deflated byte sequece is to parse it according to the this specification. From what I understood by reading the specification, a deflate stream is composed of blocks, which can be compressed blocks or literal blocks.
Literal blocks contain a size header which can be used to easily advance past it.
Compressed blocks are composed with 'prefix codes', which are bit sequences of variable length that have special meanings to the deflate algorithm. Since I'm only interested in finding out the deflated stream length, I guess the only code I need to look for is '0000000' which according to the specification signals the end of block.
So I came up with this coffeescript function to parse the deflate stream(I'm working on node.js)
# The job of this function is to return the position
# after the deflate stream contained in 'buffer'. The
# deflated stream begins at 'pos'.
advanceDeflateStream = (buffer, pos) ->
byteOffset = 0
finalBlock = false
while 1
if byteOffset == 6
firstTypeBit = 0b00000001 & buffer[pos]
pos++
secondTypeBit = 0b10000000 & buffer[pos]
type = firstTypeBit | (secondTypeBit << 1)
else
if byteOffset == 7
pos++
type = buffer[pos] & (0b01100000 >>> byteOffset)
if type == 0
# Literal block
# ignore the remaining bits and advance position
byteOffset = 0
pos++
len = buffer.readUInt16LE(pos)
pos += 2
lenComplement = buffer.readUInt16LE(pos)
if (len ^ ~lenComplement)
throw new Error('Literal block lengh check fail')
pos += (2 + len) # Advance past literal block
else if type in [1, 2]
# huffman block
# we are only interested in finding the 'block end' marker
# which is signaled by the bit string 0000000 (256)
eob = false
matchedZeros = 0
while !eob
byte = buffer[pos]
for i in [byteOffset..7]
# loop the remaining bits looking for 7 consecutive zeros
if (byte ^ (0b10000000 >>> byteOffset)) >>> (7 - byteOffset)
matchedZeros++
else
# reset counter
matchedZeros = 0
if matchedZeros == 7
eob = true
break
byteOffset++
if !eob
byteOffset = 0
pos++
else
throw new Error('Invalid deflate block')
finalBlock = buffer[pos] & (0b10000000 >>> byteOffset)
if finalBlock
break
return pos
To check if this works, I wrote a simple mocha test case:
zlib = require 'zlib'
test 'sample deflate stream', (done) ->
data = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' # length 30
zlib.deflate data, (err, deflated) ->
# deflated.length == 11
advanceDeflateStream(deflated, 0).shoudl.eql(11)
done()
The problem is that this test fails and I do not know how to debug it. I accept any answer that points what I missed in the parsing algorithm or contains a correct version of the above function in any language.
回答1:
The only way to find the end of a deflate stream or even a deflate block is to decode all of the Huffman codes contained within. There is no bit pattern that you can search for that can not appear earlier in the stream.
来源:https://stackoverflow.com/questions/14206518/how-to-advance-past-a-deflate-byte-sequence-contained-in-a-byte-stream