How to pack arbitrary bit sequence in Python?

淺唱寂寞╮ submitted on 2019-11-30 21:55:35
Scott Griffiths

There's nothing very convenient built in, but there are third-party modules such as bitstring and bitarray that are designed for this.

from bitstring import BitArray
s = BitArray('0b11011')
s += '0b100'
s += 'uint:5=9'
s += [0, 1, 1, 0, 1]
...
s.tobytes()

To join a sequence of 3-bit numbers (i.e. values in the range 0 to 7), you could use:

>>> symbols = [0, 4, 5, 3, 1, 1, 7, 6, 5, 2, 6, 2]
>>> BitArray().join(BitArray(uint=x, length=3) for x in symbols)
BitArray('0x12b27eab2')
>>> _.tobytes()
b'\x12\xb2~\xab '
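
If installing a third-party module is not an option, the same packing can be approximated with plain integers from the standard library. The following is only a minimal sketch, not part of bitstring: `pack_symbols` and `unpack_symbols` are hypothetical helper names, and it assumes most-significant-bit-first order with zero padding at the end, which matches the output above.

```python
def pack_symbols(symbols, width=3):
    """Pack integers of `width` bits each into bytes, MSB first, zero-padded at the end."""
    acc = 0
    for x in symbols:
        if not 0 <= x < (1 << width):
            raise ValueError(f"{x} does not fit in {width} bits")
        acc = (acc << width) | x      # append the next symbol's bits
    nbits = width * len(symbols)
    pad = -nbits % 8                  # zero bits needed to reach a byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_symbols(data, count, width=3):
    """Inverse of pack_symbols; `count` is required because the padding is ambiguous."""
    acc = int.from_bytes(data, "big")
    acc >>= len(data) * 8 - count * width   # drop the trailing pad bits
    mask = (1 << width) - 1
    return [(acc >> (width * i)) & mask for i in reversed(range(count))]


symbols = [0, 4, 5, 3, 1, 1, 7, 6, 5, 2, 6, 2]
packed = pack_symbols(symbols)
print(packed)                                  # b'\x12\xb2~\xab '
print(unpack_symbols(packed, len(symbols)))    # the original list
```

This builds one large integer, so it is fine for short sequences but the dedicated modules are likely a better fit for long ones.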


Have you tried simply compressing the whole sequence with bz2? If the sequence is long, you should use bz2.BZ2Compressor to process it in chunks; otherwise, call bz2.compress on the whole thing. The compression will probably not be ideal, but it will typically get very close when dealing with sparse data.
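
A rough sketch of both approaches, using only the standard-library bz2 module. The sample data (mostly zero bytes) and the 256-byte chunk size are arbitrary assumptions for illustration; real ratios depend entirely on your data.

```python
import bz2

# Hypothetical sparse input: mostly zero bytes with a single non-zero byte.
data = bytes(1000) + b"\x01" + bytes(1000)

# One-shot compression of the whole sequence.
one_shot = bz2.compress(data)

# Chunked compression, suitable when the sequence is too long to hold at once.
compressor = bz2.BZ2Compressor()
chunks = []
for start in range(0, len(data), 256):
    chunks.append(compressor.compress(data[start:start + 256]))
chunks.append(compressor.flush())   # flush() emits whatever is still buffered
streamed = b"".join(chunks)

# Both results decompress back to the original bytes.
assert bz2.decompress(one_shot) == data
assert bz2.decompress(streamed) == data
print(len(data), len(one_shot), len(streamed))
```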

Hope that helps.

Since you have a mapping from symbols to 3-bit strings, bitarray does a nice job of encoding and decoding lists of symbols to and from arrays of bits:

from bitarray import bitarray
from random import choice

symbols = {
    '0' : bitarray('000'),
    'a' : bitarray('001'),
    'b' : bitarray('010'),
    'c' : bitarray('011'),
    'd' : bitarray('100'),
    'e' : bitarray('101'),
    'f' : bitarray('110'),
    'g' : bitarray('111'),
}

# dict views are not indexable in Python 3, so build a list of keys for choice()
seedstring = ''.join(choice(list(symbols)) for _ in range(40))

# construct bitarray using symbol->bitarray mapping
ba = bitarray()
ba.encode(symbols, seedstring)

print(seedstring)
print(ba)

# what does the bitarray look like as raw bytes?
ba_bytes = ba.tobytes()
print(repr(ba_bytes))
print(len(ba_bytes))

Prints (for one random seedstring):

egb0dbebccde0gfdfbc0d0ccfcg0acgg0ccfga00
bitarray('10111101000010001010101001101110010100... etc.
b'\xbd\x08\xaanQ\xf4\xc9\x88\x1b\xcf\x82\xff\r\xee@'
15

You can see that this 40-symbol list (120 bits) gets encoded into a 15-byte bitarray.
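
The byte count follows from simple arithmetic: 40 symbols at 3 bits each is 120 bits, which is exactly 15 bytes. When the bit count is not a multiple of 8, the last byte has to be padded, so the decoder needs to know the symbol count (or the number of pad bits) separately. A small sketch of that calculation, where `packed_size` is a hypothetical helper and not part of bitarray:

```python
def packed_size(num_symbols, bits_per_symbol=3):
    """Bytes needed to hold the symbols, rounding up to a whole byte."""
    total_bits = num_symbols * bits_per_symbol
    return (total_bits + 7) // 8


print(packed_size(40))  # 15: 120 bits fill 15 bytes exactly
print(packed_size(12))  # 5: 36 bits need 5 bytes, with 4 pad bits
```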
