What are some alternatives to a bit array?

夕颜 2021-02-06 07:05

I have an information retrieval application that creates bit arrays on the order of tens of millions of bits. The number of "set" bits in the array varies widely, from all clear to

7 answers
  •  死守一世寂寞
    2021-02-06 07:40

    Thanks for the answers. This is what I'm going to try for dynamically choosing the right method:

    I'll collect all of the first N hits in a conventional bit array, and choose one of three methods, based on the symmetry of this sample.

    • If the sample is highly asymmetric, I'll simply store the indexes of the set bits (or maybe the distance from each set bit to the next) in a list.
    • If the sample is highly symmetric, I'll keep using a conventional bit array.
    • If the sample is moderately symmetric, I'll use a lossless compression method such as the Huffman coding suggested by InSciTekJeff.
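
    The three-way choice above could be sketched roughly as follows. This is only an illustration under assumptions of mine: "symmetry" is taken to mean how evenly set and clear bits are balanced in the sample, and the function names and the thresholds `low` and `high` are hypothetical placeholders that would have to come from profiling.

```python
def symmetry(sample):
    """Assumed definition: 0.0 when all bits are equal, 1.0 when set and
    clear bits are perfectly balanced."""
    if not sample:
        raise ValueError("sample must not be empty")
    p = sum(sample) / len(sample)      # fraction of set bits
    return 2 * min(p, 1 - p)


def choose_method(sample, low=0.05, high=0.6):
    """Pick a representation from the sample; low/high are hypothetical."""
    s = symmetry(sample)
    if s < low:
        return "index_list"            # highly asymmetric
    if s > high:
        return "bit_array"             # highly symmetric
    return "huffman"                   # moderately symmetric


def to_gaps(bits):
    """Store the distance from each set bit to the next one.
    The first gap is measured from position -1."""
    gaps, prev = [], -1
    for i, b in enumerate(bits):
        if b:
            gaps.append(i - prev)
            prev = i
    return gaps


def from_gaps(gaps, length):
    """Rebuild the bit list from the gaps produced by to_gaps."""
    bits, pos = [0] * length, -1
    for g in gaps:
        pos += g
        bits[pos] = 1
    return bits
```

    For example, `to_gaps([0, 0, 1, 0, 1, 1])` gives `[3, 2, 1]`, and `from_gaps` reverses it. Gaps tend to be small numbers, so they may pack into fewer bits than absolute indexes.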

    The boundaries between the asymmetric, moderate, and symmetric regions will depend on the time required by the various algorithms balanced against the space they need. The relative value of time versus space will be an adjustable parameter. The space needed for Huffman coding is a function of the symmetry, and I'll profile that with testing. I'll also test all three methods to determine the time requirements of my implementation.
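
    Before profiling, a back-of-the-envelope estimate of the space side might look like the sketch below. It rests on assumptions that are mine, not measured: bits are treated as independent with set-probability `p`, the compressed size is approximated by the binary entropy (a lower bound for any lossless coder under that assumption, so a real Huffman coder would need somewhat more), and each stored index takes a hypothetical `index_width` of 32 bits. The `time_cost` values and `time_weight` are placeholders for measured timings and the adjustable time-versus-space parameter.

```python
import math


def binary_entropy(p):
    """Entropy in bits per bit of a source with set-probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def estimated_bits(n, p, index_width=32):
    """Rough space estimate for each method on n bits with density p."""
    return {
        "bit_array": n,                      # one bit per position
        "index_list": n * p * index_width,   # one index per set bit
        "huffman": n * binary_entropy(p),    # entropy lower bound
    }


def cheapest(n, p, time_cost, time_weight=0.0):
    """Pick the method minimising space + time_weight * time.
    time_cost maps method name to a measured cost (hypothetical units)."""
    space = estimated_bits(n, p)
    return min(space, key=lambda m: space[m] + time_weight * time_cost[m])
```

    With `time_weight=0.0` only space matters; raising it shifts the boundaries toward the faster methods.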

    It's possible (and actually I'm hoping) that the middle compression method will always be better than the list or the bit array or both. Maybe I can encourage this by choosing a set of Huffman codes adapted for higher or lower symmetry. Then I can simplify the system and just use two methods.
