Data structure to build and lookup set of integer ranges

你的背包 2020-12-19 09:58

I have a set of uint32 integers; there may be millions of items in the set. 50-70% of them are consecutive, but in the input stream they appear in unpredictable order.

7 answers
  •  有刺的猬
    2020-12-19 10:36

    From the description of your problem, it sounds like the following might be a good compromise. I've described it using an object-oriented language, but it is easily convertible to C using a union, or a struct with a type tag and a pointer.

    Use the first (high) 16 bits to index an array of 65536 objects. Each element of that array is one of 5 possible kinds of object:

    • a NONE object means no elements beginning with those 16 bits are in the set
    • an ALL object means all elements beginning with those 16 bits are in the set
    • a RANGE object means all elements whose low 16 bits lie between a lower and an upper bound are in the set
    • a SINGLE object means exactly one element beginning with those 16 bits is in the set
    • a BITSET object handles all remaining cases with a 65536-bit bitset
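    A minimal Python sketch of the scheme above, assuming keys are split into a 16-bit prefix (the array index) and a 16-bit suffix (stored in the bucket). The names (`RangeSet32`, `add_to_bucket`, the `NONE`/`ALL` sentinels and the `Single`/`Range`/`Bitset` classes) are hypothetical, and the promotion order NONE, SINGLE, RANGE, BITSET, ALL is one plausible reading; the answer does not spell out the transitions.

```python
# Hypothetical sketch: names and promotion rules are illustrative assumptions.
LOW_BITS = 16
LOW_MASK = (1 << LOW_BITS) - 1
BUCKET_SIZE = 1 << LOW_BITS      # 65536 possible low-16-bit suffixes

NONE = object()                  # no key with this prefix is in the set
ALL = object()                   # every key with this prefix is in the set


class Single:                    # exactly one suffix present
    def __init__(self, v):
        self.v = v


class Range:                     # every suffix in [lo, hi] present
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi


class Bitset:                    # arbitrary subset: 65536 bits = 8 KiB
    def __init__(self, values=()):
        self.bits = bytearray(BUCKET_SIZE // 8)
        self.count = 0           # number of bits set, used to detect "full"
        for v in values:
            self.add(v)

    def add(self, v):
        byte, mask = v >> 3, 1 << (v & 7)
        if not self.bits[byte] & mask:
            self.bits[byte] |= mask
            self.count += 1

    def has(self, v):
        return bool(self.bits[v >> 3] & (1 << (v & 7)))


def add_to_bucket(b, v):
    """Insert suffix v into bucket b and return the bucket to store back."""
    if b is NONE:
        return Single(v)
    if b is ALL:
        return b
    if isinstance(b, Single):
        if v == b.v:
            return b
        if abs(v - b.v) == 1:    # adjacent value: grow into a RANGE
            return Range(min(v, b.v), max(v, b.v))
        return Bitset((b.v, v))
    if isinstance(b, Range):
        if b.lo <= v <= b.hi:
            return b
        if v == b.lo - 1 or v == b.hi + 1:   # extend the range by one
            lo, hi = min(v, b.lo), max(v, b.hi)
            return ALL if (lo, hi) == (0, BUCKET_SIZE - 1) else Range(lo, hi)
        return Bitset((*range(b.lo, b.hi + 1), v))
    b.add(v)                     # Bitset: set the bit, promote when full
    return ALL if b.count == BUCKET_SIZE else b


class RangeSet32:
    def __init__(self):
        self.buckets = [NONE] * BUCKET_SIZE   # indexed by the high 16 bits

    def add(self, x):
        hi, lo = x >> LOW_BITS, x & LOW_MASK
        self.buckets[hi] = add_to_bucket(self.buckets[hi], lo)

    def __contains__(self, x):
        b, v = self.buckets[x >> LOW_BITS], x & LOW_MASK
        if b is NONE or b is ALL:
            return b is ALL
        if isinstance(b, Single):
            return v == b.v
        if isinstance(b, Range):
            return b.lo <= v <= b.hi
        return b.has(v)
```

    With this greedy promotion, a bucket whose values arrive out of order (5, then 3, then 4) ends up as a BITSET even though it holds a contiguous run, and nothing here demotes it again; a fuller implementation might periodically re-compact bitsets back into RANGE or ALL buckets.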

    Of course, you don't need to split at 16 bits; you can adjust the split to reflect the statistics of your set. In fact you don't even need to use consecutive bits, but doing so speeds up the bit twiddling and, if many of your elements are consecutive as you say, gives good properties, because long runs collapse into RANGE or ALL buckets.
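    To see what the split width changes, here is a small sketch with hypothetical helpers (`split_key` and `layout` are illustrative names) that split a uint32 key at an arbitrary bit position and report the sizes that split implies, assuming one top-level slot per prefix and one bit per key in a BITSET bucket.

```python
# Hypothetical helpers for exploring different split points of a uint32 key.

def split_key(x, low_bits):
    """Return (prefix, suffix): prefix indexes the top-level array,
    suffix is the position inside that prefix's bucket."""
    return x >> low_bits, x & ((1 << low_bits) - 1)


def layout(low_bits):
    """Sizes implied by splitting 32-bit keys at low_bits."""
    return {
        "top_level_slots": 1 << (32 - low_bits),   # one slot per prefix
        "keys_per_bucket": 1 << low_bits,          # suffixes per bucket
        "bitset_bytes": (1 << low_bits) // 8,      # one bit per suffix
    }
```

    With `low_bits=12`, each bitset shrinks from 8 KiB to 512 bytes, but the top-level array grows from 65,536 to 1,048,576 slots, so the right split depends on how densely and how contiguously your keys cluster.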

    Hopefully this makes sense; please comment if I need to explain more fully. Effectively you've combined a depth-2 tree (a two-level trie with 65536-way fan-out at the top) with ranges and a bitset, as a space/time tradeoff. If you need to save memory, make the tree deeper, at the cost of a slight increase in lookup time.
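    One way to make the tree deeper is sketched below, assuming an 8/8/16 split: a 256-slot top level whose 256-slot middle arrays are allocated only when a key with that prefix is first inserted. `ThreeLevelSet` and `allocated_slots` are hypothetical names, and the leaves are plain bitsets to keep the sketch short; in the scheme described above they would be the five bucket kinds.

```python
# Hypothetical 8/8/16 split; leaves are plain 8 KiB bitsets for brevity.
TOP_BITS, MID_BITS, LEAF_BITS = 8, 8, 16


class ThreeLevelSet:
    def __init__(self):
        self.top = [None] * (1 << TOP_BITS)   # None = empty subtree

    @staticmethod
    def _split(x):
        leaf = x & ((1 << LEAF_BITS) - 1)
        mid = (x >> LEAF_BITS) & ((1 << MID_BITS) - 1)
        top = x >> (LEAF_BITS + MID_BITS)
        return top, mid, leaf

    def add(self, x):
        t, m, v = self._split(x)
        if self.top[t] is None:                # allocate middle level lazily
            self.top[t] = [None] * (1 << MID_BITS)
        mids = self.top[t]
        if mids[m] is None:                    # allocate leaf bitset lazily
            mids[m] = bytearray((1 << LEAF_BITS) // 8)
        mids[m][v >> 3] |= 1 << (v & 7)

    def __contains__(self, x):
        t, m, v = self._split(x)               # one extra indirection
        mids = self.top[t]
        if mids is None or mids[m] is None:
            return False
        return bool(mids[m][v >> 3] & (1 << (v & 7)))

    def allocated_slots(self):
        """Pointer slots allocated across the top and middle levels."""
        return len(self.top) + sum(len(m) for m in self.top if m is not None)
```

    With keys under a single 16-bit prefix, the pointer arrays hold 512 slots instead of the 65,536 of the flat two-level layout, paid for with one extra indirection per lookup.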
