How to find the closest pairs (Hamming distance) among binary strings in Ruby without O(n^2) comparisons?

迷失自我 2021-02-06 01:13

I've got a MongoDB collection with about 1 million documents in it. Each document has a string that represents a 256-bit sequence of 1s and 0s, like:

01101010101010101101010101

4 Answers
  •  南笙 (original poster)
    2021-02-06 02:11

    This is an algorithmic problem rather than a hardware one. You could start by comparing strings that have a similar number of 1 bits, since two strings whose 1-bit counts differ by k must differ in at least k positions, then work outward through the list from there. Strings that are identical will, of course, come out on top. I don't think having tons of RAM will help here.
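
    A minimal sketch of the bit-count idea, assuming the strings are already loaded into a Ruby array. The helper names (hamming, bucket_by_popcount, near) and the max_dist threshold are illustrative, not from the question. It relies on the fact that the difference in 1-bit counts is a lower bound on the Hamming distance, so only nearby buckets need to be checked.

```ruby
# Hamming distance between two equal-length bit strings, via integer XOR.
def hamming(a, b)
  (a.to_i(2) ^ b.to_i(2)).to_s(2).count("1")
end

# Group bit strings by their number of 1 bits (hypothetical helper).
def bucket_by_popcount(strings)
  strings.group_by { |s| s.count("1") }
end

# Stored strings within max_dist of query. Only buckets whose 1-bit count
# is within max_dist of the query's count can contain a match, so all
# other buckets are skipped without computing any distances.
def near(query, buckets, max_dist)
  pc = query.count("1")
  lo = [pc - max_dist, 0].max
  (lo..pc + max_dist).flat_map { |k| buckets.fetch(k, []) }
                     .select { |s| hamming(query, s) <= max_dist }
end

buckets = bucket_by_popcount(["0000", "0001", "0111", "1111"])
p near("0011", buckets, 1)
```

    This only prunes; strings inside the surviving buckets are still compared one by one, so how much it saves depends on how spread out the bit counts are.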

    You could also try working with smaller chunks. Instead of dealing with 256-bit sequences, could you treat each one as 32 8-bit sequences? Or 16 16-bit sequences? At that point you can compute differences in a lookup table and use that as a sort of index.
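
    One way to read the chunk idea, sketched under assumptions: if two strings differ in fewer bit positions than there are chunks, at least one chunk must be identical in both (pigeonhole). An index keyed on chunk position and chunk value then returns a candidate list without scanning everything. CHUNK_BITS, chunks, build_chunk_index and candidates are hypothetical names, and the in-memory Hash stands in for whatever index the database would provide.

```ruby
require "set"

CHUNK_BITS = 16 # assumed chunk width; a 256-bit string gives 16 chunks

# Split a bit string into consecutive chunks of CHUNK_BITS characters.
def chunks(s)
  s.scan(/.{1,#{CHUNK_BITS}}/)
end

# Map [chunk position, chunk value] to every string containing that chunk.
def build_chunk_index(strings)
  index = Hash.new { |h, k| h[k] = [] }
  strings.each do |s|
    chunks(s).each_with_index { |c, i| index[[i, c]] << s }
  end
  index
end

# Every indexed string that shares at least one exact chunk with query.
# The query itself is included if it was indexed.
def candidates(query, index)
  found = Set.new
  chunks(query).each_with_index { |c, i| found.merge(index.fetch([i, c], [])) }
  found
end
```

    The candidates still need a real Hamming distance check afterwards, and anything that differs from the query in every chunk is missed, so the chunk count bounds the distances this can find.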

    Depending on how much difference you are willing to accept, you could generate every variant of the source binary value with a few bits flipped and do a keyed search for each one to find the stored values that match.
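
    A rough sketch of the flip-and-look-up approach, with a Set standing in for a keyed (indexed) lookup against the collection. The names variants and neighbours are made up for illustration. Note the cost: a 256-bit value has 256 variants at distance 1 and 32,640 at distance 2, so this is only practical for small distances.

```ruby
require "set"

# All strings at Hamming distance exactly dist from s, made by flipping
# every possible combination of dist bit positions.
def variants(s, dist)
  (0...s.length).to_a.combination(dist).map do |positions|
    flipped = s.dup
    positions.each { |i| flipped[i] = flipped[i] == "0" ? "1" : "0" }
    flipped
  end
end

# Known strings at distance 1..max_dist from query, found by exact lookup
# of each generated variant (known is assumed to respond to include?).
def neighbours(query, known, max_dist)
  (1..max_dist).flat_map { |d| variants(query, d) }
               .select { |v| known.include?(v) }
end

known = Set.new(["0000", "0001", "0011", "1111"])
p neighbours("0000", known, 2)
```

    Each lookup is an exact match, which is the kind of query an ordinary index on the string field can serve.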
