How to find the closest pairs (by Hamming distance) among binary strings in Ruby without O(n^2) issues?

迷失自我 2021-02-06 01:13

I've got a MongoDB collection with about 1 million documents in it. Each document has a string that represents a 256-bit sequence of 1s and 0s, like:

01101010101010101101010101

4 answers

余生分开走 (original poster)

2021-02-06 02:00

The Hamming distance defines a metric space, so you could use the O(n log n) closest-pair-of-points algorithm, which follows the typical divide-and-conquer pattern.

You can then apply this repeatedly until you have "enough" pairs.
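For concreteness, here is a minimal sketch of the distance itself, assuming the documents store equal-length strings of "0" and "1" characters as described in the question. The method name `hamming_distance` is made up for illustration and is not from the original post.

```ruby
# Hypothetical helper: Hamming distance between two equal-length bit strings.
# Parses both strings as base-2 integers, XORs them, and counts the set bits.
def hamming_distance(a, b)
  raise ArgumentError, "strings must have equal length" unless a.size == b.size

  (a.to_i(2) ^ b.to_i(2)).to_s(2).count("1")
end

puts hamming_distance("0110", "0101")
```

Ruby integers are arbitrary precision, so the same call should work unchanged on 256-character strings.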

Edit: I see now that Wikipedia doesn't actually give the algorithm, so here is one description.

Edit 2: The algorithm can be modified to give up if there are no pairs at distance less than n. For the case of the Hamming distance: simply count the level of recursion you are in. If you haven't found something at level n in any branch, then give up (in other words, never enter n + 1). If you are using a metric where splitting on one dimension doesn't always yield a distance of 1, you need to adjust the level of recursion where you give up.
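One possible reading of the "give up" idea, sketched under my own assumptions: split the strings on one bit position at a time and carry a budget of allowed mismatches, abandoning any branch whose budget is used up. This is not the O(n log n) algorithm mentioned above, and the name `close_pairs` and its parameters are hypothetical. It only illustrates pruning by a distance bound.

```ruby
# Hypothetical sketch: index pairs [i, j] (i < j) whose Hamming distance
# is at most max_dist, found by splitting on one bit position per level.
def close_pairs(strings, max_dist)
  ids = (0...strings.size).to_a
  results = []

  walk = lambda do |left, right, pos, budget|
    return if left.empty? || right.empty?

    if pos == strings.first.size
      # Every surviving pair differs in at most max_dist positions.
      left.each { |i| right.each { |j| results << [i, j] if i < j } }
      return
    end

    l0, l1 = left.partition { |i| strings[i][pos] == "0" }
    r0, r1 = right.partition { |j| strings[j][pos] == "0" }

    # Same bit at this position: no mismatch is spent.
    walk.call(l0, r0, pos + 1, budget)
    walk.call(l1, r1, pos + 1, budget)

    # Different bit: spend one mismatch, or give up when none are left.
    if budget > 0
      walk.call(l0, r1, pos + 1, budget - 1)
      walk.call(l1, r0, pos + 1, budget - 1)
    end
  end

  walk.call(ids, ids, 0, max_dist)
  results
end

p close_pairs(%w[0000 0001 1111 0011], 1).sort
```

With a small budget most branches empty out quickly. With a large budget the search degrades toward comparing every pair, so this is only attractive when you expect close pairs to exist.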
