How to bucket locality-sensitive hashes?

会有一股神秘感。 Submitted on 2019-12-04 08:34:10
gsamaras

Based on "Search in locality sensitive hashing", and after reading "Similarity Estimation Techniques from Rounding Algorithms", I would say this:

This question is somewhat broad, so I am just going to give a minimal (abstract) example here:

We have 6 (= n) vectors in our dataset, with d bits each. Let's assume that we do 2 (= N) random permutations.

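Let the 1st random permutation begin! Remember that we permute the bits, not the order of the vectors, and that the same permutation is applied to every vector. After permuting the bits, we sort the vectors lexicographically, which gives them an order, for example: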

v1
v5
v0
v3
v2
v4
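
As a rough illustration of this step, here is a minimal Python sketch. It assumes the vectors are stored as tuples of 0/1 bits and that "order" means lexicographic order of the permuted bits. The names `permute_bits` and `build_sorted_table`, the dimension d = 8 and the random data are all made up for the example.

```python
import random

def permute_bits(vec, perm):
    """Reorder the bits of vec so that position i holds vec[perm[i]]."""
    return tuple(vec[j] for j in perm)

def build_sorted_table(vectors, perm):
    """Apply the same bit permutation to every vector, then sort the
    permuted vectors lexicographically. Each row is
    (permuted_bits, index_of_original_vector)."""
    return sorted((permute_bits(v, perm), i) for i, v in enumerate(vectors))

rng = random.Random(0)   # fixed seed so the run is repeatable
d = 8                    # bits per vector (assumed value)
vectors = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(6)]

perm = list(range(d))
rng.shuffle(perm)        # one random permutation of the bit positions

table = build_sorted_table(vectors, perm)
for bits, idx in table:
    print(f"v{idx}", bits)
```

Keeping the original index next to each permuted vector is what lets us map a position in the sorted table back to a vector vi.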

Now the query vector, q, arrives. It is very unlikely to be identical to any vector in our dataset (after the permutation), so a binary search will not find an exact match.

However, the binary search will end up between two vectors. So now we can imagine the scenario to be like this (for example, q lies between v0 and v3):

v1
v5
v0 <-- up pointer
   <-- q lies here
v3 <-- down pointer
v2
v4
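
The picture above can be reproduced with Python's standard `bisect` module. This is only a sketch on hypothetical 4-bit toy data, chosen so that the sorted order matches the example. The table is assumed to hold the already-permuted vectors, and `q` is assumed to be permuted with the same permutation.

```python
from bisect import bisect_left

# Hypothetical sorted table of already-permuted 4-bit vectors (toy data).
table = [
    ((0, 0, 0, 1), "v1"),
    ((0, 0, 1, 1), "v5"),
    ((0, 1, 0, 0), "v0"),
    ((1, 0, 0, 1), "v3"),
    ((1, 0, 1, 0), "v2"),
    ((1, 1, 1, 0), "v4"),
]
keys = [bits for bits, _ in table]

q = (0, 1, 1, 0)             # the query, after the same bit permutation

pos = bisect_left(keys, q)   # index where q would be inserted
up, down = pos - 1, pos      # neighbours just above and just below q

print("up pointer   ->", table[up][1])
print("down pointer ->", table[down][1])
```

In general `up` can be -1 and `down` can be `len(table)` when q falls before the first or after the last vector, so real code has to check the bounds before dereferencing either pointer.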

Now we move either the up or the down pointer, looking for the vector vi that matches q in the most bits. Let's say it was v0.

Similarly, we do the second permutation and find a vector vi, let's say v4. We now compare v0 from the first permutation with v4 to see which one is closest to q, i.e. which one has the most bits equal to q.
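
Putting the steps together, a minimal end-to-end sketch could look like the following. Several things here are assumptions for illustration only: the function names, the parameter values, and in particular the `budget` rule, which simply moves each pointer a fixed number of steps and keeps the candidate with the smallest Hamming distance. The answer above does not specify how far the pointers should move, so treat that part as a placeholder.

```python
import random
from bisect import bisect_left

def hamming(a, b):
    """Number of positions where the two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def permute_bits(vec, perm):
    """Reorder the bits of vec so that position i holds vec[perm[i]]."""
    return tuple(vec[j] for j in perm)

def build_tables(vectors, perms):
    """One lexicographically sorted table per permutation.
    Each table is (sorted_permuted_vectors, matching_original_indices)."""
    tables = []
    for perm in perms:
        rows = sorted((permute_bits(v, perm), i) for i, v in enumerate(vectors))
        keys = [bits for bits, _ in rows]
        ids = [i for _, i in rows]
        tables.append((keys, ids))
    return tables

def query(q, vectors, perms, tables, budget=2):
    """Return (index, distance) of the best candidate over all tables.
    budget = steps each pointer may take per table (assumed rule)."""
    best_idx, best_dist = None, None
    for perm, (keys, ids) in zip(perms, tables):
        pos = bisect_left(keys, permute_bits(q, perm))
        up, down = pos - 1, pos
        for _ in range(budget):
            for p in (up, down):
                if 0 <= p < len(keys):
                    idx = ids[p]
                    # A bit permutation does not change the Hamming
                    # distance, so we can compare the original vectors.
                    dist = hamming(vectors[idx], q)
                    if best_dist is None or dist < best_dist:
                        best_idx, best_dist = idx, dist
            up -= 1      # up pointer moves towards the start
            down += 1    # down pointer moves towards the end
    return best_idx, best_dist

rng = random.Random(1)
d, n, N = 16, 6, 2       # bits, vectors, permutations (assumed values)
vectors = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(n)]
perms = [rng.sample(range(d), d) for _ in range(N)]
tables = build_tables(vectors, perms)

q = tuple(rng.randint(0, 1) for _ in range(d))
best_idx, best_dist = query(q, vectors, perms, tables)
print(f"closest candidate: v{best_idx} at Hamming distance {best_dist}")
```

With a small budget the result is only approximate: the true nearest neighbour may sit further away in every sorted table. Increasing N or the budget trades query time for accuracy.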


However, if you are looking for a ready-made implementation, you should ask on Software Recommendations. I would also look at the paper I linked to, to see whether the author(s) made the code public or would be willing to share it if contacted.
