Search in locality sensitive hashing

梦谈多话 2020-12-21 14:00

I'm trying to understand Section 5 of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper:

Given bi

1 answer
  • 2020-12-21 14:12

    This question is somewhat broad, so I am just going to give a minimal (abstract) example here:

    We have 6 (= n) vectors in our dataset, with d bits each. Let's assume that we do 2 (= N) random permutations.

    Let the 1st random permutation begin! Remember that we permute the bits, not the order of the vectors. After permuting the bits of every vector, we sort the vectors lexicographically by their permuted bit strings, which gives them an order, for example:

    v1
    v5
    v0
    v3
    v2
    v4
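    The preprocessing step above (permute the bits of every vector, then sort) can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: vectors are assumed to be strings of '0'/'1' characters, so Python's string comparison gives the lexicographic bit order, and `permute_bits` and `build_sorted_orders` are hypothetical helper names.

```python
import random


def permute_bits(bits: str, sigma: list) -> str:
    """Apply the bit permutation sigma: output position j takes input bit sigma[j]."""
    return "".join(bits[i] for i in sigma)


def build_sorted_orders(vectors: list, num_perms: int, seed: int = 0) -> list:
    """For each of num_perms random bit permutations, return (sigma, order),
    where order lists the vector indices sorted by their permuted bit strings."""
    rng = random.Random(seed)
    d = len(vectors[0])
    orders = []
    for _ in range(num_perms):
        sigma = list(range(d))
        rng.shuffle(sigma)  # one random permutation of the d bit positions
        permuted = [permute_bits(v, sigma) for v in vectors]
        # Only the bits inside each vector are permuted; sorting the resulting
        # strings is what puts the vectors into an order.
        order = sorted(range(len(vectors)), key=lambda k: permuted[k])
        orders.append((sigma, order))
    return orders


# n = 6 illustrative vectors with d = 8 bits each, N = 2 permutations.
vectors = ["10110010", "01100111", "11100001", "00011110", "10101010", "01010101"]
for sigma, order in build_sorted_orders(vectors, num_perms=2):
    print("sigma =", sigma, "order =", ["v%d" % k for k in order])
```

    The sort keeps vector indices rather than the strings themselves, so each sorted position can be mapped back to the original vector later.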
    

    Now the query vector q arrives, but it is unlikely to be identical to any vector in our dataset (after the permutation), so binary search will not find an exact match.

    However, the binary search will end up between two vectors. So we can picture the scenario like this (for example, q lies between v0 and v3):

    v1
    v5
    v0 <-- up pointer
       <-- q lies here
    v3 <-- down pointer
    v2
    v4
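    The binary-search step can be sketched with Python's standard `bisect` module. Assumptions: the list holds the permuted keys already in sorted order, q has been permuted with the same σ, and `locate` is a hypothetical helper that returns the up pointer (last key smaller than q) and the down pointer (first key not smaller than q).

```python
import bisect


def locate(sorted_keys: list, q_key: str) -> tuple:
    """Return (up, down): up is the last index whose key is < q_key (or -1),
    down is the first index whose key is >= q_key (or len(sorted_keys))."""
    down = bisect.bisect_left(sorted_keys, q_key)
    return down - 1, down


# Permuted keys in sorted order (illustrative bits, not taken from the paper).
keys = ["0001", "0100", "0110", "1001", "1100", "1110"]
print(locate(keys, "0111"))  # → (2, 3): q falls between "0110" and "1001"
```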
    

    Now we move either the up or the down pointer, looking for the vector vi that agrees with q in the most bits. Let's say it was v0.
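    One way to sketch the pointer movement, assuming (this is a common reading of the scheme, not something stated in this answer) that at each step the pointer whose entry shares the longer prefix with q is advanced, and that a hypothetical budget `steps` caps how many entries are examined per sorted list. Among the examined entries, the one with the fewest differing bits from q is kept. All helper names are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def walk_pointers(sorted_keys: list, q_key: str, up: int, down: int, steps: int):
    """Examine up to `steps` entries around q, always advancing the pointer whose
    entry shares the longer prefix with q_key. Return (index, distance) of the
    examined entry with the fewest differing bits, or (None, None) if none."""
    best, best_dist = None, None
    for _ in range(steps):
        up_ok, down_ok = up >= 0, down < len(sorted_keys)
        if not (up_ok or down_ok):
            break  # both pointers ran off the list
        if up_ok and (not down_ok or common_prefix_len(sorted_keys[up], q_key)
                      >= common_prefix_len(sorted_keys[down], q_key)):
            idx, up = up, up - 1
        else:
            idx, down = down, down + 1
        dist = hamming(sorted_keys[idx], q_key)
        if best_dist is None or dist < best_dist:
            best, best_dist = idx, dist
    return best, best_dist


keys = ["0001", "0100", "0110", "1001", "1100", "1110"]
print(walk_pointers(keys, "0111", up=2, down=3, steps=2))  # → (2, 1)
```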

    Similarly, we do the second permutation and find a vector vi, let's say v4. We now compare v0 from the first permutation with v4 to see which one is closest to q, i.e. which one has the most bits in common with q.
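    Putting the steps together, here is an end-to-end sketch in Python. It is not the paper's code: bit vectors are assumed to be '0'/'1' strings, the pointer rule (advance whichever pointer's entry shares the longer prefix with q) and the per-list budget `steps` are assumptions, and all function names are hypothetical. Preprocessing runs once; each query searches every sorted list and keeps the candidate with the most bits in common with q.

```python
import bisect
import random


def permute_bits(bits, sigma):
    return "".join(bits[i] for i in sigma)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def preprocess(vectors, num_perms, seed=0):
    """Done once: for each random permutation, store sigma, the sorted
    permuted keys, and the original index of each key."""
    rng = random.Random(seed)
    d = len(vectors[0])
    tables = []
    for _ in range(num_perms):
        sigma = list(range(d))
        rng.shuffle(sigma)
        pairs = sorted((permute_bits(v, sigma), k) for k, v in enumerate(vectors))
        keys = [key for key, _ in pairs]  # sorted permuted bit strings
        ids = [k for _, k in pairs]       # original vector index of each key
        tables.append((sigma, keys, ids))
    return tables


def query(vectors, tables, q, steps):
    """Per query: walk the pointers in every table, keep the closest candidate."""
    best, best_dist = None, None
    for sigma, keys, ids in tables:
        q_key = permute_bits(q, sigma)  # q gets the same permutation
        down = bisect.bisect_left(keys, q_key)
        up = down - 1
        for _ in range(steps):
            up_ok, down_ok = up >= 0, down < len(keys)
            if not (up_ok or down_ok):
                break
            if up_ok and (not down_ok or common_prefix_len(keys[up], q_key)
                          >= common_prefix_len(keys[down], q_key)):
                pos, up = up, up - 1
            else:
                pos, down = down, down + 1
            # Compared in the original space; a shared permutation gives the same count.
            dist = hamming(vectors[ids[pos]], q)
            if best_dist is None or dist < best_dist:
                best, best_dist = ids[pos], dist
    return best, best_dist


rng = random.Random(1)
vectors = ["".join(rng.choice("01") for _ in range(16)) for _ in range(6)]
tables = preprocess(vectors, num_perms=2)
q = vectors[3][:-1] + ("1" if vectors[3][-1] == "0" else "0")  # v3 with its last bit flipped
print(query(vectors, tables, q, steps=2))
```

    With `steps` at least n, every list is scanned completely and the result is the exact nearest neighbour; smaller budgets trade accuracy for speed.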


    Edit:

    Is it correct to say that the total cost of performing the N permutations is O(N n log n), since we have to sort each one of them?

    If they actually sort every permutation from scratch, then yes, but it's not clear to me how they do it.
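    A rough accounting of the preprocessing cost, assuming each of the N lists is sorted from scratch with a comparison sort and that permuting one d-bit vector costs O(d):

```latex
\underbrace{N \cdot O(n\,d)}_{\text{permuting bits}}
\;+\;
\underbrace{N \cdot O(n \log n)}_{\text{comparisons while sorting}}
```

    Treating a comparison as O(1), the sorting term is the O(N n log n) from the question; charging O(d) per comparison of two d-bit vectors makes it O(N n d log n).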

    Is the permutation + sorting process described above performed only once, during preprocessing, or for every query q?

    ONCE.

    In the last step, where we compare v0 and v4 to q, do we compare their permuted versions or the original ones (before permutation)?

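    I think they use the permuted versions (see the parentheses before 2N in the paper). But that wouldn't make any difference, since q is permuted with the same permutation (σ), and applying the same permutation to two bit vectors does not change the number of bits in which they agree.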
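    The claim that permuted and original versions give the same answer can be checked quickly: applying one permutation σ to both q and a data vector only reorders the positions being compared, so the Hamming distance is unchanged. A small randomized check in Python (helper names are hypothetical):

```python
import random


def permute_bits(bits, sigma):
    """Output position j takes input bit sigma[j]."""
    return "".join(bits[i] for i in sigma)


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def permutation_preserves_hamming(trials=1000, d=16, seed=0):
    """Randomly test that hamming(sigma(a), sigma(b)) == hamming(a, b)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = "".join(rng.choice("01") for _ in range(d))
        b = "".join(rng.choice("01") for _ in range(d))
        sigma = list(range(d))
        rng.shuffle(sigma)
        if hamming(permute_bits(a, sigma), permute_bits(b, sigma)) != hamming(a, b):
            return False
    return True


print(permutation_preserves_hamming())  # → True
```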


    This Quora answer may shed some light too.
