Generating k pairwise independent hash functions

死守一世寂寞 2021-02-14 20:29

I'm trying to implement a Count-Min Sketch algorithm in Scala, and so I need to generate k pairwise independent hash functions.

This is a lower-level than anything I'v

2 answers
  •  春和景丽
    2021-02-14 21:13

    Probably the simplest approach is to take some cryptographic hash function and "seed" it with different sequences of bytes. For most practical purposes, the results should be independent, as this is one of the key properties a cryptographic hash function should have (if you replace any part of a message, the hash should be completely different).

    I'd do something like:

    import java.security.MessageDigest

    // for each 0 <= i < k, generate a random sequence of bytes
    val randomSeeds: Array[Array[Byte]] = ??? // initialize with random sequences

    def hash(i: Int, value: Array[Byte]): Array[Byte] = {
        val dg = MessageDigest.getInstance("SHA-1")
        // "seed" the digest with a random value based on the index
        dg.update(randomSeeds(i))
        // if you need integer hash values, just take 4 bytes
        // of the result and convert them to an int
        dg.digest(value)
    }
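
    To make the "take 4 bytes and convert them to an int" step concrete, here is a minimal sketch of how the seeded digest could be turned into a column index for a sketch row. The class name SeededHashes, the 16-byte seed length, and the width parameter are assumptions made for illustration, not something the Count-Min Sketch prescribes.

    ```scala
    import java.nio.ByteBuffer
    import java.security.{MessageDigest, SecureRandom}

    // Hypothetical helper: k seeded SHA-1 "hash functions", each mapping to [0, width).
    final class SeededHashes(k: Int, width: Int, seedLength: Int = 16) {
      private val rng = new SecureRandom()

      // one random byte sequence per index 0 <= i < k
      private val randomSeeds: Array[Array[Byte]] =
        Array.fill(k) {
          val seed = new Array[Byte](seedLength)
          rng.nextBytes(seed)
          seed
        }

      // Column index in [0, width) for hash function i applied to value.
      def index(i: Int, value: Array[Byte]): Int = {
        val dg = MessageDigest.getInstance("SHA-1")
        dg.update(randomSeeds(i))
        val digest = dg.digest(value)
        // read the first 4 bytes of the 20-byte digest as a big-endian Int
        val asInt = ByteBuffer.wrap(digest, 0, 4).getInt
        // floorMod keeps the result non-negative even when asInt is negative
        Math.floorMod(asInt, width)
      }
    }

    val hashes = new SeededHashes(k = 4, width = 1024)
    val key = "example".getBytes("UTF-8")
    val columns = (0 until 4).map(i => hashes.index(i, key))
    ```

    The seeds are drawn once when the object is created, so the same key always maps to the same columns for the lifetime of that object, but a new object will produce a different mapping.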
    

    Edit: I don't know the precise requirements of the Count-Min Sketch. Maybe a simple hash function would suffice, but it doesn't seem to be the simplest solution.

    I suggested a cryptographic hash function, because there you have quite strong guarantees that the resulting hash functions will be very different, and it's easy to implement, just use the standard libraries.

    On the other hand, if you have two hash functions of the form f1(x) = ax + b (mod p) and f2(x) = cx + d (mod p), then you can compute one from the other (without knowing x) using a simple linear formula f2(x) = c / a * (f1(x) - b) + d (mod p), where dividing by a means multiplying by its inverse modulo p (which exists when p is prime and a is not 0 mod p). This suggests that they aren't very independent, so you could run into unexpected problems here.
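
    The linear relation between two such functions can be checked numerically. The following is only a sketch: the prime p and the coefficients a, b, c, d are arbitrary values chosen for illustration, and f2FromF1 is a hypothetical name.

    ```scala
    // Illustrative values only: p is prime and a is non-zero modulo p.
    val p = BigInt(2147483647) // 2^31 - 1, a Mersenne prime
    val a = BigInt(12345)
    val b = BigInt(678)
    val c = BigInt(54321)
    val d = BigInt(910)

    def f1(x: BigInt): BigInt = (a * x + b).mod(p)
    def f2(x: BigInt): BigInt = (c * x + d).mod(p)

    // "c / a" modulo p means c times the modular inverse of a
    val cOverA = (c * a.modInverse(p)).mod(p)

    // Recover f2(x) from the value y = f1(x) alone, without access to x
    def f2FromF1(y: BigInt): BigInt = (cOverA * (y - b) + d).mod(p)

    val inputs = Seq(BigInt(0), BigInt(1), BigInt(42), BigInt(1000003))
    val allMatch = inputs.forall(x => f2FromF1(f1(x)) == f2(x))
    ```

    Here allMatch is true for every input tried, which is the dependence between the two functions described above.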
