I'm trying to implement a Count-Min Sketch algorithm in Scala, and so I need to generate k pairwise independent hash functions.
This is a lower-level than anything I'v
Scala already has MurmurHash implemented (it's scala.util.MurmurHash). It's very fast and very good at distributing values. A cryptographic hash is overkill: it will just take tens or hundreds of times longer than you need. Just pick k different seeds to start with and, since its mixing is nearly cryptographic in quality, you'll get k largely independent hash codes. (As of Scala 2.10, you should probably switch to scala.util.hashing.MurmurHash3; the usage is rather different, but you can still do the same thing with mixing.)
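A minimal sketch of the seeding idea, assuming string keys and the 2.10+ scala.util.hashing.MurmurHash3 API. The SeededHashes object, its make method, and the width parameter (the number of counters per sketch row) are hypothetical names for illustration, and using 0 until k as the seeds is just one arbitrary choice:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helper: builds k hash functions, one per seed.
object SeededHashes {
  def make(k: Int, width: Int): IndexedSeq[String => Int] =
    (0 until k).map { seed =>
      (s: String) => {
        // Hash the characters of s, starting from this function's seed.
        val h = MurmurHash3.stringHash(s, seed)
        // floorMod keeps the bucket index in [0, width) even when h is negative.
        Math.floorMod(h, width)
      }
    }
}
```

Each function in the returned sequence would then index one row of the sketch, e.g. `SeededHashes.make(4, 1000).map(h => h("some key"))` gives one bucket per row.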
If you only need nearby values to be mapped to randomly distant values, this will work. If you also want collisions to be independent across the hashes (i.e. if A and B collide using hash 1, they will probably not also collide using hash 2), then you'll need to go at least one step further and hash not the whole object but its subcomponents, so that the hashes have an opportunity to start out different.
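To illustrate the difference, here is a rough sketch for string keys. The ComponentHashing object and both method names are hypothetical. The first method mixes a seed into the object's precomputed hashCode, so two keys with the same hashCode collide under every seed; the second feeds the characters themselves through the seeded hash:

```scala
import scala.util.hashing.MurmurHash3

object ComponentHashing {
  // Mixes the seed with the 32-bit hashCode of the whole object.
  // Keys that share a hashCode get the same result for every seed.
  def viaHashCode(s: String, seed: Int): Int =
    MurmurHash3.finalizeHash(MurmurHash3.mix(seed, s.hashCode), 1)

  // Hashes the subcomponents (the characters) starting from the seed,
  // so keys that share a hashCode can still come out different.
  def viaComponents(s: String, seed: Int): Int =
    MurmurHash3.stringHash(s, seed)
}
```

For example, "Aa" and "BB" have the same String.hashCode, so viaHashCode cannot tell them apart whatever seed you pick, while viaComponents sees different characters. For a case class, MurmurHash3.productHash(x, seed) plays a similar role, though it works from each field's own hash code.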