A bloom filter uses a hash function (or many) to generate a value between 0 and m given an input string X. My question is how to you use a hash function to generate a value
You should first convert the hash output to an unsigned integer, then reduce it modulo m. This looks like this:
MessageDigest md = MessageDigest.getInstance("MD5");
// hash data...
byte[] hashValue = md.digest();
BigInteger n = new BigInteger(1, hashValue);
n = n.mod(m);
// at that point, n has a value between 0 and m-1 (inclusive)
I have assumed that m is a BigInteger
instance. If necessary, use BigInteger.valueOf()
. Similarly, use n.intValue()
or n.longValue()
to get the value of n as one of the primitive types of Java.
The modular reduction is somewhat biased, but the bias is very small if m is substantially smaller than 2^128.