Question
Given a fixed number of bits (i.e. slots) (m) and a fixed number of hash functions (k), how does one compute the theoretical false positive rate (p)?
According to Wikipedia (http://en.wikipedia.org/wiki/Bloom_filter), for a false positive rate (p) and a number of items (n), the number of bits (m) needed is given by m = -n * ln(p) / (ln(2)^2), and the optimal number of hash functions (k) is given by k = (m / n) * ln(2).
From the formulas given on the Wikipedia page, I guess I could evaluate the theoretical false positive rate (p) as follows: p = (1 - e^(-k * n / m))^k
But Wikipedia has another formula for (p): p = e^(-(m / n) * ln(2)^2), which, I suppose, assumes that (k) is the optimal number of hash functions.
For my example, I took n = 1000000 and m = n * 2. The optimal value for (k) would be 1.386, and the theoretical false positive rate (p) would be 0.382 according to the previous formula.
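These two numbers can be checked with a minimal Python sketch. The function names optimal_k and fp_rate_at_optimal_k are made up for illustration and do not come from any library; the last digit may differ from the figures quoted above depending on rounding.

```python
import math

def optimal_k(m: int, n: int) -> float:
    """Optimal (real-valued) number of hash functions: k = (m / n) * ln(2)."""
    return (m / n) * math.log(2)

def fp_rate_at_optimal_k(m: int, n: int) -> float:
    """False positive rate assuming the optimal k: p = e^(-(m / n) * ln(2)^2)."""
    return math.exp(-(m / n) * math.log(2) ** 2)

n = 1_000_000   # number of items, as in the example
m = 2 * n       # number of bits, as in the example

print(f"optimal k = {optimal_k(m, n):.4f}")
print(f"p at optimal k = {fp_rate_at_optimal_k(m, n):.4f}")
```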
Let's choose the number of hash functions, compute the theoretical false positive rate (p) for that fixed (k), and then compute the theoretical number of bits (m') that the first formula gives for that (p):
for k = 1, p = .393 and m' = 1941401
for k = 2, p = .399 and m' = 1909344
for k = 3, p = .469 and m' = 1576527
for k = 4, p = .559 and m' = 1210636
The more bits are set in the filter, the more false positives we get. Seems logical.
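The list above can be recomputed with a short Python sketch, assuming the two Wikipedia formulas quoted earlier. The helper names fp_rate and bits_for_rate are illustrative only, and the printed values may differ from the list in the last digit because of rounding.

```python
import math

def fp_rate(k: int, m: int, n: int) -> float:
    """Approximate false positive rate for fixed k, m, n: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

def bits_for_rate(n: int, p: float) -> float:
    """Bits needed for rate p, assuming an optimal k: m = -n * ln(p) / ln(2)^2."""
    return -n * math.log(p) / math.log(2) ** 2

n = 1_000_000
m = 2 * n

for k in range(1, 5):
    p = fp_rate(k, m, n)
    print(f"k = {k}, p = {p:.3f}, m' = {bits_for_rate(n, p):.0f}")
```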
But could someone confirm that the formula p = (1 - e^(-k * n / m))^k is correct for computing the theoretical false positive rate given fixed (k), (m) and (n)?
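One way to sanity-check the formula empirically is a small simulation. This is only a sketch under an idealized assumption: each hash function is modeled as an independent uniform random bit position rather than a real hash, and the sizes are scaled down (same m/n ratio of 2) so it runs quickly. The function name simulate_fp_rate and its parameters are hypothetical.

```python
import math
import random

def simulate_fp_rate(k: int, m: int, n: int, probes: int, seed: int = 0) -> float:
    """Estimate the false positive rate of an idealized Bloom filter.

    Hash functions are modeled as uniform random positions (an assumption),
    so this tests the formula's model, not any particular hash family.
    """
    rng = random.Random(seed)
    bits = bytearray(m)
    # Insert n items: each sets k randomly chosen bits.
    for _ in range(n):
        for _ in range(k):
            bits[rng.randrange(m)] = 1
    # Probe with items that were never inserted: a false positive
    # occurs when all k probed bits happen to be set.
    hits = 0
    for _ in range(probes):
        if all(bits[rng.randrange(m)] for _ in range(k)):
            hits += 1
    return hits / probes

k, m, n = 2, 20_000, 10_000
predicted = (1 - math.exp(-k * n / m)) ** k
observed = simulate_fp_rate(k, m, n, probes=50_000)
print(f"predicted = {predicted:.3f}, observed = {observed:.3f}")
```

Under that idealized model the observed rate should land close to the predicted one; the formula is itself an approximation, so small differences are expected.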
Note: this question seems to have been asked already in "With fixed number of functions, how can I calculate the size of a Bloom Filter given the probability of false positives?", but no answer there matches my exact question. "How many hash functions does my bloom filter need?" might be of interest, but again it's not exactly the same.
Regards
Answer 1:
m – number of elements in bit array
n – number of items in collection
p – false positive probability // 0.0 – 1.0
^ – power
p = e^(-(m/n) * (ln(2)^2));
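As the question itself supposes, this closed form corresponds to choosing the optimal k = (m / n) * ln(2). The following Python sketch compares it with the general formula from the question for a fixed k; the function names are illustrative, not from any library.

```python
import math

def fp_rate(k: float, m: int, n: int) -> float:
    """General approximation for fixed k, m, n: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

def fp_rate_optimal(m: int, n: int) -> float:
    """Closed form that assumes the optimal k: e^(-(m/n) * ln(2)^2)."""
    return math.exp(-(m / n) * math.log(2) ** 2)

n = 1_000_000
m = 2 * n
k_opt = (m / n) * math.log(2)  # real-valued, about 1.386 here

# At k = k_opt the two formulas agree, since 1 - e^(-ln 2) = 1/2.
print(f"general at k_opt = {fp_rate(k_opt, m, n):.6f}")
print(f"closed form      = {fp_rate_optimal(m, n):.6f}")

# For any integer k actually used, the general formula gives a rate
# at least as high as the closed form.
for k in range(1, 5):
    print(f"k = {k}: general = {fp_rate(k, m, n):.6f}")
```

Since a real filter needs an integer k, the general formula is the one to use when k is fixed in advance.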
I wrote a math-friendly tutorial on Bloom filters: http://techeffigy.wordpress.com/2014/06/05/bloom-filter-tutorial/
Source: https://stackoverflow.com/questions/15952524/bloom-filter-evaluating-false-positive-rate