redis HLL too many false positives

不羁岁月 提交于 2020-01-14 04:57:16

问题


Hyperlog log is a probablistic algorithm According to the redis HLL document , we could get 0.81% of error but I get errors like 17-20%

I think there is something wrong .. This is my simple perl test script. Is there some error

#!/usr/bin/perl -w                                                                                                                                                       
use Redis;
my $redis = Redis->new(server=>'192.168.50.166:6379') or die;
my $fp=0;
my $HLL="HLL";

$redis->del($HLL);
foreach my $i (1..10000) {
  my $s1 = $redis->pfadd($HLL,$i);
  if($s1 == 0){ 
    print "False positive on $i\n";
    $fp++;
  }
}
print "count of false positives $fp\n";

回答1:


HyperLogLog is used for counting unique items. It can count a large number of items with a little memory. However, the returned cardinality is NOT exact, but approximated with a standard error.

0.81% is the standard error, NOT the false positive. For your instance, you can call PFCOUNT HLL to get the approximated number of unique items you put into the HyperLogLog. The returned number should be in range of [10000 * (1 - 0.81%), 10000 * (1 + 0.81%)].

PFADD returns 1 if the estimated cardinality is changed after executing the command. It returns 0, otherwise. It has nothing to do with false positive.

It seems what you need is a Bloom Filter, which can tell you if an item already exists in a data set, with false positive. You can implement a Bloom Filter with Redis, of course. And there should be some open source project for that.



来源:https://stackoverflow.com/questions/42924393/redis-hll-too-many-false-positives

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!