What is Lossy Counting?

暖寄归人 2021-02-01 07:26

Can anyone explain to me the Lossy Counting algorithm? It is a streaming algorithm on finding frequency of items in a stream. Thanks.

2 answers

无人及你 (OP)

2021-02-01 07:56
You can find an explanation of how Lossy Counting (and Sticky Sampling) work in this blog post and an open-source implementation here.

The idea is that only the most frequent items “survive” in the summary. Given a frequency threshold f, an error bound e, and a total of N elements seen so far, the output is every element whose estimated count exceeds fN – eN, that is, (f – e)N.

In the worst case, we need (1/e) * log(eN) counters.
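To get a feel for that bound, here is a small sketch that evaluates it. The function name `worst_case_counters` is made up for illustration, and the answer does not say which logarithm base is meant, so the natural logarithm is assumed here.

```python
import math


def worst_case_counters(e: float, n: int) -> float:
    """Evaluate the worst-case bound (1/e) * log(e*N).

    Assumption: natural logarithm; the bound is only meaningful for e*N >= 1.
    """
    if not 0 < e < 1:
        raise ValueError("e must be in (0, 1)")
    if e * n < 1:
        raise ValueError("bound is only meaningful once e*N >= 1")
    return (1.0 / e) * math.log(e * n)
```

Under that assumption, e = 2% and N = 1,000,000 give a bound of roughly 495 counters, which is tiny compared with the number of elements in the stream.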

For example, we may want to report the Facebook pages that receive more than 20% of all hits, with an error bound of 2% (rule of thumb: set the error to 10% of the frequency threshold).

With f = 20% and e = 2%, every element whose true frequency exceeds 20% is output, so there are no false negatives. Counts are underestimated, though: the reported frequency of an element can fall below its true frequency by at most 2%. False positives can appear, but only for elements with a true frequency between 18% and 20%. Finally, no element with a true frequency below 18% is output.
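The same percentages can be turned into absolute counts for a concrete stream length. The sketch below assumes a stream of N = 1,000 hits purely for illustration; the function name and dictionary keys are made up as well.

```python
def lossy_counting_bounds(f: float, e: float, n: int) -> dict:
    """Translate threshold f and error e into count thresholds for n elements."""
    return {
        "guaranteed_above": f * n,     # true count above this: always output
        "output_cutoff": (f - e) * n,  # true count below this: never output
        "max_undercount": e * n,       # reported count trails true count by at most this
    }


# Assumed example: 1,000 hits, f = 20%, e = 2%.
bounds = lossy_counting_bounds(f=0.20, e=0.02, n=1000)
```

For these inputs the three values come out at about 200, 180, and 20 hits: a page with more than 200 hits is always reported, a page with fewer than 180 never is, and any reported count is at most 20 below the true one.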

With the stream processed in windows of size 1/e, the following guarantees hold:
- Frequencies are underestimated by at most e*N
- No false negatives
- False positives have true frequency of at least fN – eN
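Since the answer lists the guarantees but not the steps, here is a minimal sketch of Lossy Counting as it is usually described: count items, remember for each entry the largest possible undercount, and prune rarely seen entries at every window boundary. The class and method names are illustrative and not taken from the linked implementation.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Minimal Lossy Counting sketch with error bound e (illustrative names)."""

    def __init__(self, e: float) -> None:
        if not 0 < e < 1:
            raise ValueError("e must be in (0, 1)")
        self.e = e
        self.width = math.ceil(1 / e)  # window size, 1/e rounded up
        self.n = 0                     # N: elements seen so far
        # item -> [count, delta]; delta is the largest possible undercount
        self.entries: Dict[Hashable, List[int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        window = math.ceil(self.n / self.width)  # id of the current window
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been seen and pruned in earlier windows,
            # at most once per completed window.
            self.entries[item] = [1, window - 1]
        if self.n % self.width == 0:
            # Window boundary: drop entries that cannot be frequent.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > window
            }

    def frequent(self, f: float) -> List[Tuple[Hashable, int]]:
        """Return (item, estimated count) for counts exceeding fN - eN."""
        cutoff = (f - self.e) * self.n
        return [(k, c) for k, (c, _delta) in self.entries.items() if c > cutoff]
```

To use it, call `add` once per stream element and `frequent(0.2)` whenever a report is needed. Because entries are only ever dropped when their count plus delta is at most the window number, an estimated count never exceeds the true count and trails it by at most e*N, which is where the guarantees above come from.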