What is Lossy Counting?

暖寄归人 2021-02-01 07:26

Can anyone explain the Lossy Counting algorithm to me? It is a streaming algorithm for finding the frequency of items in a stream. Thanks.

2 Answers
  •  无人及你
    2021-02-01 07:56

    You can find an explanation of how Lossy Counting (and Sticky Sampling) work in this blog post, and an open-source implementation here.

    The most frequent items “survive”. Given a frequency threshold f, an error bound e, and a total of N elements seen so far, the algorithm outputs every element whose estimated count exceeds fN - eN, that is, (f - e)N.

    In the worst case, the algorithm needs at most (1/e) * log(eN) counters.
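
    To get a feel for how small that bound is, here is a quick calculation. It is only a sketch: the function name `max_counters` is made up for illustration, and it assumes the logarithm in the bound is the natural log.

```python
import math


def max_counters(error: float, n: int) -> float:
    """Worst-case number of counters, (1/e) * log(e*N), assuming natural log."""
    return (1 / error) * math.log(error * n)


# Hypothetical stream: one million elements tracked with a 2% error bound.
bound = max_counters(error=0.02, n=1_000_000)
```

    With these example numbers the bound comes out just under 500 counters, which is tiny compared with the one million elements in the stream.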

    For example, we may want to print the Facebook pages that receive more than 20% of all hits, with an error bound of 2% (rule of thumb: set the error to 10% of the frequency threshold).

    With f = 20% and e = 2%, every element whose true frequency exceeds 20% is output, so there are no false negatives. Counts are only ever underestimated: the reported frequency of an element can fall below its true frequency by at most 2%. False positives can appear, but only with a true frequency between 18% and 20%. Finally, no element with a true frequency below 18% is output.
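
    The three bands in that example can be written out as a small helper. This is just an illustration of the arithmetic, not part of the algorithm itself; `classify` and its labels are names I made up.

```python
def classify(true_count: int, n: int, threshold: float, error: float) -> str:
    """Say what Lossy Counting may do with an item, given its TRUE count."""
    if true_count > threshold * n:
        return "always output"            # no false negatives
    if true_count <= (threshold - error) * n:
        return "never output"             # too rare even after the error margin
    return "possible false positive"      # inside the (f - e)N .. fN band


# Hypothetical numbers: 1000 hits in total, f = 20%, e = 2%.
labels = [classify(c, 1000, 0.20, 0.02) for c in (250, 190, 100)]
```

    With 1000 hits, the band boundaries are 200 hits (fN) and 180 hits (fN - eN), so a page with 190 hits may or may not show up in the output.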

    Given a window of width 1/e (rounded up to a whole number of elements), the following guarantees hold:

    • Counts are underestimated by at most e*N
    • No false negatives
    • False positives have a true count of at least fN - eN
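
    To make the window mechanics concrete, here is a minimal sketch of Lossy Counting in Python. It follows the usual description of the algorithm (one counter plus a maximum-undercount value `delta` per tracked item, with pruning at every window boundary). The class name, method names, and the demo stream are all my own assumptions for illustration and do not come from the blog post or the open-source version mentioned above.

```python
from math import ceil
from typing import Dict, Hashable, Iterator, List


class LossyCounter:
    """Illustrative Lossy Counting sketch; not a library API."""

    def __init__(self, error: float) -> None:
        if not 0 < error < 1:
            raise ValueError("error must be in (0, 1)")
        self.error = error
        self.width = ceil(1 / error)  # window width
        self.n = 0                    # elements seen so far
        # item -> [count, delta]; delta bounds occurrences missed before insertion
        self.entries: Dict[Hashable, List[int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        window = (self.n + self.width - 1) // self.width  # current window id
        entry = self.entries.get(item)
        if entry is not None:
            entry[0] += 1
        else:
            self.entries[item] = [1, window - 1]
        if self.n % self.width == 0:
            # Window boundary: drop items whose count + delta is too small.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > window
            }

    def frequent(self, threshold: float) -> Dict[Hashable, int]:
        """Items whose estimated count exceeds (threshold - error) * n."""
        cutoff = (threshold - self.error) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] > cutoff}


def demo_stream(n: int = 1000) -> Iterator[str]:
    # Hypothetical data: "a" gets 50% of hits, "b" 30%, the rest are one-off pages.
    for i in range(n):
        r = i % 10
        yield "a" if r < 5 else "b" if r < 8 else f"u{i}"


counter = LossyCounter(error=0.02)
for page in demo_stream():
    counter.add(page)
result = counter.frequent(threshold=0.20)
```

    In this demo only "a" and "b" clear the (f - e)N cutoff. The one-off pages are inserted with a count of 1 and pruned at the next window boundary, which is why the table of counters stays small.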
