Please identify this algorithm: probabilistic top-k elements in a data stream

你的背包 2021-02-02 01:47

I remember hearing about the following algorithm some years back, but can't find any reference to it online. It identifies the top k elements (or heavy hitters) in a dat

3 Answers
  •  被撕碎了的回忆
    2021-02-02 02:22

    You are describing the well-known Misra-Gries algorithm; the Space-Saving algorithm is a faster variant of it. For details, see sec. 1.2 of the Dartmouth Streaming Algorithms lecture notes.

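    One thing I want to point out: with only k counters, this algorithm does not give you the top-k elements. Instead, it guarantees that every element with frequency > m / k ends up among the counters, where m is the total length of the data stream. The counters may also hold some less frequent elements, and the stored counts are underestimates, so a second pass over the stream is needed if you want to confirm which candidates really exceed m / k.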
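
To make the guarantee concrete, here is a minimal Python sketch of Misra-Gries. It is an illustration written for this answer, not code from the lecture notes. It assumes the common formulation that keeps at most k - 1 counters for the threshold m / k, and the names `misra_gries` and `heavy_hitters` are made up for the example. The second function assumes the stream can be replayed, which is why it materializes it as a list.

```python
from collections import Counter


def misra_gries(stream, k):
    """One pass with at most k - 1 counters; returns candidate heavy hitters.

    Every item with true frequency > m / k (m = stream length) is guaranteed
    to be a key of the result, but some keys may be less frequent items.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def heavy_hitters(stream, k):
    """Second pass: keep only candidates whose exact count exceeds m / k."""
    items = list(stream)  # assumption: the stream can be replayed
    candidates = misra_gries(items, k)
    exact = Counter(items)
    m = len(items)
    return {x: exact[x] for x in candidates if exact[x] > m / k}


if __name__ == "__main__":
    data = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
    print(misra_gries(data, 3))    # candidates, with underestimated counts
    print(heavy_hitters(data, 3))  # only items with frequency > 10 / 3
```

In this small example "b" survives the first pass as a candidate even though its frequency (3) is below m / k, which is exactly why the output is not a true top-k list.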

    A detailed analysis can be found in the lecture notes mentioned above.
