Most common element in an array / Finding the relative majority, deterministically in O(n) time and O(1) space?

暗喜 2021-01-17 17:18

So for example, the answer for the array:

1, 11, 3, 95, 23, 8, 1

would be 1, since all the other elements only occur once while 1 occurs twice.

A lo

4 answers
  •  鱼传尺愫
    2021-01-17 17:35

    This is not a complete answer, but it should help shed some light on why this problem is difficult.

    Suppose we want to design an algorithm that does one sweep over the array (in some order) to find the most common element. During its run, the algorithm may keep some data structure S. Let's see how much information S has to hold, and thus whether it can fit in O(1) memory.

    Say our algorithm has processed the first k elements of the array. Now S can tell us the most common element in the range a[0..k]. Moreover, if we also knew the (k+1)-st element, S together with that element would have to tell us the most common element in the range a[0..k+1]; if it couldn't, the algorithm would fail whenever n = k+1. More generally, given S and the elements a[k..m], we know the most common element in a[0..m].
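    A concrete (and deliberately naive) instance of this model, sketched in Python: the state S is a dictionary of counts plus the current leader. The class name `PluralityTracker` and its `feed` method are hypothetical, chosen for illustration. Because `feed` can be called repeatedly, S together with the next chunk a[k..m] yields the answer for a[0..m]. Note that this S grows with the number of distinct values seen, which is exactly the cost the argument in this answer says cannot be avoided by a single sweep.

```python
from typing import Dict, Iterable, Optional


class PluralityTracker:
    """One-pass tracker whose state S is a full table of counts plus the current leader.

    This is the O(n)-extra-space baseline: the table grows with the number of
    distinct values. Ties keep the earlier leader (a value must strictly overtake it).
    """

    def __init__(self) -> None:
        self.counts: Dict[int, int] = {}
        self.leader: Optional[int] = None

    def feed(self, values: Iterable[int]) -> Optional[int]:
        """Process more elements and return the most common element seen so far."""
        for v in values:
            self.counts[v] = self.counts.get(v, 0) + 1
            if self.leader is None or self.counts[v] > self.counts[self.leader]:
                self.leader = v
        return self.leader


# The question's example, fed in two chunks to mimic "S plus a[k..m]".
tracker = PluralityTracker()
tracker.feed([1, 11, 3])
print(tracker.feed([95, 23, 8, 1]))
```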

    We can use the above argument to extract information from S. Say we are working with integers in the range [0,u] (there has to be some range if the original array takes O(n) space, i.e. each element fits in a constant number of words). If the current most common element is 5, we append 0's until the most common element changes. If that took c zeroes, a[0..k] must have contained c more 5's than 0's (or c-1 more, depending on how ties are broken). Repeating this for every value in [0,u] gives a system of linear equations which, together with the known total k, we can solve to tell exactly how many times each element of [0,u] was present in a[0..k].
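    The extraction argument can be simulated in a few lines of Python. This is a sketch under stated assumptions: the hypothetical `make_oracle` stands in for "state S plus whatever input follows" (it is built from the prefix only so that the simulation can answer queries at all), ties are assumed to go to the smallest value, and the hypothetical `recover_counts` uses nothing but oracle queries and the prefix length k to rebuild every count.

```python
from collections import Counter
from typing import Callable, List

Oracle = Callable[[int, int], int]


def most_common(values: List[int]) -> int:
    """Most frequent value; ties go to the smallest value (an assumed, deterministic rule)."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


def make_oracle(prefix: List[int]) -> Oracle:
    """Stand-in for 'S plus the rest of the input': answers which value is most
    common after appending c copies of x to the already-processed prefix."""
    assert prefix, "the prefix must be non-empty"

    def query(x: int, c: int) -> int:
        return most_common(prefix + [x] * c)

    return query


def recover_counts(query: Oracle, k: int, u: int) -> List[int]:
    """Recover how often each value in [0, u] occurred, using only oracle queries
    and the known prefix length k (all prefix values are assumed to lie in [0, u])."""
    m = query(0, 0)  # appending nothing: just the current most common element
    diff = [0] * (u + 1)  # diff[x] = count(m) - count(x); diff[m] stays 0
    for x in range(u + 1):
        if x == m:
            continue
        c = 1
        while query(x, c) != x:  # append x's until x takes over
            c += 1
        # With smallest-value tie-breaking, x wins a tie only if x < m.
        diff[x] = c if x < m else c - 1
    # k = sum over x of count(x) = (u + 1) * count(m) - sum(diff)
    count_m = (k + sum(diff)) // (u + 1)
    return [count_m - d for d in diff]


prefix = [2, 0, 2, 3, 1, 2, 3]
print(recover_counts(make_oracle(prefix), len(prefix), 4))
```

The counts come back exactly, which is the point of the argument: whatever S looks like internally, it must carry this much information.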

    This tells us that the data structure of any single-sweep algorithm must, in some compressed form, encode the counts of all the elements seen so far. If you're interested in the maths: the number of bits S needs after seeing n numbers is log(n+u-1 choose n), the log of the number of ways to partition n indistinguishable items into u distinguishable bins. That is at least log(u^n/n!) >= n log u - n log n.
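    The inequality chain can be spelled out in one line, using the same u-bin model: the numerator of the binomial coefficient is a product of n factors, each at least u, and n! <= n^n.

```latex
\log_2 \binom{n+u-1}{n}
  = \log_2 \frac{u\,(u+1)\cdots(u+n-1)}{n!}
  \ge \log_2 \frac{u^n}{n!}
  \ge n\log_2 u - n\log_2 n
```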

    Conclusion: any algorithm that makes only one pass over the array must use as much memory as it takes to store all the counts seen so far. If n is small compared to u, this amounts to storing on the order of n words of memory.
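    To get a feel for the numbers, a short Python sketch can evaluate the bound for concrete sizes. The sizes n = 1000 and u = 2^32 are purely illustrative assumptions, and dividing by log2(u) converts bits into words of that width. For these sizes the bound comes out at roughly 730 words, i.e. a constant fraction of n.

```python
import math


def multiset_bits(n: int, u: int) -> float:
    """log2 of C(n+u-1, n): bits needed to tell apart every multiset of n items in u bins.

    Uses C(n+u-1, n) = prod_{i=1..n} (u-1+i)/i to avoid huge intermediate numbers.
    """
    return sum(math.log2((u - 1 + i) / i) for i in range(1, n + 1))


def words_needed(n: int, u: int) -> float:
    """The same lower bound expressed in machine words of log2(u) bits each."""
    return multiset_bits(n, u) / math.log2(u)


n, u = 1000, 2**32  # hypothetical sizes: 1000 elements drawn from 32-bit integers
print(f"{multiset_bits(n, u):.0f} bits ~ {words_needed(n, u):.0f} 32-bit words")
```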

    (Well, instead of using extra memory, we could also overwrite the existing array.)
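    One way to trade time for space along these lines, sketched in Python for readability: heapsort the array in place, which needs only O(1) extra memory, then scan for the longest run of equal values. This is a swapped-in technique, not something the answer proposes, and it gives up the O(n) time bound: it costs O(n log n) time and destroys the original order. The function names are hypothetical.

```python
from typing import List


def _sift_down(a: List[int], start: int, end: int) -> None:
    """Restore the max-heap property for a[start:end], iteratively (O(1) extra space)."""
    root = start
    while 2 * root + 1 < end:
        child = 2 * root + 1
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1  # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heapsort_in_place(a: List[int]) -> None:
    """Sort a ascending by overwriting it; no auxiliary array is allocated."""
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # build a max-heap
        _sift_down(a, start, n)
    for end in range(n - 1, 0, -1):  # move the current max to the back
        a[0], a[end] = a[end], a[0]
        _sift_down(a, 0, end)


def most_common_in_place(a: List[int]) -> int:
    """Relative-majority element of a non-empty array; ties go to the smallest value."""
    heapsort_in_place(a)
    best, best_len, run_len = a[0], 1, 1
    for i in range(1, len(a)):
        run_len = run_len + 1 if a[i] == a[i - 1] else 1
        if run_len > best_len:  # strict: an earlier (smaller) value keeps ties
            best, best_len = a[i], run_len
    return best


print(most_common_in_place([1, 11, 3, 95, 23, 8, 1]))
```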

    There's a lot more to explore here, e.g. how multiple passes affect the above argument. I think I should stop at this point :), but it seems unlikely to me that any linear-time algorithm, with a large u, can get away with O(1) extra memory.
