How to tell if an array is a permutation in O(n)?

粉色の甜心

Input: A read-only array of N elements containing integer values from 1 to N (some integer values can appear more than once!). And a memory zone of a fixed<

    This is impossible to do in O(1) space, at least with a single-scan algorithm.


    Suppose you have processed N/2 of the N elements. Assuming the sequence is a permutation then, given the state of the algorithm, you should be able to figure out the set of N/2 remaining elements. If you can't figure out the remaining elements, then the algorithm can be fooled by repeating some of the old elements.

    There are N choose N/2 possible remaining sets. Each of them must be represented by a distinct internal state of the algorithm, because otherwise you couldn't figure out the remaining elements. However, it takes log(X) bits to distinguish X states, so it takes BigTheta(log(N choose N/2)) space to store N choose N/2 states. That value grows with N, and therefore the algorithm's internal state cannot fit in O(1) space.
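    To make the counting argument concrete, here is a minimal sketch that computes how many bits are needed to tell apart every possible set of N/2 remaining elements. The function name `state_bits` is made up for illustration and is not part of the original answer.

```python
import math

def state_bits(n: int) -> int:
    """Minimum number of bits needed to distinguish all C(n, n/2)
    possible sets of remaining elements after half the array is read."""
    return math.ceil(math.log2(math.comb(n, n // 2)))

if __name__ == "__main__":
    # The bit count keeps growing as n grows, so no fixed-size state suffices.
    for n in (8, 64, 1024):
        print(n, state_bits(n))
```

    For N = 8 there are 70 possible remaining sets, which already needs 7 bits, and the requirement keeps climbing roughly in step with N.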

    More Formal Proof

    You want to create a program P which, given the final N/2 elements and the internal state of the linear-time-constant-space algorithm after it has processed N/2 elements, determines if the entire sequence is a permutation of 1..N. There is no time or space bound on this secondary program.

    Assuming P exists we can create a program Q, taking only the internal state of the linear-time-constant-space algorithm, which determines the necessary final N/2 elements of the sequence (if it was a permutation). Q works by passing P every possible final N/2 elements and returning the set for which P returns true.

    However, because Q has N choose N/2 possible outputs, it must have at least N choose N/2 possible inputs. That means the internal state of the original algorithm must be able to take at least N choose N/2 distinct values, requiring BigTheta(log(N choose N/2)) space, which is greater than constant size.

    Therefore the original algorithm, which does have time and space bounds, also can't work correctly if it has constant-size internal state.

    [I think this idea can be generalized, but thinking isn't proving.]


    BigTheta(log(N choose N/2)) is equal to BigTheta(N). Therefore just using a boolean array and ticking values as you encounter them is (probably) space-optimal, and time-optimal too since it takes linear time.
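    The "boolean array and ticking values" approach mentioned above could look roughly like this. It is a sketch that uses O(N) extra space (so it does not meet the fixed-memory constraint in the question), and the name `is_permutation` is chosen here for illustration.

```python
def is_permutation(values):
    """Return True if `values` contains each of 1..N exactly once,
    where N = len(values). Uses O(N) time and O(N) extra space."""
    n = len(values)
    seen = [False] * n          # seen[i] is True once value i + 1 has appeared
    for v in values:
        if v < 1 or v > n:      # out-of-range value: cannot be a permutation
            return False
        if seen[v - 1]:         # value already ticked: duplicate found
            return False
        seen[v - 1] = True
    return True                 # N in-range values with no duplicates

if __name__ == "__main__":
    print(is_permutation([3, 1, 2]))
    print(is_permutation([1, 1, 3]))
```

    The input array is only read, never modified, which matches the read-only requirement.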

    It's a permutation if and only if there are no duplicate values in the array (given that every value is in the range 1..N). It should be easy to check that in O(N).

    First, an information theoretic reason why this may be possible. We can trivially check that the numbers in the array are in bounds in O(N) time and O(1) space. To specify any such array of in-bounds numbers requires N log N bits of information. But to specify a permutation requires approximately (N log N) - N bits of information (Stirling's approximation). Thus, if we could acquire N bits of information during testing, we might be able to know the answer. This is trivial to do in N time (in fact, with M static space we can pretty easily acquire log M information per step, and under special circumstances we can acquire log N information).

    On the other hand, we only get to store something like M log N bits of information in our static storage space, which is presumably much less than N, so it depends greatly on what the shape of the decision surface is between "permutation" and "not".

    I think that this is almost possible but not quite given the problem setup. I think one is "supposed" to use the cycling trick (as in the link that Iulian mentioned), but the key assumption of having a tail in hand fails here because you can index the last element of the array with a permutation.

    This isn't going to work, because the complexity is given as a function of N rather than M, which implies that N >> M.

    This was my shot at it, but for a bloom filter to be useful, you need a big M, at which point you may as well use simple bit toggling for something like integers

    For each element in the array:
        Run the k hash functions
        Check for inclusion in the bloom filter
        If it is there, there is a probability you've seen the element before
        If it isn't, add it

    When you are done, you may as well compare it to the results of a 1..N array in order, as that'll only cost you another N.
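    The loop described above can be sketched as follows. This is only an illustration under stated assumptions: the `BloomFilter` class, the use of salted SHA-256 digests as the k hash functions, and the name `maybe_has_duplicate` are all choices made here, not part of the original answer.

```python
import hashlib

class BloomFilter:
    """A small Bloom filter over m bits with k hash functions.
    The k functions are derived from SHA-256 with a per-function salt
    (an assumption made for this sketch)."""

    def __init__(self, m_bits, k):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def maybe_has_duplicate(values, m_bits, k):
    """True means "possibly a duplicate" (may be a false positive);
    False means no element was repeated."""
    bf = BloomFilter(m_bits, k)
    for v in values:
        if v in bf:
            return True
        bf.add(v)
    return False
```

    A genuine duplicate is always flagged, because every bit for that value was set when it was first added. The error goes the other way: a true permutation can be wrongly reported as containing a duplicate, and that becomes likely once M is small relative to N.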

    Now, in case I haven't put in enough caveats: it isn't 100% reliable, or even close, since you specified the complexity in N, which implies that N >> M. So fundamentally it won't work as you have specified it.

    BTW, with the optimal number of hash functions k = (m/n) * ln(2), the false positive rate for an individual item should be about e = 2^(-(m/n) * ln(2))

    Monkeying around with that will give you an idea of how big M would need to be for the error rate to be acceptable.
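    As a rough way to do that monkeying around, the sketch below inverts the standard Bloom filter estimate p = 2^(-(m/n) * ln 2), which assumes an optimally chosen number of hash functions. The function names `bloom_bits_needed` and `optimal_hash_count` are made up for this example.

```python
import math

def bloom_bits_needed(n, p):
    """Approximate number of bits m needed to hold n items with a
    per-item false positive rate p, assuming the optimal hash count."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

def optimal_hash_count(m, n):
    """Approximate optimal number of hash functions, k = (m / n) * ln 2."""
    return max(1, round(m / n * math.log(2)))

if __name__ == "__main__":
    n = 1000
    m = bloom_bits_needed(n, 0.01)
    print(m, optimal_hash_count(m, n))
```

    Even a 1% per-item false positive rate needs more than 9 bits per element, whereas a plain bit array needs only 1 bit per element and gives an exact answer, which is the point made above about simple bit toggling.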

