I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview
There is an O(n^2) algorithm that does not require modifying the input array and uses constant space.
First, assume that you know n and m. Finding them is a linear operation, so it does not add any additional complexity. Next, assume there exists one element equal to n and one element equal to n+m-1, and all the rest are in [n, n+m). Given that, subtracting n from each element as it is read reduces the problem to an array with elements in [0, m).
Now, since we know that the elements are bounded by the size of the array, we can treat each element as a node with a single link to another element; in other words, the array describes a directed graph. In this directed graph, if there are no duplicate elements, every node belongs to a cycle, that is, a node is reachable from itself in m or fewer steps. If there is a duplicate element, then there exists at least one node that is not reachable from itself at all.
So, to detect this, you walk the entire array from start to finish and determine if each element returns to itself in <=m steps. If any element does not return to itself in <=m steps, then you have a duplicate and can return false. Otherwise, when you finish visiting all elements, you can return true:
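    // arr holds m values already reduced to the range [0, m)
    for (int start_index = 0; start_index < m; ++start_index)
    {
        int steps = 1;
        int current_element_index = arr[start_index];
        // Follow links until we return to the start or exceed m steps
        while (steps < m + 1 && current_element_index != start_index)
        {
            current_element_index = arr[current_element_index];
            ++steps;
        }
        if (steps > m)
        {
            return false;  // never came back to start_index: a duplicate exists
        }
    }
    return true;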
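For anyone who wants to run the idea end to end, here is a minimal Python sketch that combines the reduction to [0, m) with the cycle walk above. The function name is made up for illustration, n is taken to be the minimum of the array, and treating an empty array as valid is an assumption, not something the answer specifies.

```python
def is_consecutive_run(arr):
    """Return True if arr is a permutation of n..n+m-1, where n = min(arr), m = len(arr)."""
    m = len(arr)
    if m == 0:
        return True  # assumption: an empty array counts as valid
    n = min(arr)
    # With min == n and max == n+m-1, every element lies in [n, n+m),
    # so arr[i] - n is always a valid index.
    if max(arr) != n + m - 1:
        return False
    for start in range(m):
        steps = 1
        current = arr[start] - n  # reduce to [0, m) on the fly, input is not modified
        while steps <= m and current != start:
            current = arr[current] - n
            steps += 1
        if steps > m:
            return False  # start is not on a cycle, so some value is duplicated
    return True


print(is_consecutive_run([7, 5, 6, 4]))
print(is_consecutive_run([1, 3, 1]))
```

The range check up front matters: without it, an out-of-range element would index outside the array during the walk.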
You can optimize this by storing additional information, sum_of_steps, and stepping only m-sum_of_steps nodes out from each element. If you don't return to the starting element and you don't visit an element before the starting element, you have found a loop containing duplicate elements and can return false.
This is still O(n^2), e.g. {1, 2, 3, 0, 5, 6, 7, 4}, but it's a little bit faster.
If you want to know the sum of the numbers [n ... n + m - 1], just use this equation:
    var sum = m * (m + 2 * n - 1) / 2;
That works for any number, positive or negative, even if n is a decimal.
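As a quick sanity check of that formula, here is a small Python sketch that compares it against a direct summation for a positive, a negative, and a fractional n. The function name and the sample values are my own choices for illustration.

```python
def run_sum(n, m):
    """Closed-form sum of the m numbers n, n+1, ..., n+m-1."""
    return m * (m + 2 * n - 1) / 2


for n, m in [(3, 4), (-2, 5), (0.5, 3)]:
    direct = sum(n + k for k in range(m))  # add the terms one by one
    print(n, m, run_sum(n, m), direct)
```

The formula is just m copies of n plus 0 + 1 + ... + (m-1), that is, m*n + m*(m-1)/2.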
    def test(a, n, m):
        seen = [False] * m          # seen[k] is True once n+k has appeared
        for x in a:
            if x < n or x >= n+m:   # out of range
                return False
            if seen[x-n]:           # duplicate
                return False
            seen[x-n] = True
        return False not in seen

    print(test([2, 3, 1], 1, 3))    # True
    print(test([1, 3, 1], 1, 3))    # False
    print(test([1, 2, 4], 1, 3))    # False
Note that this only makes one pass through the first array, not considering the linear search involved in not in. :)
I also could have used a Python set, but I opted for the straightforward solution where the performance characteristics of set need not be considered.
Update: Smashery pointed out that I had misparsed "constant amount of memory" and this solution doesn't actually solve the problem.
I propose the following:
Choose a finite set of prime numbers P_1,P_2,...,P_K, and compute the occurrences of the elements in the input sequence (minus the minimum) modulo each P_i. The pattern of a valid sequence is known.
For example, for a sequence of 17 elements (after subtracting the minimum, so the values are 0..16), modulo 2 we must have the profile: [9 8], modulo 3: [6 6 5], modulo 5: [4 4 3 3 3], etc.
Combining the test using several bases we obtain a more and more precise probabilistic test. Since the entries are bounded by the integer size, there exists a finite base providing an exact test. This is similar to probabilistic pseudo primality tests.
    S_i is an int array of size P_i, initially filled with 0, i=1..K
    M is the length of the input sequence
    Mn = INT_MAX
    Mx = INT_MIN
    for x in the input sequence:
        for i in 1..K: S_i[x % P_i]++   // count occurrences mod P_i
        Mn = min(Mn,x)                  // update min
        Mx = max(Mx,x)                  // and max
    if Mx-Mn != M-1: return False       // Check bounds
    for i in 1..K:
        // Check profile mod P_i
        Q = M / P_i
        R = M % P_i
        Check S_i[(Mn+j) % P_i] is Q+1 for j=0..R-1 and Q for j=R..P_i-1
        if this test fails, return False
    return True
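The pseudocode above can be sketched in runnable Python roughly as follows. The function name and the default list of primes are assumptions made for illustration; the answer does not say which primes to pick, and with a small fixed set the test stays probabilistic (it can accept some invalid inputs).

```python
def profile_test(seq, primes=(2, 3, 5, 7, 11, 13)):
    """Probabilistic check that seq is a permutation of some run lo..lo+M-1."""
    m = len(seq)
    if m == 0:
        return True  # assumption: an empty sequence counts as valid
    counts = {p: [0] * p for p in primes}  # S_i in the pseudocode
    lo = hi = seq[0]
    for x in seq:
        for p in primes:
            counts[p][x % p] += 1  # Python's % is non-negative for p > 0
        lo = min(lo, x)
        hi = max(hi, x)
    if hi - lo != m - 1:  # check bounds
        return False
    for p in primes:
        q, r = divmod(m, p)
        for j in range(p):
            # Residues of lo, lo+1, ..., lo+r-1 occur q+1 times, the rest q times.
            expected = q + 1 if j < r else q
            if counts[p][(lo + j) % p] != expected:
                return False
    return True


print(profile_test([5, 3, 4, 7, 6]))
print(profile_test([1, 3, 1]))
```

The memory used is the sum of the chosen primes, independent of the input length, which is what makes this a constant-space candidate.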
Any one-pass algorithm requires Omega(n) bits of storage.
Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs
A A' - yes
B A' - no
since it cannot distinguish the first case from the second.
Q.E.D.
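The counting step in the proof can be spelled out a little more. This is only a sketch of the pigeonhole argument, writing s for the number of bits of memory the algorithm uses; the symbol s is introduced here for illustration.

```latex
% The central binomial coefficient is the largest of the n+1 coefficients
% that sum to 2^n, and it is at most 2^n:
\[
\frac{2^n}{n+1} \;\le\; \binom{n}{n/2} \;\le\; 2^n
\quad\Longrightarrow\quad
\log_2 \binom{n}{n/2} = \Theta(n).
\]
% An algorithm with s bits of memory has at most 2^s distinct states:
\[
s = o(n)
\quad\Longrightarrow\quad
2^{s} < \binom{n}{n/2} \ \text{for all sufficiently large } n,
\]
% so two distinct sets A and B of n/2 values must lead to the same state.
```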
Vote me down if I'm wrong, but I think we can determine if there are duplicates or not using variance. Because we know the mean beforehand (n + (m-1)/2), we can sum the numbers and the squared differences from the mean, then check that the sum matches mn + m(m-1)/2 and that the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.
EDIT: variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half is greater than the mean.
If there is a duplicate, a term in the above equation will differ from the correct sequence, even if another duplicate completely cancels out the change in mean. So the function returns true only if both the sum and the variance match the desired values, which we can compute beforehand.
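Here is a minimal Python sketch of the sum-and-variance check, to make it easy to experiment with. The function name is hypothetical. To stay in integer arithmetic it compares the sum of squared deviations scaled by 4 (using 2x minus twice the mean) rather than the variance itself, which is an implementation choice of this sketch and not part of the answer.

```python
def sum_and_variance_match(a, n, m):
    """Check that a has the same sum and the same squared-deviation total as n..n+m-1."""
    if len(a) != m:
        return False
    center = 2 * n + m - 1  # twice the expected mean n + (m-1)/2
    total = 0
    scaled_sq_dev = 0
    for x in a:
        total += x
        scaled_sq_dev += (2 * x - center) ** 2  # 4 * (x - mean)^2
    expected_total = m * n + m * (m - 1) // 2
    expected_sq_dev = sum((2 * k - (m - 1)) ** 2 for k in range(m))
    return total == expected_total and scaled_sq_dev == expected_sq_dev


print(sum_and_variance_match([2, 3, 1], 1, 3))
print(sum_and_variance_match([1, 3, 1], 1, 3))
print(sum_and_variance_match([6, 1, 1, 2, 3, 2], 0, 6))
```

Note that the two checks are necessary conditions only. The last call uses an array that is not 0..5 (it contains 6 and repeats 1 and 2) yet has sum 15 and sum of squares 55, the same as 0..5, so it passes both checks. Adding a range check on each element rejects that particular input, though I have not shown that it makes the test exact in general.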