Algorithm to determine if array contains n…n+m?

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview

30 answers

北恋

2020-11-28 02:11
So there is an algorithm that runs in O(n^2) time, uses constant space, and does not require modifying the input array.

First, assume that you know n and m. Finding them takes one linear pass, so it does not add any additional complexity. Next, assume there exists one element equal to n and one element equal to n+m-1, and that all the rest are in [n, n+m). Given that, subtracting n from each element as it is read reduces the problem to an array with elements in [0, m).

Now, since we know that the elements are bounded by the size of the array, we can treat each element as a node with a single link to another element; in other words, the array describes a directed graph. In this directed graph, if there are no duplicate elements, every node belongs to a cycle, that is, a node is reachable from itself in m or fewer steps. If there is a duplicate element, then there exists one node that is not reachable from itself at all.

So, to detect this, you walk the entire array from start to finish and determine if each element returns to itself in <=m steps. If any element is not reachable in <=m steps, then you have a duplicate and can return false. Otherwise, when you finish visiting all elements, you can return true:
```
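// Assumes arr holds m values already reduced to the range [0, m).
// The function name is illustrative only.
bool all_on_cycles(const int *arr, int m)
{
    for (int start_index = 0; start_index < m; ++start_index)
    {
        // Follow the links from start_index; without duplicates we are back within m steps.
        int steps = 1;
        int current_element_index = arr[start_index];
        while (steps < m + 1 && current_element_index != start_index)
        {
            current_element_index = arr[current_element_index];
            ++steps;
        }

        // More than m steps without returning means there is a duplicate.
        if (steps > m)
        {
            return false;
        }
    }

    return true;
}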
```
You can optimize this by storing additional information:
1. Record sum of the length of the cycle from each element, unless the cycle visits an element before that element, call it sum_of_steps.
2. For every element, only step m-sum_of_steps nodes out. If you don't return to the starting element and you don't visit an element before the starting element, you have found a loop containing duplicate elements and can return false.
This is still O(n^2), e.g. {1, 2, 3, 0, 5, 6, 7, 4}, but it's a little bit faster.
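
The unoptimized version of the steps above can be sketched end to end in Python. This is a minimal sketch, not the answer's own code: the function name `is_consecutive_run` is made up for illustration, m is taken to be the array length, and treating an empty array as valid is an assumption. The offset n is subtracted on the fly, so the input is never modified.

```python
def is_consecutive_run(arr):
    """Return True if arr is a permutation of n, n+1, ..., n+m-1, where m = len(arr)."""
    m = len(arr)
    if m == 0:
        return True  # assumption: an empty array counts as a valid run

    # One linear pass each for the smallest and largest element.
    n = min(arr)
    if max(arr) - n != m - 1:
        return False  # the values cannot all lie in [n, n+m)

    # Treat arr[i] - n as a link from node i; every node must sit on a cycle.
    for start in range(m):
        steps = 1
        current = arr[start] - n
        while steps <= m and current != start:
            current = arr[current] - n
            steps += 1
        if steps > m:
            return False  # never came back to start, so a value is duplicated
    return True


print(is_consecutive_run([2, 3, 1]))
print(is_consecutive_run([1, 3, 1]))
```

The min/max check plays the role of the "one element equal to n and one equal to n+m-1" assumption, so the walk never indexes outside the array.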
感动是毒

2020-11-28 02:12
If you want to know the sum of the numbers [n ... n + m - 1] just use this equation.
```
var sum = m * (m + 2 * n - 1) / 2;
```
That works for any number, positive or negative, even if n is a decimal.
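
As a quick check of the formula, here is a small Python sketch (the helper name `run_sum` is hypothetical) that compares it with a direct sum. Note that a matching sum is only a necessary condition: [2, 2, 2] has the same sum as [1, 2, 3].

```python
def run_sum(n, m):
    """Sum of the m numbers n, n+1, ..., n+m-1, using the closed form."""
    return m * (m + 2 * n - 1) / 2


# Compare against a direct sum for positive, negative, and fractional n.
print(run_sum(1, 3), sum([1, 2, 3]))
print(run_sum(-2, 4), sum([-2, -1, 0, 1]))
print(run_sum(0.5, 3), sum([0.5, 1.5, 2.5]))

# A matching sum alone does not rule out duplicates.
print(sum([2, 2, 2]) == run_sum(1, 3))
```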
情歌与酒

2020-11-28 02:14
```
def test(a, n, m):
    seen = [False] * m
    for x in a:
        if x < n or x >= n+m:
            return False
        if seen[x-n]:
            return False
        seen[x-n] = True
    return False not in seen

print(test([2, 3, 1], 1, 3))  # True
print(test([1, 3, 1], 1, 3))  # False (1 appears twice)
print(test([1, 2, 4], 1, 3))  # False (4 is out of range)
```
Note that this only makes one pass through the first array, not counting the linear search performed by the final `not in` check. :)

I also could have used a python set, but I opted for the straightforward solution where the performance characteristics of set need not be considered.

Update: Smashery pointed out that I had misparsed "constant amount of memory" and this solution doesn't actually solve the problem.
忘了有多久

2020-11-28 02:14
I propose the following:

Choose a finite set of prime numbers P_1, P_2, ..., P_K, and count the occurrences of the elements in the input sequence (minus the minimum) modulo each P_i. The pattern of a valid sequence is known.

For example, for a sequence of 17 elements, modulo 2 we must have the profile [9 8], modulo 3 [6 6 5], modulo 5 [4 4 3 3 3], etc.

Combining the tests for several primes, we obtain a more and more precise probabilistic test. Since the entries are bounded by the integer size, there exists a finite base providing an exact test. This is similar to probabilistic pseudo-primality tests.
```
S_i is an int array of size P_i, initially filled with 0, i=1..K
M is the length of the input sequence
Mn = INT_MAX
Mx = INT_MIN

for x in the input sequence:
  for i in 1..K: S_i[x % P_i]++  // count occurrences mod Pi
  Mn = min(Mn,x)  // update min
  Mx = max(Mx,x)  // and max

if Mx-Mn != M-1: return False  // Check bounds

for i in 1..K:
  // Check profile mod P_i
  Q = M / P_i
  R = M % P_i
  Check S_i[(Mn+j) % P_i] is Q+1 for j=0..R-1 and Q for j=R..P_i-1
  if this test fails, return False

return True
```
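
The pseudocode above might translate to Python roughly as follows. This is a sketch under assumptions: the function name `modular_profile_test` and the default list of primes are chosen here for illustration, and with a fixed small list the test stays probabilistic (it can accept an invalid sequence).

```python
def modular_profile_test(seq, primes=(2, 3, 5, 7, 11, 13)):
    """Probabilistic check that seq is a permutation of Mn, Mn+1, ..., Mn+M-1."""
    m = len(seq)
    if m == 0:
        return True  # assumption: an empty sequence is accepted

    counts = [[0] * p for p in primes]  # one residue histogram per prime
    lo = hi = seq[0]
    for x in seq:
        for hist, p in zip(counts, primes):
            hist[x % p] += 1  # Python's % is non-negative for a positive modulus
        lo = min(lo, x)
        hi = max(hi, x)

    if hi - lo != m - 1:  # check bounds
        return False

    for hist, p in zip(counts, primes):
        q, r = divmod(m, p)
        # Starting at lo, the first r residues occur q+1 times, the rest q times.
        for j in range(p):
            expected = q + 1 if j < r else q
            if hist[(lo + j) % p] != expected:
                return False
    return True


print(modular_profile_test(list(range(3, 20))))
print(modular_profile_test([1, 3, 1]))
```

For a fixed set of primes the extra storage is constant, which is the point of the approach.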
说谎

2020-11-28 02:16

Any one-pass algorithm requires Omega(n) bits of storage.

Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs

A A' - yes

B A' - no

since it cannot distinguish the first case from the second.

Q.E.D.
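
The counting step can be made explicit. As a sketch, write s for the number of bits of state (a symbol introduced here, not part of the original argument). After reading the first half, the algorithm is in one of at most 2^s memory states, and distinct sets of n/2 values need distinct states, so:

```latex
2^{s} \;\ge\; \binom{n}{n/2} \;\ge\; \frac{2^{n}}{n+1}
\quad\Longrightarrow\quad
s \;\ge\; n - \log_2(n+1) = \Omega(n).
```

The second inequality holds because the central binomial coefficient is the largest of the n+1 binomial coefficients that sum to 2^n.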

無奈伤痛

2020-11-28 02:17

Vote me down if I'm wrong, but I think we can determine whether there are duplicates using the variance. Because we know the mean beforehand (n + (m-1)/2 or something like that), we can sum the numbers and the squared differences from the mean in one pass, then check that the sum matches mn + m(m-1)/2 and that the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.

EDIT: the variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half are greater than the mean. (This form assumes m is odd, so that the mean is itself one of the elements.)

If there is a duplicate, a term in the above sum will differ from the correct sequence, even if another duplicate completely cancels out the change in the mean. So the function returns true only if both the sum and the variance match the desired values, which we can compute beforehand.
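
A minimal Python sketch of this check follows. The function name `passes_mean_variance_check` is hypothetical, and so is the choice to stay in integers: every deviation from the mean is doubled, so the expected sum of squares becomes m(m^2-1)/3, which is equivalent to a variance of (m^2-1)/12.

```python
def passes_mean_variance_check(arr, n):
    """One-pass check that arr has the sum and variance of n, n+1, ..., n+m-1."""
    m = len(arr)
    twice_mean = 2 * n + m - 1  # twice the expected mean n + (m-1)/2
    total = 0
    doubled_sq_dev = 0          # sum of (2*x - twice_mean)**2
    for x in arr:
        total += x
        d = 2 * x - twice_mean
        doubled_sq_dev += d * d

    expected_total = m * n + m * (m - 1) // 2
    expected_sq_dev = m * (m * m - 1) // 3  # (m-1)*m*(m+1) is divisible by 3
    return total == expected_total and doubled_sq_dev == expected_sq_dev


print(passes_mean_variance_check([2, 3, 1], 1))
print(passes_mean_variance_check([1, 3, 1], 1))
print(passes_mean_variance_check([0, 0, 3, 4, 4, 5, 5], 0))
```

The last call is a caution: [0, 0, 3, 4, 4, 5, 5] has the same sum (21) and sum of squares (91) as 0..6, so it passes both checks despite its duplicates. Matching sum and variance is therefore a necessary condition, in line with the "likely" above, not a proof.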
