Algorithm to determine if array contains n…n+m?

清酒与你 2020-11-28 01:45

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview

30 answers
  • 2020-11-28 02:11

    There is an algorithm that runs in O(n^2) time, does not modify the input array, and uses constant space. (In that bound, n stands for the array length, which is m in the problem statement.)

    First, assume that you know n and m. Finding them takes a linear pass over the array, so it does not add to the overall complexity. Next, assume there exists one element equal to n and one element equal to n+m-1, and that all the rest are in [n, n+m). Given that, subtracting n from each value as it is read reduces the problem to an array with elements in [0, m).

    Now, since we know that the elements are bounded by the size of the array, we can treat each element as a node with a single link to another element; in other words, the array describes a directed graph. In this directed graph, if there are no duplicate elements, every node belongs to a cycle, that is, a node is reachable from itself in m or fewer steps. If there is a duplicate element, then some value is missing, so there exists a node that nothing links to and that is therefore not reachable from itself at all.

    So, to detect this, you walk the entire array from start to finish and determine if each element returns to itself in <=m steps. If any element is not reachable in <=m steps, then you have a duplicate and can return false. Otherwise, when you finish visiting all elements, you can return true:

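    // arr holds m values already reduced to the range [0, m).
    // The wrapper name below is illustrative only.
    bool every_index_on_cycle(const int *arr, int m)
    {
        for (int start_index = 0; start_index < m; ++start_index)
        {
            // Follow the links from start_index, giving up after m steps.
            int steps = 1;
            int current_element_index = arr[start_index];
            while (steps < m + 1 && current_element_index != start_index)
            {
                current_element_index = arr[current_element_index];
                ++steps;
            }

            // Never came back to start_index: there is a duplicate.
            if (steps > m)
            {
                return false;
            }
        }

        return true;
    }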
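
    As a rough end-to-end sketch of the steps above, here is a Python version that also finds n and m and subtracts n on the fly instead of modifying the array. The function name is_consecutive_run is made up for illustration, and the sketch assumes n is the minimum of the array and m is its length.

```python
def is_consecutive_run(arr):
    """Return True if arr is a permutation of n, n+1, ..., n+m-1.

    Assumes n = min(arr) and m = len(arr). Uses O(m^2) time and O(1)
    extra space, and never writes to arr.
    """
    m = len(arr)
    if m == 0:
        return True
    n = min(arr)
    # With the minimum at n and the maximum at n+m-1, every value
    # reduces to an index in [0, m) once n is subtracted.
    if max(arr) != n + m - 1:
        return False

    for start in range(m):
        steps = 1
        current = arr[start] - n
        # Follow the links until we return to start or exceed m steps.
        while steps <= m and current != start:
            current = arr[current] - n
            steps += 1
        if steps > m:
            # start is not on any cycle, so some value is duplicated.
            return False
    return True


print(is_consecutive_run([2, 3, 1]))
print(is_consecutive_run([1, 3, 1]))
print(is_consecutive_run([1, 2, 4]))
```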
    

    You can optimize this by storing additional information:

    1. Keep a running total, sum_of_steps, of the lengths of the cycles found so far. Count each cycle only once: skip it if it visits an index that comes before the starting index, since it was already counted from there.
    2. For every element, step at most m-sum_of_steps nodes out. If you neither return to the starting element nor visit an element before it, you have found a loop containing duplicate elements and can return false.

    This is still O(n^2), e.g. {1, 2, 3, 0, 5, 6, 7, 4}, but it's a little bit faster.

  • 2020-11-28 02:12

    If you want to know the sum of the numbers [n ... n + m - 1] just use this equation.

    var sum = m * (m + 2 * n - 1) / 2;
    

    That works for any number, positive or negative, even if n is a decimal.
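
    A small Python sketch of the same formula, with the hypothetical helper name expected_sum. It also illustrates that a matching sum is only a necessary condition: the second call prints the same total for an array that contains duplicates.

```python
def expected_sum(n, m):
    """Sum of the m consecutive values n, n+1, ..., n+m-1."""
    return m * (m + 2 * n - 1) / 2


print(expected_sum(1, 4))   # sum of 1, 2, 3, 4
print(sum([1, 1, 4, 4]))    # same total, yet not a permutation of 1..4
```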

  • 2020-11-28 02:14
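    def test(a, n, m):
        # seen[i] records whether the value n + i has appeared.
        seen = [False] * m
        for x in a:
            # Reject values outside [n, n+m).
            if x < n or x >= n+m:
                return False
            # Reject duplicates.
            if seen[x-n]:
                return False
            seen[x-n] = True
        # Every value in the range must have appeared.
        return False not in seen
    
    print(test([2, 3, 1], 1, 3))  # True
    print(test([1, 3, 1], 1, 3))  # False (1 appears twice)
    print(test([1, 2, 4], 1, 3))  # False (4 is out of range)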
    

    Note that this makes only one pass through the input array, not counting the linear search over seen that the final "not in" test performs. :)

    I also could have used a Python set, but I opted for the straightforward solution where the performance characteristics of set need not be considered.

    Update: Smashery pointed out that I had misparsed "constant amount of memory" and this solution doesn't actually solve the problem.

  • 2020-11-28 02:14

    I propose the following:

    Choose a finite set of prime numbers P_1,P_2,...,P_K, and count the occurrences of the elements in the input sequence (minus the minimum) modulo each P_i. The pattern that a valid sequence must produce is known in advance.

    For example for a sequence of 17 elements, modulo 2 we must have the profile: [9 8], modulo 3: [6 6 5], modulo 5: [4 4 3 3 3], etc.

    Combining the tests for several bases, we obtain an increasingly precise probabilistic test. Since the entries are bounded by the integer size, there exists a finite base providing an exact test. This is similar to probabilistic pseudo-primality tests.

    S_i is an int array of size P_i, initially filled with 0, i=1..K
    M is the length of the input sequence
    Mn = INT_MAX
    Mx = INT_MIN
    
    for x in the input sequence:
      for i in 1..K: S_i[x % P_i]++  // count occurrences mod Pi
      Mn = min(Mn,x)  // update min
      Mx = max(Mx,x)  // and max
    
    if Mx-Mn != M-1: return False  // Check bounds
    
    for i in 1..K:
      // Check profile mod P_i
      Q = M / P_i
      R = M % P_i
      Check S_i[(Mn+j) % P_i] is Q+1 for j=0..R-1 and Q for j=R..P_i-1
      if this test fails, return False
    
    return True
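
    The pseudocode above could be written in Python roughly as follows. The function name residue_profile_test and the default list of primes are assumptions chosen for illustration; with a fixed small set of primes the test is only probabilistic, so a True result does not prove the input is valid.

```python
def residue_profile_test(seq, primes=(2, 3, 5, 7, 11, 13)):
    """Probabilistic check that seq is a permutation of a consecutive run.

    Memory use depends only on the chosen primes, not on len(seq).
    A False result is definitive; a True result may be a false positive.
    """
    m = len(seq)
    if m == 0:
        return True

    counts = [[0] * p for p in primes]
    lo = hi = seq[0]
    for x in seq:
        for c, p in zip(counts, primes):
            c[x % p] += 1          # Python's % is non-negative for p > 0
        lo = min(lo, x)
        hi = max(hi, x)

    # A valid run of m values spans exactly m - 1.
    if hi - lo != m - 1:
        return False

    for c, p in zip(counts, primes):
        q, r = divmod(m, p)
        # Starting at lo, the first r residues occur q + 1 times, the rest q.
        for j in range(p):
            expected = q + 1 if j < r else q
            if c[(lo + j) % p] != expected:
                return False
    return True


print(residue_profile_test([2, 3, 1]))
print(residue_profile_test([1, 3, 1]))
print(residue_profile_test([1, 1, 4, 4]))
```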
    
  • 2020-11-28 02:16

    Any one-pass algorithm requires Omega(n) bits of storage.

    Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs

    A A' - yes

    B A' - no

    since it cannot distinguish the first case from the second.

    Q.E.D.

  • 2020-11-28 02:17

    Vote me down if I'm wrong, but I think we can determine whether there are duplicates using the variance. Because we know the mean beforehand (n + (m-1)/2), we can sum up the numbers and the squared differences from the mean, then check that the sum matches m*n + m(m-1)/2 and that the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.

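    EDIT: the variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half are greater than the mean. (That form applies when m is odd; for any m, the variance of m consecutive integers is (m^2 - 1)/12.)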

    If there is a duplicate, a term in the above equation will differ from the correct sequence, even if another duplicate completely cancels out the change in the mean. So the function returns true only if both the sum and the variance match the desired values, which we can compute beforehand.
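
    A minimal sketch of this check in Python, under the assumption that n and m are already known. The name sum_and_variance_match is hypothetical. To stay in integer arithmetic it compares four times the sum of squared deviations from the mean, whose expected value for m consecutive integers is m(m^2 - 1)/3. One caveat that is not part of the answer above: matching both moments is a necessary condition only, and the last call below shows an input with duplicates that still passes.

```python
def sum_and_variance_match(arr, n, m):
    """Check that arr has the sum and variance of n, n+1, ..., n+m-1.

    Necessary but not sufficient for arr to be a permutation of the run.
    """
    if len(arr) != m:
        return False
    total = sum(arr)
    # 2*x - (2*n + m - 1) is twice the deviation of x from the mean
    # n + (m-1)/2, so its square is 4 times the squared deviation.
    scaled_sq_dev = sum((2 * x - (2 * n + m - 1)) ** 2 for x in arr)

    expected_total = m * n + m * (m - 1) // 2
    expected_scaled_sq_dev = m * (m * m - 1) // 3
    return total == expected_total and scaled_sq_dev == expected_scaled_sq_dev


print(sum_and_variance_match([2, 3, 1], 1, 3))
print(sum_and_variance_match([1, 3, 1], 1, 3))
# Duplicates, but the same sum and variance as 0..7:
print(sum_and_variance_match([0, 0, 3, 3, 5, 5, 6, 6], 0, 8))
```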
