Find duplicate in an array

醉梦人生 2021-01-14 16:45

Given a read-only array of n + 1 integers between 1 and n, find one number that repeats, in linear time, using less than O(n) space and traversing the stream sequentially O(1) times.

5 Answers
  • 2021-01-14 17:15

    You are right to wonder why this would be accepted. This answer obviously has O(n) space complexity: you are allocating an amount of data that grows in direct proportion to n. Whatever is judging your program is accepting it incorrectly. It may be that the judge accepts your submission because you are using fewer bytes than are allocated by A, but that is only speculation.

    EDIT: The code below isn't actually a solution to the problem. It is a solution to a simpler problem along the lines of the above, and it ignores the constraint that the stream must be read-only. After doing some research, it appears that this problem is a very difficult version of a series of similar problems of the type "Given a range of numbers between 1 and n, find the repeating/missing number". If only one number were repeated and the only requirement were O(n) time, you could use a bool vector as above. If only one number were repeated but you were constrained to constant space, you could implement this solution, where we use Gauss's formula to find the sum of the integers from 1 to n and subtract that from the sum of the array. If the array had two missing numbers and you were constrained to constant space, you could implement this solution, where we use the sum and product of the array to create a system of equations that can be solved in O(n) time with O(1) space.
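    As a minimal sketch of the Gauss's-formula idea described above: it only works under the assumption that exactly one value appears twice and every other value in 1..n appears exactly once. The function name find_single_repeat is made up for illustration and does not come from the linked solution.

```cpp
#include <cstdint>
#include <vector>

// Sketch only: assumes arr holds n + 1 values in [1, n], where every value
// appears once except a single value that appears exactly twice.
// find_single_repeat is a hypothetical name used for illustration.
int find_single_repeat(const std::vector<int>& arr)
{
    const std::int64_t n = static_cast<std::int64_t>(arr.size()) - 1;
    std::int64_t sum = 0;
    for (int x : arr)
        sum += x;  // one sequential, read-only pass over the input

    // Gauss's formula: 1 + 2 + ... + n = n(n + 1) / 2.
    // The surplus over that sum is the repeated value.
    return static_cast<int>(sum - n * (n + 1) / 2);
}
```

    For the array {1,2,3,4,5,1}, the sum is 16 and n(n+1)/2 is 15, so the sketch returns 1. If a value can repeat more than once, the difference no longer identifies it.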

    To solve the question posed above, it looks like one would have to implement something to the order of this monstrosity.

    Here is a solution to this problem within its constraints:

    You could do something like this:

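    #include <cstdlib>
    #include <iostream>
    #include <vector>

    // Returns a repeated value, or -1 if none is found.
    // Note: this writes to arr, so it needs a mutable array.
    int repeating(std::vector<int>& arr)
    {
      for (std::size_t i = 0; i < arr.size(); i++)
      {
        int value = std::abs(arr[i]);
        if (arr[value] >= 0)
          arr[value] = -arr[value];  // mark value as seen
        else
          return value;              // slot already marked: value is a duplicate
      }
      return -1;
    }

    int main()
    {
      std::vector<int> v{1,2,3,4,5,1};

      std::cout << repeating(v) << std::endl;
      // size of the input data in bytes
      std::cout << v.size() * sizeof(v[0]) << std::endl;
      return 0;
    }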
    

    The above program uses the input array itself to track duplicates. For each index i, it reads the value v = abs(arr[i]) and negates arr[v] to mark v as seen. Negation is easily reversible (simply take the absolute value of the element), so it can mark an index without destroying the data. If arr[abs(arr[i])] is already negative when you reach it, you know that you have seen abs(arr[i]) before in the array. This uses O(1) extra space, traverses the array once, and can be modified to return any or all duplicate numbers.
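    Because the marks are reversible, one possible variation is to undo them before returning. The sketch below is an illustration under the same assumptions (n + 1 entries, each between 1 and n); the name repeating_and_restore is hypothetical. It still writes to the array while it runs, so it does not satisfy the read-only constraint either.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: marks seen values by negation, then restores arr.
// Assumes arr has n + 1 entries, each in [1, n], so every index used is valid.
int repeating_and_restore(std::vector<int>& arr)
{
    int duplicate = -1;  // -1 means "no duplicate found"
    for (std::size_t i = 0; i < arr.size(); ++i) {
        const int value = std::abs(arr[i]);
        if (arr[value] < 0) {  // slot already marked: value was seen before
            duplicate = value;
            break;
        }
        arr[value] = -arr[value];  // mark value as seen
    }
    for (int& x : arr)
        x = std::abs(x);  // undo every mark so the data is unchanged
    return duplicate;
}
```

    The restore loop costs a second pass, so this trades one extra traversal for leaving the caller's data intact.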

  • 2021-01-14 17:17

    I have a solution which requires O(sqrt(N)) space and O(N) time, and traverses the list twice -- assuming it is possible to calculate the integer square root in O(1) time (for arbitrarily large N, this is likely at least an O(log(N)) operation).

    • First allocate an integer array A1 of size floor(sqrt(N))+1, filled with 0.
    • Iterate through your array; for each element x:
      • compute k=floor(sqrt(x))
      • increment A1[k]
      • If A1[k]>2k+1, there must be at least one duplicate between k² and (k+1)²-1. (For k=floor(sqrt(N)) the threshold is N-k²+1.) Remember k and break out of the first iteration.
    • Optionally delete the first array.
    • Allocate a boolean array A2 of size 2k+1, filled with false.
    • Iterate through all x again, considering only those with floor(sqrt(x))=k:
      • Check if A2[x-k²] is set; if yes, x is a duplicate.
      • Otherwise, set A2[x-k²].

    The solution should also work for larger and smaller arrays (does not need to be exactly N+1), and if there are no duplicates, the first iteration will run to the end. Both temporary arrays are O(k) (if you are pedantic, the first one is O(k*log(k)), since it must store integers up to size sqrt(N)).
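    The two passes described above could be sketched roughly as follows. This is an illustration rather than the answerer's own code: the names isqrt and find_duplicate_sqrt are made up, the upper bound n is passed in explicitly, and the integer square root is taken with std::sqrt plus a correction step, which is an assumption about how one might compute it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: integer square root via std::sqrt, corrected for rounding.
static long long isqrt(long long x)
{
    long long r = static_cast<long long>(std::sqrt(static_cast<double>(x)));
    while (r * r > x) --r;
    while ((r + 1) * (r + 1) <= x) ++r;
    return r;
}

// Returns a duplicated value, or -1 if none is found.
// Assumes every element lies in [1, n].
long long find_duplicate_sqrt(const std::vector<int>& arr, long long n)
{
    // A1: one counter per bucket k = floor(sqrt(x)), O(sqrt(n)) entries.
    std::vector<long long> counts(static_cast<std::size_t>(isqrt(n) + 1), 0);
    long long k = -1;
    for (int x : arr) {  // first pass: find an overfull bucket
        const long long b = isqrt(x);
        const long long lo = b * b;
        const long long hi = std::min((b + 1) * (b + 1) - 1, n);
        // Bucket b holds at most hi - lo + 1 distinct values (pigeonhole).
        if (++counts[static_cast<std::size_t>(b)] > hi - lo + 1) { k = b; break; }
    }
    if (k < 0) return -1;  // no bucket overflowed, so no duplicate was detected

    // A2: one flag per value in [k^2, (k+1)^2 - 1].
    const long long lo = k * k;
    std::vector<bool> seen(static_cast<std::size_t>(2 * k + 1), false);
    for (int x : arr) {  // second pass: only look at bucket k
        if (isqrt(x) != k) continue;
        if (seen[static_cast<std::size_t>(x - lo)]) return x;
        seen[static_cast<std::size_t>(x - lo)] = true;
    }
    return -1;
}
```

    Both temporary arrays stay at O(sqrt(n)) entries, and the input is only read, never modified.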

  • 2021-01-14 17:25

    std::vector<bool> isn't like any other vector.

    std::vector<bool> is a possibly space-efficient specialization of std::vector for the type bool.

    That's why it may use up less memory because it might represent multiple boolean values with one byte, like a bitset.

  • 2021-01-14 17:27

    Well it's constant (O(1)) in memory because you're simply doing a comparison in place, and not creating a new data structure to house anything or for any comparison.

    You could also use a hash table such as unordered_set; that would keep O(N) time complexity but use O(N) memory.

    I'm not entirely sure whether what you posted was an "accepted" solution, by the way, because it creates a vector of size (sizeofA). I'm just offering a solution based on your needs.
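    For reference, the hash-table alternative mentioned above might look like the sketch below. The function name find_duplicate_hashed is hypothetical. It reads the input once without modifying it, but the set can grow to O(N) entries, so it does not meet the space constraint in the question.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the first value seen twice, or -1 if none.
int find_duplicate_hashed(const std::vector<int>& arr)
{
    std::unordered_set<int> seen;
    seen.reserve(arr.size());
    for (int x : arr) {
        // insert() returns {iterator, bool}; the bool is false
        // when x was already present in the set.
        if (!seen.insert(x).second)
            return x;
    }
    return -1;
}
```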

  • 2021-01-14 17:28

    std::vector<bool> is typically implemented as a packed bitset, so it will use about n bits. In Big-O notation, O(n/8)=O(n), which means the space is not less than O(n).

    I assume they do not look at the actual program, but only measure its space consumption in some example runs. So, using a bit vector tricks it into believing that it is better than O(n).

    But I agree with you. It should not be accepted.
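    To make the point concrete, here is a guess at what a bool-vector approach of the kind discussed in this thread looks like. The original submission is not shown here, so this is a reconstruction under assumptions, and find_duplicate_bits is a hypothetical name. The flag vector has one entry per possible value, so even when packed to one bit each it grows linearly with n.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of a "bool vector" approach.
// Assumes arr has n + 1 entries, each in [1, n].
int find_duplicate_bits(const std::vector<int>& arr)
{
    // One flag per index 0..n. A packed implementation needs roughly
    // n / 8 bytes, which is still O(n) space.
    std::vector<bool> seen(arr.size(), false);
    for (int x : arr) {
        const std::size_t idx = static_cast<std::size_t>(x);
        if (seen[idx])
            return x;  // flag already set: x appeared earlier
        seen[idx] = true;
    }
    return -1;
}
```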
