Algorithm to determine if array contains n…n+m?


I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview

Counter-example for XOR algorithm.

(can't post it as a comment)

@popopome

For a = {0, 2, 7, 5} it returns true (meaning that a is a permutation of the range [0, 4)), but it must return false in this case (a is obviously not a permutation of [0, 4)).

Another counter example: {0, 0, 1, 3, 5, 6, 6} -- all values are in range but there are duplicates.

I may have implemented popopome's idea (or the tests) incorrectly, so here is the code:

```
#include <stdbool.h>

bool isperm_popopome(const int a[], int m, int n)
{
  /** O(m) in time (single pass), O(1) in space,
      no restrictions on n,
      no overflow,
      a[] may be readonly
  */
  int even_xor = 0;
  int odd_xor  = 0;

  for (int i = 0; i < m; ++i)
    {
      if (a[i] % 2 == 0) // is even
        even_xor ^= a[i];
      else
        odd_xor ^= a[i];

      const int b = i + n; // i-th value of the expected range n..n+m-1
      if (b % 2 == 0)    // is even
        even_xor ^= b;
      else
        odd_xor ^= b;
    }

  return (even_xor == 0) && (odd_xor == 0);
}
```
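
To replay the two counter-examples without compiling the C code, here is a minimal Ruby sketch of the same even/odd XOR check. The name `xor_check` is made up for this illustration; it is a port of the function above, not popopome's own code.

```ruby
# Hypothetical Ruby port of the even/odd XOR check.
# a: the array to test, n: first value of the expected range n..n+a.size-1
def xor_check(a, n)
  even_xor = 0
  odd_xor = 0
  a.each_with_index do |value, i|
    if value.even?
      even_xor ^= value
    else
      odd_xor ^= value
    end

    expected = n + i # i-th value of the expected range
    if expected.even?
      even_xor ^= expected
    else
      odd_xor ^= expected
    end
  end
  even_xor.zero? && odd_xor.zero?
end

puts xor_check([0, 1, 2, 3], 0)          # a genuine permutation of 0..3
puts xor_check([0, 2, 7, 5], 0)          # accepted although 7 and 5 are out of range
puts xor_check([0, 0, 1, 3, 5, 6, 6], 0) # accepted although 0 and 6 repeat
```

All three calls return true, so the check cannot tell the two bad inputs apart from a real permutation.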


眼角桃花

2020-11-28 01:56
Under the assumption numbers less than one are not allowed and there are no duplicates, there is a simple summation identity for this - the sum of numbers from 1 to m in increments of 1 is (m * (m + 1)) / 2. You can then sum the array and use this identity.

You can find out if there is a dupe under the above guarantees, plus the guarantee no number is above m or less than n (which can be checked in O(N))
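
As a rough sketch of the summation identity (the method name `sum_matches_one_to_m?` is invented here), the check is a single comparison. It is only meaningful under the stated preconditions: every value is at least 1 and nothing repeats.

```ruby
# Assumes (as the answer does) that all values are >= 1 and distinct.
# Under those assumptions the sum equals m*(m+1)/2 only for the values 1..m.
def sum_matches_one_to_m?(values)
  m = values.size
  values.sum == m * (m + 1) / 2
end

puts sum_matches_one_to_m?([3, 1, 2]) # 1..3 in some order
puts sum_matches_one_to_m?([1, 2, 4]) # 3 is missing
puts sum_matches_one_to_m?([1, 1, 4]) # violates the no-duplicates assumption, yet the sums agree
```

The last call shows why the duplicate check is still needed: the sum alone cannot detect repeated values.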

The idea in pseudo-code:
0) Start at N = 0
1) Take the N-th element in the list.
2) If it is not where it would be if the list had been sorted, check where it should be.
3) If the place where it should be already has the same number, you have a dupe - RETURN TRUE
4) Otherwise, swap the numbers (to put the first number in the right place).
5) With the number you just swapped with, is it in the right place?
6) If no, go back to step two.
7) Otherwise, start at step one with N = N + 1. If this would be past the end of the list, you have no dupes.

And, yes, that runs in O(N) although it may look like O(N ^ 2)
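
The steps above could be sketched as follows. This is one possible reading, assuming the values are already known to lie in 1..m (m being the list length) so that value v belongs at index v - 1; the method name `duplicate_by_swapping?` is made up for the example.

```ruby
# Returns true if list contains a duplicate. Reorders list in place.
# Assumes every value is in 1..list.size (check that separately).
def duplicate_by_swapping?(list)
  list.each_index do |i|
    # Move whatever sits at position i to its sorted position
    # until position i holds the value i + 1.
    while list[i] != i + 1
      target = list[i] - 1
      # The home slot already holds the same value: duplicate.
      return true if list[target] == list[i]
      list[i], list[target] = list[target], list[i]
    end
  end
  false
end

puts duplicate_by_swapping?([3, 1, 2])
puts duplicate_by_swapping?([2, 2, 1])
```

Every swap puts at least one value into its final position, and a value that is already home is never moved again, so there are at most N swaps in total. That is why the nested loop is still O(N).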

Note to everyone (stuff collected from comments)

This solution works under the assumption you can modify the array, then uses in-place Radix sort (which achieves O(N) speed).

Other mathy-solutions have been put forth, but I'm not sure any of them have been proved. There are a bunch of sums that might be useful, but most of them run into a blowup in the number of bits required to represent the sum, which will violate the constant extra space guarantee. I also don't know if any of them are capable of producing a distinct number for a given set of numbers. I think a sum of squares might work, which has a known formula to compute it (see Wolfram's)

New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):

So, it has been mentioned to maybe use sum + sum of squares. No one knew if this worked or not, and I realized that a plain sum only becomes an issue when (x + y) = (n + m), such as the fact that 2 + 2 = 1 + 3. Squares also have this issue thanks to Pythagorean triples (so 3^2 + 4^2 + 25^2 == 5^2 + 7^2 + 24^2, and the sum of squares doesn't work). If we use Fermat's last theorem, we know x^3 + y^3 = z^3 can't happen for cubes. But that only covers two terms against one; it says nothing about equal sums with more terms (for instance, 1^3 + 12^3 == 9^3 + 10^3). So there is no guarantee that this doesn't break too, and if we continue down this path we quickly run out of bits.

In my glee, however, I forgot to note that you can break the sum of squares, but in doing so you create a normal sum that isn't valid. I don't think you can do both, but, as has been noted, we don't have a proof either way.

I must say, finding counterexamples is sometimes a lot easier than proving things! Consider the following sequences, all of which have a sum of 28 and a sum of squares of 140:
```
[1, 2, 3, 4, 5, 6, 7]
[1, 1, 4, 5, 5, 6, 6] 
[2, 2, 3, 3, 4, 7, 7]
```
I could not find any such examples of length 6 or less. If you want an example that has the proper min and max values too, try this one of length 8:
```
[1, 3, 3, 4, 4, 5, 8, 8]
```
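
These counterexamples are easy to re-check. A small sketch (the helper name `sum_and_squares` is invented for this purpose) that computes both quantities:

```ruby
# Returns [sum, sum of squares] for an array of integers.
def sum_and_squares(values)
  [values.sum, values.sum { |v| v * v }]
end

[
  [1, 2, 3, 4, 5, 6, 7],
  [1, 1, 4, 5, 5, 6, 6],
  [2, 2, 3, 3, 4, 7, 7]
].each { |seq| puts "#{seq.inspect} -> #{sum_and_squares(seq).inspect}" }

# The length-8 example, compared against the real range 1..8.
puts sum_and_squares([1, 3, 3, 4, 4, 5, 8, 8]) == sum_and_squares((1..8).to_a)
```

So a check based on the sum and the sum of squares, even combined with min and max, accepts arrays that contain duplicates.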
Simpler approach (modifying hazzen's idea):

An integer array of length m contains all the numbers from n to n+m-1 exactly once iff
- every array element is between n and n+m-1
- there are no duplicates
(Reason: there are only m values in the given integer range, so if the array contains m unique values in this range, it must contain every one of them once)

If you are allowed to modify the array, you can check both in one pass through the list with a modified version of hazzen's algorithm idea (there is no need to do any summation):
- For all array indexes i from 0 to m-1 do
  1. If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
  2. Calculate j = array[i] - n (this is the 0-based position of array[i] in a sorted array with values from n to n+m-1)
  3. While j is not equal to i
    1. If array[i] is equal to array[j] => RETURN FALSE ("duplicate found")
    2. Swap array[i] with array[j]
    3. Repeat the range check from step 1 on the new array[i] (it came from a position that may not have been checked yet), then recalculate j = array[i] - n
- RETURN TRUE
I'm not sure if the modification of the original array counts against the maximum allowed additional space of O(1), but if it doesn't this should be the solution the original poster wanted.
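
A possible Ruby rendering of this approach, offered as a sketch rather than a reference implementation. The method name `consecutive_range?` is hypothetical. Note that the range check runs on every value that lands at position i, including values brought in by a swap, so the computed index j is always valid.

```ruby
# True if array holds every value of n..n+array.size-1 exactly once.
# Reorders array in place (it ends up sorted when the answer is true).
def consecutive_range?(array, n)
  m = array.size
  array.each_index do |i|
    loop do
      value = array[i]
      return false if value < n || value >= n + m # value out of range
      j = value - n                               # sorted position of value
      break if j == i                             # value is already home
      return false if array[j] == value           # duplicate found
      array[i], array[j] = array[j], array[i]
    end
  end
  true
end

puts consecutive_range?([5, 3, 4, 6], 3)
puts consecutive_range?([3, 3, 4], 3)
puts consecutive_range?([4, 9, 3], 3)
```

In the last call the out-of-range 9 only reaches position 0 after a swap, which is the case the repeated range check is there to catch.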
感情败类

2020-11-28 01:57
I like Greg Hewgill's idea of Radix sorting. To find duplicates, you can sort in O(N) time given the constraints on the values in this array.

For an in-place O(1) space O(N) time that restores the original ordering of the list, you don't have to do an actual swap on that number; you can just mark it with a flag:
```
//Java: assumes all numbers in arr > 0 (the sign bit is used as the "seen" flag)
boolean checkArrayConsecutiveRange(int[] arr) {

// find min/max
int min = arr[0]; int max = arr[0];
for (int i=1; i<arr.length; i++) {
    min = (arr[i] < min ? arr[i] : min);
    max = (arr[i] > max ? arr[i] : max);
}
if (max - min + 1 != arr.length) return false;

// flag and check
boolean ret = true;
for (int i=0; i<arr.length; i++) {
    int targetI = Math.abs(arr[i])-min;
    if (arr[targetI] < 0) {
        ret = false; 
        break;
    }
    arr[targetI] = -arr[targetI];
}
for (int i=0; i<arr.length; i++) {
    arr[i] = Math.abs(arr[i]);
}

return ret;
}
```
Storing the flags inside the given array is kind of cheating, and doesn't play well with parallelization. I'm still trying to think of a way to do it without touching the array in O(N) time and O(log N) space. Checking against the sum and against the sum of least squares (arr[i] - arr.length/2.0)^2 feels like it might work. The one defining characteristic we know about a 0...m array with no duplicates is that it's uniformly distributed; we should just check that.

Now if only I could prove it.

I'd like to note that the solution above involving factorial takes O(N) space to store the factorial itself. N! > 2^N for N >= 4, so it takes more than N bits to store.
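
To get a feel for that growth, here is a small sketch using Ruby's arbitrary-precision integers (`factorial_bits` is a name invented for the example):

```ruby
# Number of bits needed to represent n! as an unsigned integer.
def factorial_bits(n)
  (1..n).reduce(1, :*).bit_length
end

[4, 10, 20, 100].each do |n|
  puts "#{n}! needs #{factorial_bits(n)} bits"
end
```

The bit count grows faster than n itself, so a factorial accumulator cannot be treated as a single fixed-size variable.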
-上瘾入骨i

2020-11-28 01:57
ciphwn has it right. It all comes down to statistics. In statistical terms, the question asks whether the sequence of numbers forms a discrete uniform distribution. A discrete uniform distribution is one where all values of a finite set of possible values are equally probable. Fortunately there are some useful formulas to determine if a discrete set is uniform. First, the mean of the set (a..b) is (a+b)/2 and the variance is (n^2 - 1)/12, where n = b - a + 1 is the number of values. Next, determine the variance of the given set:
```
variance = (1/n) * sum [i=1..n] (f(i) - mean)^2
```
and then compare with the expected variance. This will require two passes over the data, once to determine the mean and again to calculate the variance.
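
A sketch of this two-pass check, using exact rational arithmetic to avoid floating-point comparisons. The method name `uniform_moments_match?` is made up, and taking a and b from the array's own minimum and maximum is an assumption of this example.

```ruby
# Compares the mean and variance of values with those of the
# discrete uniform distribution on min..max.
def uniform_moments_match?(values)
  n = values.size
  mean = values.sum.to_r / n                            # first pass
  variance = values.sum { |v| (v - mean)**2 } / n       # second pass
  expected_mean = (values.min + values.max).to_r / 2
  expected_variance = Rational(n * n - 1, 12)
  mean == expected_mean && variance == expected_variance
end

puts uniform_moments_match?((1..8).to_a)
puts uniform_moments_match?([1, 2, 4])
puts uniform_moments_match?([1, 3, 3, 4, 4, 5, 8, 8])
```

The last input is the length-8 counterexample from the earlier answer: it contains duplicates but has the same mean and variance as 1..8, so matching these two moments is a necessary condition, not a sufficient one.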

References:
- uniform discrete distribution
- variance
醉梦人生

2020-11-28 01:59

Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.

O(1) indicates constant space, which does not grow with n. It does not matter if it is 1 or 2 variables as long as it is a constant number. Why are you saying it is more than O(1) space? If you are calculating the sum of n numbers by accumulating it in a temporary variable, you would be using exactly 1 variable anyway.

Commenting in an answer because the system does not allow me to write comments yet.

Update (in reply to comments): in this answer I meant O(1) space wherever "space" or "time" was omitted. The quoted text is part of an earlier answer, to which this is a reply.


小蘑菇

2020-11-28 02:00

Any contiguous array [ n, n+1, ..., n+m-1 ] can be mapped onto a 'base' interval [ 0, 1, ..., m-1 ] using the modulo operator. For each i in the array, there is exactly one i%m in the base interval and vice versa.

Any contiguous array also has a 'span' (maximum - minimum + 1) equal to its size m.

Using these facts, you can create an "encountered" boolean array of same size containing all falses initially, and while visiting the input array, put their related "encountered" elements to true.

This algorithm is O(n) in space, O(n) in time, and checks for duplicates.

```
def contiguous( values )
    # initialization
    encountered = Array.new( values.size, false )
    min, max = nil, nil
    visited = 0

    values.each do |v|

        # slot of v in the 'base' interval 0..size-1
        index = v % encountered.size

        if encountered[ index ]
            return "duplicates"
        end

        encountered[ index ] = true
        min = v if min == nil or v < min
        max = v if max == nil or v > max
        visited += 1
    end

    if ( max - min + 1 != values.size ) or visited != values.size
        return "hole"
    else
        return "contiguous"
    end

end

tests = [
[ false, [ 2,4,5,6 ] ],
[ false, [ 10,11,13,14 ] ],
[ true , [ 20,21,22,23 ] ],
[ true , [ 19,20,21,22,23 ] ],
[ true , [ 20,21,22,23,24 ] ],
[ false, [ 20,21,22,23,24+5 ] ],
[ false, [ 2,2,3,4,5 ] ]
]

tests.each do |t|
    result = contiguous( t[1] )
    if t[0] != ( result == "contiguous" )
        puts "Failed Test : " + t[1].to_s + " returned " + result
    end
end
```
