Algorithm to find two repeated numbers in an array, without sorting

南方客 2020-11-28 06:58

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.

E.g. in {2, 3, 6, 1, 5, 4

24 Answers
  • 2020-11-28 07:19

    In C:

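        int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
        int n = 9;          /* array size; values lie in 0..n-3 */
        int num = 0, i;

        for (i = 0; i < n; i++)
            num ^= arr[i];  /* XOR in every array element */
        for (i = 0; i <= n - 3; i++)
            num ^= i;       /* XOR in every value of the range 0..n-3 */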
    

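    Since x^x = 0, every number that appears an even number of times across the array and the range 0..n-3 cancels out: each non-repeated number occurs once in the array and once in the range. The two repeated numbers each occur three times, so they survive. Call them a and b; we are left with a^b. We know a^b != 0, since a != b. Choose any set bit of a^b and use it as a mask, i.e. choose x as a power of 2 such that x & (a^b) is nonzero.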

    Now split the list into two sublists: one contains all numbers y with y&x == 0, and the rest go in the other. By the way we chose x, a and b end up in different buckets. So we can apply the same method to each bucket independently (XOR the bucket's array elements with the values of 0..n-3 that fall in that bucket) and discover what a and b are.
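
    The whole procedure can be sketched as a single function. This is only an illustration under the question's assumptions (values in 0..n-3, exactly two of them occurring twice); the name find_two_repeated_xor and its signature are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the two repeated values of arr[0..n-1]
   to *a and *b. Assumes values lie in 0..n-3 and exactly two of them
   occur twice. Uses O(n) time and O(1) extra space. */
void find_two_repeated_xor(const int *arr, size_t n, int *a, int *b)
{
    assert(n >= 4);

    /* Step 1: XOR all array elements with all values 0..n-3.
       Everything cancels except a ^ b. */
    int axb = 0;
    for (size_t i = 0; i < n; i++)
        axb ^= arr[i];
    for (size_t v = 0; v + 3 <= n; v++)
        axb ^= (int)v;
    assert(axb != 0); /* a != b, so some bit must differ */

    /* Step 2: isolate the lowest set bit of a ^ b as the mask. */
    int mask = axb & -axb;

    /* Step 3: repeat the XOR, restricted to the bucket whose
       members have the mask bit set. Only one of a, b is in it. */
    int first = 0;
    for (size_t i = 0; i < n; i++)
        if (arr[i] & mask)
            first ^= arr[i];
    for (size_t v = 0; v + 3 <= n; v++)
        if ((int)v & mask)
            first ^= (int)v;

    *a = first;        /* the repeated value with the mask bit set */
    *b = axb ^ first;  /* the other one */
}
```

    Called on the array {2, 3, 6, 1, 5, 4, 0, 3, 5} with n = 9, it yields 3 and 5.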

  • 2020-11-28 07:20

    Insert each element into a set/hashtable, first checking whether it is already in it.
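
    Because the values are known to lie in 0..n-3, a plain array of flags can stand in for the set. The following is a rough sketch of that idea, not a general-purpose hashtable; the function name find_repeated_seen and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: stores up to cap repeated values in out[] and
   returns how many were stored, or -1 if memory allocation fails.
   Assumes every value lies in 0..n-3. O(n) time, O(n) extra space. */
int find_repeated_seen(const int *arr, size_t n, int *out, int cap)
{
    assert(n >= 3);
    size_t range = n - 2; /* number of distinct values in 0..n-3 */

    bool *seen = calloc(range, sizeof *seen); /* the "set" */
    if (seen == NULL)
        return -1;

    int count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(arr[i] >= 0 && (size_t)arr[i] < range);
        if (seen[arr[i]]) {
            /* already in the set: this is a repeat */
            if (count < cap)
                out[count++] = arr[i];
        } else {
            seen[arr[i]] = true; /* insert into the set */
        }
    }

    free(seen);
    return count;
}
```

    The trade-off compared with the XOR approach is O(n) extra memory in exchange for much simpler logic.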

  • 2020-11-28 07:20

    What about using a HyperLogLog (https://en.wikipedia.org/wiki/HyperLogLog)?

    Redis implements one (http://redis.io/topics/data-types-intro#hyperloglogs), and its documentation describes it as follows:

    A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

  • 2020-11-28 07:21

    In reply to answer 18: you are taking an array of 9 elements whose values start at 0, so the largest value in your array is 6. Take the sum of the numbers 0 to 6 and the sum of the array elements, and compute their difference (say d); this is p + q, where p and q are the two repeated numbers. Now take the XOR of the numbers 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all numbers from 0 to 6 except the two repeated ones, since each repeated pair cancels itself out. Then, for each element a[i] of the array, treat it as p and compute q = d - p. XOR p and q with x2 and check whether the result equals x1. Going through all elements this way, you find the ones for which the condition holds, and you are done in O(n). Keep coding!
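
    A sketch of this sum-and-XOR idea follows. One caveat worth stating: a sum and an XOR do not always identify a unique pair (1 and 2 have the same sum and XOR as 0 and 3), so this version adds a confirmation step that is not part of the answer above, and that step can push the worst case beyond O(n). The names find_two_repeated_sum_xor and count_of are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts how often value v occurs in arr[0..n-1]. */
static size_t count_of(const int *arr, size_t n, int v)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (arr[i] == v)
            c++;
    return c;
}

/* Hypothetical helper: finds the two repeated values, assuming values
   lie in 0..n-3 and exactly two of them occur twice. Returns true and
   fills *p_out, *q_out on success. */
bool find_two_repeated_sum_xor(const int *arr, size_t n,
                               int *p_out, int *q_out)
{
    assert(n >= 4);

    long long d = 0; /* becomes p + q */
    int x = 0;       /* becomes p ^ q (x1 ^ x2 in the answer's terms) */
    for (size_t i = 0; i < n; i++) {
        d += arr[i];
        x ^= arr[i];
    }
    for (size_t v = 0; v + 3 <= n; v++) {
        d -= (long long)v;
        x ^= (int)v;
    }

    for (size_t i = 0; i < n; i++) {
        int p = arr[i];
        long long q = d - p;
        if (q < 0 || q > (long long)n - 3 || q == p)
            continue; /* q is not a valid partner for p */
        if ((p ^ (int)q) != x)
            continue; /* XOR check from the answer fails */
        /* Added confirmation: both must really occur twice. */
        if (count_of(arr, n, p) == 2 && count_of(arr, n, (int)q) == 2) {
            *p_out = p;
            *q_out = (int)q;
            return true;
        }
    }
    return false;
}
```

    For {0, 1, 2, 3, 1, 2} the candidate pair (0, 3) passes the XOR check but is rejected by the confirmation step, and the function settles on 1 and 2.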

  • 2020-11-28 07:21

    How about this:

    for (i=0; i<n-1; i++) {
      for (j=i+1; j<n; j++) {
        if (a[i] == a[j]) {
            printf("%d appears more than once\n",a[i]);
            break;
        }
      }
    }
    

    Sure, it's not the fastest, but it's simple and easy to understand, and it requires no additional memory. If n is a small number like 9 or 100, it may well be the "best" ("best" can mean different things: fastest to execute, smallest memory footprint, most maintainable, least costly to develop, etc.).

  • 2020-11-28 07:23

    Some answers to the question "Algorithm to determine if array contains n…n+m?" solve, as a subproblem, something you can adapt for your purpose.

    For example, here's a relevant part from my answer:

    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>
    #include <stdlib.h>

    bool has_duplicates(int* a, int m, int n)
    {
      /** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
    
          Whether a[] array has duplicates.
    
          precondition: all values are in [n, n+m) range.
    
          feature: It marks visited items using a sign bit.
      */
      assert(n > INT_MIN); // n - 1 below must not overflow (signed overflow is undefined)
      for (int *p = a; p != &a[m]; ++p) {
        *p -= (n - 1); // [n, n+m) -> [1, m+1)
        assert(*p > 0);
      }
    
      // determine: are there duplicates
      bool has_dups = false;
      for (int i = 0; i < m; ++i) {
        const int j = abs(a[i]) - 1;
        assert(j >= 0);
        assert(j < m);
        if (a[j] > 0)
          a[j] *= -1; // mark
        else { // already seen
          has_dups = true;
          break;
        }
      }
    
      // restore the array
      for (int *p = a; p != &a[m]; ++p) {
        if (*p < 0) 
          *p *= -1; // unmark
        // [1, m+1) -> [n, n+m)
        *p += (n - 1);        
      }
    
      return has_dups; 
    }
    

    The program leaves the array unchanged (the array should be writeable but its values are restored on exit).

    It works for array sizes up to INT_MAX (2147483647 on typical platforms, including most 64-bit ones, where int is 32 bits wide).
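
    One possible adaptation of the sign-bit marking to the original question (values in 0..n-3, report the repeats instead of a yes/no answer) might look like the sketch below. The function name find_repeated_sign_mark and its parameters are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: stores up to cap repeated values in out[] and
   returns how many were stored. Assumes every value lies in 0..n-3.
   Temporarily modifies a[] but restores it before returning.
   O(n) time, O(1) extra space. */
int find_repeated_sign_mark(int *a, int n, int *out, int cap)
{
    assert(n >= 3);

    /* Shift [0, n-3] to [1, n-2] so that 0 can carry a sign too. */
    for (int i = 0; i < n; i++) {
        assert(a[i] >= 0 && a[i] <= n - 3);
        a[i] += 1;
    }

    int count = 0;
    for (int i = 0; i < n; i++) {
        int j = abs(a[i]) - 1; /* original value, used as an index */
        if (a[j] > 0)
            a[j] = -a[j];      /* mark value j as seen */
        else if (count < cap)
            out[count++] = j;  /* already marked: j is a repeat */
    }

    /* Restore: drop the marks and undo the shift. */
    for (int i = 0; i < n; i++)
        a[i] = abs(a[i]) - 1;

    return count;
}
```

    As with has_duplicates, the array must be writable, and its contents are the same on exit as on entry.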
