Finding repeating signed integers with O(n) in time and O(1) in space

慢半拍i 2021-02-04 08:49

(This is a generalization of: Finding duplicates in O(n) time and O(1) space)

Problem: Write a C++ or C function with time and space complexities of O(n) and O(1) respec

7 Answers
  • 2021-02-04 09:14

    The trick is to exploit space you already have but are not using. You need only one extra bit per possible value, and the 32-bit int values in the array have plenty of unused bits.

    This has serious limitations, but it works in this case. The numbers have to lie between -n/2 and n/2, a value that occurs m times is printed floor(m/2) times, and the input array is modified (its sign bits are overwritten).

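    #include <stdint.h>
    #include <stdio.h>

    /* Bit 31 of a[pos] flags "seen an odd number of times so far".
       Overwrites the sign bits of the input array. */
    void print_repeats(int32_t a[], uint32_t size) {
        const uint32_t topbit = UINT32_C(1) << 31, mask = ~topbit;
        uint32_t *u = (uint32_t *)a;  /* same bits, viewed as unsigned */
        uint32_t i, low, pos;
        int32_t val;

        for (i = 0; i < size; i++)
            u[i] &= mask;             /* free bit 31 in every slot */

        for (i = 0; i < size; i++) {
            low = u[i] & mask;        /* strip the flag to recover the value */
            if (low <= mask / 2) {    /* bit 30 clear: non-negative */
                val = (int32_t)low;
                pos = low;
            } else {                  /* bit 30 set: originally negative */
                val = (int32_t)low - INT32_MAX - 1;
                pos = size + (uint32_t)val;
            }
            if (u[pos] & topbit) {    /* second sighting: report, reset flag */
                printf("%d\n", (int)val);
                u[pos] &= mask;
            } else {
                u[pos] |= topbit;     /* first sighting: set flag */
            }
        }
    }

    int main(void) {
        int32_t a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
        print_repeats(a, sizeof a / sizeof a[0]);
        return 0;
    }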
    

    prints

    4
    1
    -2
    
  • 2021-02-04 09:17

    Big-O notation takes a function f(x) as its argument, and the claim it makes is this: there exists a constant K such that, as the variable x tends to infinity, the cost function being measured stays below K·f(x). Typically f is chosen to be the smallest simple function for which that condition holds. (It's pretty obvious how to lift this to multiple variables.)
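
    Written out, the usual textbook form of that definition looks like this. The symbols T (the cost function being bounded) and x_0 (the point beyond which the bound must hold) are introduced here for illustration and do not appear in the answer itself:

```latex
T(x) = O\bigl(f(x)\bigr)
\iff
\exists K > 0,\ \exists x_0 \ \text{such that}\ \forall x \ge x_0 :\ T(x) \le K \, f(x)
```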

    This matters because that K, which you aren't required to specify, allows a whole multitude of complex behavior to be hidden out of sight. For example, if the core of the algorithm is O(n^2), it allows all sorts of other O(1), O(log n), O(n), O(n log n), O(n^(3/2)), etc. supporting bits to be hidden, even if for realistic input data those parts are what actually dominate. That's right, it can be completely misleading! (Some of the fancier bignum algorithms have this property for real. Lying with mathematics is a wonderful thing.)

    So where is this going? Well, you can assume that int is a fixed size easily enough (e.g., 32-bit) and use that to skip a lot of trouble: allocate fixed-size arrays of flag bits to hold all the information you really need. Indeed, by using two bits per potential value (one bit to say whether you've seen the value at all, another to say whether you've printed it), you can handle the problem with a fixed chunk of memory 1GB in size. That gives you enough flag information to cope with as many 32-bit integers as you might ever wish to handle. (Heck, that's even practical on 64-bit machines.) Yes, it's going to take some time to set that memory block up, but that time is constant, so it's formally O(1) and drops out of the analysis. Given that, you have constant (but whopping) memory consumption and linear time (you've got to look at each value to see whether it's new, seen once, etc.), which is exactly what was asked for.

    It's a dirty trick though. You could also try scanning the input list first to work out the range of values, which allows less memory to be used in the normal case; again, that adds only linear time, and you can strictly bound the memory required as above, so it's still constant. Yet more trickiness, but formally legal.


    [EDIT] Sample C code (this is not C++, but I'm not good at C++; the main difference would be in how the flag arrays are allocated and managed):

    #include <stdio.h>
    #include <stdlib.h>

    #define FLAG_WORDS 134217728u  /* 2^32 flag bits / 32 bits per word */

    // Bit fiddling magic: one flag bit per possible 32-bit value
    int is(const unsigned *ary, unsigned value) {
        return (ary[value >> 5] >> (value & 31)) & 1u;
    }
    void set(unsigned *ary, unsigned value) {
        ary[value >> 5] |= 1u << (value & 31);
    }

    // Main loop
    void print_repeats(int a[], unsigned size) {
        unsigned *seen, *done;
        unsigned i;

        seen = calloc(FLAG_WORDS, sizeof *seen);
        done = calloc(FLAG_WORDS, sizeof *done);
        if (!seen || !done) {          // 1GB may not be available
            free(seen);
            free(done);
            return;
        }

        for (i = 0; i < size; i++) {
            if (is(done, (unsigned) a[i]))
                continue;              // already printed
            if (is(seen, (unsigned) a[i])) {
                set(done, (unsigned) a[i]);
                printf("%d ", a[i]);
            } else
                set(seen, (unsigned) a[i]);
        }

        printf("\n");
        free(done);
        free(seen);
    }

    int main(void) {
        int a[] = {1,0,-2,4,4,1,3,1,-2};
        print_repeats(a, sizeof a / sizeof a[0]);
        return 0;
    }
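
    The range-scanning variant mentioned above could look roughly like the following. This is only a sketch under the same 32-bit assumption: the names print_repeats_ranged, get_flags and or_flags are made up for illustration, and the two flag bits are packed into one table instead of two arrays.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helpers. Two flag bits per value, four values per byte:
   bit 0 = seen at least once, bit 1 = already printed. */
static unsigned get_flags(const unsigned char *f, uint64_t k) {
    return (f[k >> 2] >> ((k & 3) * 2)) & 3u;
}
static void or_flags(unsigned char *f, uint64_t k, unsigned bits) {
    f[k >> 2] |= (unsigned char)(bits << ((k & 3) * 2));
}

/* Prints each repeated value once; returns how many distinct values
   repeat, or -1 if the flag table cannot be allocated. */
long print_repeats_ranged(const int32_t a[], size_t size) {
    if (size == 0) return 0;

    /* Pass 1: find the range of values actually present. */
    int32_t lo = a[0], hi = a[0];
    for (size_t i = 1; i < size; i++) {
        if (a[i] < lo) lo = a[i];
        if (a[i] > hi) hi = a[i];
    }

    /* At most 2^32 distinct values, so at most 2^30 bytes of flags. */
    uint64_t span = (uint64_t)((int64_t)hi - (int64_t)lo) + 1;
    unsigned char *flags = calloc((size_t)((span + 3) / 4), 1);
    if (!flags) return -1;

    /* Pass 2: the same seen/done logic, indexed by a[i] - lo. */
    long repeats = 0;
    for (size_t i = 0; i < size; i++) {
        uint64_t k = (uint64_t)((int64_t)a[i] - (int64_t)lo);
        unsigned fl = get_flags(flags, k);
        if (fl == 0) {
            or_flags(flags, k, 1u);      /* first sighting */
        } else if (fl == 1) {
            or_flags(flags, k, 2u);      /* second sighting: print once */
            printf("%d ", (int)a[i]);
            repeats++;
        }
    }
    printf("\n");
    free(flags);
    return repeats;
}
```

    For the sample array {1,0,-2,4,4,1,3,1,-2} the range is -2..4, so the table shrinks to two bytes. The worst case is still the fixed 1GB bound.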
    
  • 2021-02-04 09:19

    Since you have an array of integers, you can use the straightforward solution: sort the array (you didn't say it can't be modified) and print the duplicates. Integer arrays can be sorted in O(n) time and O(1) space using radix sort. Although radix sort in general may require O(n) space, an in-place binary MSD radix sort can be implemented trivially in O(1) space.
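
    A rough sketch of that approach, assuming 32-bit values. The function names msd_radix_sort and print_repeats_sorted are invented for illustration. The sort recurses, but never more than 32 levels deep, so the stack use does not depend on n.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sort u[lo..hi) by the bits from `bit` down to 0, in place.
   Recursion depth is at most 32 regardless of the array length. */
static void msd_radix_sort(uint32_t *u, size_t lo, size_t hi, int bit) {
    if (hi - lo < 2 || bit < 0) return;
    const uint32_t m = UINT32_C(1) << bit;
    size_t i = lo, j = hi;
    while (i < j) {                 /* zeros to the left, ones to the right */
        if (u[i] & m) {
            j--;
            uint32_t t = u[i]; u[i] = u[j]; u[j] = t;
        } else {
            i++;
        }
    }
    msd_radix_sort(u, lo, i, bit - 1);
    msd_radix_sort(u, i, hi, bit - 1);
}

/* Sorts a[] in place (the input IS modified), prints each repeated
   value once, and returns the number of distinct repeated values. */
size_t print_repeats_sorted(int32_t a[], size_t size) {
    uint32_t *u = (uint32_t *)a;    /* same bits, viewed as unsigned */
    const uint32_t sign = UINT32_C(1) << 31;
    size_t i, repeats = 0;

    /* Flipping the sign bit makes unsigned order match signed order. */
    for (i = 0; i < size; i++) u[i] ^= sign;
    msd_radix_sort(u, 0, size, 31);
    for (i = 0; i < size; i++) u[i] ^= sign;

    /* Equal values are now adjacent: report each run of repeats once. */
    for (i = 1; i < size; i++) {
        if (a[i] == a[i - 1] && (i == 1 || a[i - 1] != a[i - 2])) {
            printf("%d\n", (int)a[i]);
            repeats++;
        }
    }
    return repeats;
}
```

    Called on {1, 0, -2, 4, 4, 1, 3, 1, -2}, this prints -2, 1 and 4, in sorted order rather than in order of first repetition.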

  • 2021-02-04 09:20

    There is a tricky problem with definitions here. What does O(n) mean?

    Konstantin's answer claims that the radix sort time complexity is O(n). In fact it is O(n log M), where the base of the logarithm is the radix chosen, and M is the range of values that the array elements can have. So, for instance, a binary radix sort of 32-bit integers will have log M = 32.

    So this is still, in a sense, O(n), because log M is a constant independent of n. But if we allow this, then there is a much simpler solution: for each integer in the range (all 4294967296 of them), go through the array to see if it occurs more than once. This is also, in a sense, O(n), because 4294967296 is also a constant independent of n.

    I don't think my simple solution would count as an answer. But if not, then we shouldn't allow the radix sort, either.
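
    For concreteness, the "simple solution" might be sketched as below. The name print_repeats_bruteforce and the lo/hi parameters are assumptions added for illustration, so the candidate range can be narrowed when experimenting. The version described above corresponds to lo = INT32_MIN and hi = INT32_MAX.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Checks every candidate in [lo, hi] against the whole array.
   Read-only input, O(1) extra space, up to (hi - lo + 1) * n comparisons. */
size_t print_repeats_bruteforce(const int32_t a[], size_t size,
                                int32_t lo, int32_t hi) {
    size_t repeats = 0;
    for (int64_t v = lo; v <= hi; v++) {   /* 64-bit so v++ cannot overflow */
        size_t count = 0;
        for (size_t i = 0; i < size && count < 2; i++)
            if (a[i] == v) count++;
        if (count == 2) {                  /* occurs more than once */
            printf("%d\n", (int)v);
            repeats++;
        }
    }
    return repeats;
}
```

    Unlike the sorting and flag-bit approaches, this leaves the input untouched, which is part of why it is hard to rule out on purely formal grounds.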

  • 2021-02-04 09:28

    I doubt this is possible. Assuming there is a solution, let's see how it works. I'll try to be as general as I can and show that it can't work... So, how does it work?

    Without loss of generality, we can say we process the array k times, where k is fixed. The solution should also work when there are m duplicates, with m >> k. Thus, in at least one of the passes, we should be able to output x duplicates, where x grows as m grows. To do so, some useful information must have been computed in a previous pass and stored in the O(1) storage. (The array itself can't be used; that would give O(n) storage.)

    The problem: we have O(1) of information, and when we walk over the array we have to identify x numbers (to output them). We need O(1) storage that can tell us, in O(1) time, whether an element is in it. Or, said in a different way, we need a data structure to store n booleans (of which x are true) that uses O(1) space and takes O(1) time to query.

    Does this data structure exist? If not, then we can't find all duplicates in an array with O(n) time and O(1) space (or there is some fancy algorithm that works in a completely different manner???).

  • 2021-02-04 09:29

    The O(1) space constraint is intractable.

    The very fact of printing the array itself requires O(N) storage, by definition.

    Now, feeling generous, I'll give you that you can have O(1) storage for a buffer within your program and consider that the space taken outside the program is of no concern to you, and thus that the output is not an issue...

    Still, the O(1) space constraint feels intractable, because of the immutability constraint on the input array. It might not be, but it feels so.

    And your solution overflows, because you try to memorize O(N) information in a finite datatype.
