Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

时光说笑 2020-11-22 07:02

I had an interesting job interview experience a while back. The question started really easy:

Q1: We have a bag containing numbers

30 answers
  • 2020-11-22 08:02

    The problem with solutions based on sums of powers is that they don't account for the cost of storing and working with numbers with large exponents. In practice, for them to work for very large n, a big-number library would be needed.

    We can analyse the time and space complexity of sdcvvc's and Dimitris Andreou's algorithms.

    Storage:

    l_j = ceil (log_2 (sum_{i=1}^n i^j))
    l_j > log_2 n^j  (assuming n >= 0, k >= 0)
    l_j > j log_2 n \in \Omega(j log n)
    
    l_j < log_2 ((sum_{i=1}^n i)^j) + 1
    l_j < j log_2 (n) + j log_2 (n + 1) - j log_2 (2) + 1
    l_j < 2j log_2 (n + 1) - j + 1 \in O(j log n)
    

    So l_j \in \Theta(j log n)

    Total storage used: \sum_{j=1}^k l_j \in \Theta(k^2 log n)
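
    The storage bound can be checked numerically. The following is a minimal Python sketch, not part of the original answer; the function names `power_sum_bits` and `total_bits` are made up for illustration. It uses `int.bit_length()`, which differs from ceil(log_2 x) by at most one, to measure how many bits each power sum needs.

```python
import math


def power_sum_bits(n: int, j: int) -> int:
    """Bits needed to store sum_{i=1}^n i^j exactly (roughly l_j above)."""
    return sum(i ** j for i in range(1, n + 1)).bit_length()


def total_bits(n: int, k: int) -> int:
    """Total bits for the k power sums with exponents 1..k."""
    return sum(power_sum_bits(n, j) for j in range(1, k + 1))


n = 100
for j in (1, 2, 4, 8):
    # Compare the measured size with the lower bound j * log_2(n).
    print(j, power_sum_bits(n, j), round(j * math.log2(n), 1))
```

    For fixed n the per-sum size grows linearly in j, which is why the total over j = 1..k grows quadratically in k.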

    Time used: assuming that computing a^j takes ceil(log_2 j) time, total time:

    t = k ceil(\sum_{i=1}^n log_2 (i)) = k ceil(log_2 (\prod_{i=1}^n i)) = k ceil(log_2 (n!))
    t >= k log_2 (n!) >= k log_2 ((n/2)^(n/2))   (the largest n/2 factors of n! are each >= n/2)
    t >= (kn/2) log_2 (n/2) \in \Omega(kn log n)
    t < k (log_2 (n!) + 1) <= k (log_2 (n^n) + 1)
    t < kn log_2 (n) + k \in O(kn log n)
    

    Total time used: \Theta(kn log n)

    If this time and space is satisfactory, you can use a simple recursive algorithm. Let b!i be the ith entry in the bag, n the number of numbers before removals, and k the number of removals. In Haskell syntax...

    let
      -- O(1)
      isInRange low high v = (v >= low) && (v <= high)
      -- O(n - k): number of bag entries that fall in [low, high]
      countInRange low high = sum $ map (fromEnum . isInRange low high . (b !)) [1..(n-k)]
      -- l accumulates the missing numbers found so far
      findMissing l low high krange
        -- O(1) if there is nothing to find.
        | krange == 0 = l
        -- O(1) if there is only one possibility.
        | low == high = low:l
        -- Otherwise total of O(knlog(n)) time
        | otherwise =
           let
             mid = (low + high) `div` 2
             -- missing in [low, mid] = size of the range minus entries present in it
             klow = (mid - low + 1) - countInRange low mid
             khigh = krange - klow
           in
             findMissing (findMissing l low mid klow) (mid + 1) high khigh
    in
      findMissing [] 1 n k
    

    Storage used: O(k) for the list plus O(log(n)) for the stack, i.e. O(k + log(n)). This algorithm is more intuitive, has the same time complexity, and uses less space.
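
    For readers who want something runnable, here is the same range-bisection idea as a Python sketch. It is an illustration rather than the answerer's code: the name `find_missing` is hypothetical, and it assumes the bag holds distinct values from 1..n.

```python
def find_missing(bag, n):
    """Return the sorted numbers from 1..n that are absent from bag."""
    k = n - len(bag)  # number of removals
    missing = []

    def count_in_range(low, high):
        # One O(n - k) pass over the bag; no auxiliary array of size n.
        return sum(1 for v in bag if low <= v <= high)

    def search(low, high, k_range):
        if k_range == 0:       # nothing missing in [low, high]
            return
        if low == high:        # a single candidate, so it must be missing
            missing.append(low)
            return
        mid = (low + high) // 2
        # Missing in the lower half = size of the half minus entries present.
        k_low = (mid - low + 1) - count_in_range(low, mid)
        search(low, mid, k_low)
        search(mid + 1, high, k_range - k_low)

    search(1, n, k)
    return missing
```

    Each recursion level triggers at most k counting passes and there are about log_2 n levels, which matches the O(kn log n) estimate above.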

  • 2020-11-22 08:02

    Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even have support for it, like in Ruby:

    missing = (1..100).to_a - bag
    

    It's probably not the most efficient solution, but it's the one I would use in real life if I were faced with such a task in this case (known, small bounds). If the set of numbers were very large, I would consider a more efficient algorithm, of course, but until then the simple solution would be enough for me.

  • 2020-11-22 08:02

    This might sound stupid, but, in the first problem presented to you, you would have to see all the remaining numbers in the bag to actually add them up to find the missing number using that equation.

    So, since you get to see all the numbers, just look for the number that's missing. The same goes for when two numbers are missing. Pretty simple I think. No point in using an equation when you get to see the numbers remaining in the bag.

  • 2020-11-22 08:04

    You'd probably need clarification on what O(k) means.

    Here's a trivial solution for arbitrary k: for each v in your set of numbers, accumulate the sum of 2^v. At the end, loop i from 1 to N. If the sum bitwise ANDed with 2^i is zero, then i is missing. (Or numerically: if the floor of the sum divided by 2^i is even, or if the sum modulo 2^(i+1) is less than 2^i.)

    Easy, right? O(N) time, O(1) storage, and it supports arbitrary k.

    Except that you're computing enormous numbers that on a real computer would each require O(N) space. In fact, this solution is identical to a bit vector.
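
    A minimal Python sketch of that accumulator makes the point concrete. The function name `missing_via_bitmask` is made up for illustration, and the bag is assumed to hold distinct values from 1..n. Python integers are arbitrary precision, so `total` silently grows to about n + 1 bits, which is exactly the hidden O(N) storage.

```python
def missing_via_bitmask(bag, n):
    """Return the numbers from 1..n that are absent from bag."""
    total = 0
    for v in bag:
        # Adding 2**v sets bit v, because the values are assumed distinct.
        total += 1 << v
    # Bit i is clear exactly when i never appeared in the bag.
    return [i for i in range(1, n + 1) if (total & (1 << i)) == 0]
```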

    So you could be clever and compute the sum, the sum of squares, the sum of cubes, and so on up to the sum of v^k, and do the fancy math to extract the result. But those are big numbers too, which raises the question: what abstract model of operation are we talking about? How much fits in O(1) space, and how long does it take to sum up numbers of whatever size you need?

  • 2020-11-22 08:04

    I have read all thirty answers and found the simplest one, a bit array of size 100, to be the best. But since the question says we can't use an array of size N, I would instead use O(1) space and k iterations, i.e. O(NK) time, to solve this.

    To make the explanation simpler, suppose I have been given the numbers from 1 to 15 and two of them, 9 and 14, are missing, but I don't know which. Let the bag look like this:

    [8,1,2,12,4,7,5,10,11,13,15,3,6].

    We know that each number is represented internally as bits. For numbers up to 15 we only need 4 bits. For numbers up to 10^9, a 32-bit integer is enough. But let's focus on 4 bits and generalize later.

    Now, assume if we had all the numbers from 1 to 15, then internally, we would have numbers like this (if we had them ordered):

    0001
    0010
    0011
    0100
    0101
    0110
    0111
    1000
    1001
    1010
    1011
    1100
    1101
    1110
    1111
    

    But now we have two numbers missing. So our representation will look something like this (shown ordered for understanding but can be in any order):

    (2MSD|2LSD)
    00|01
    00|10
    00|11
    -----
    01|00
    01|01
    01|10
    01|11
    -----
    10|00
    missing=(10|01) 
    10|10
    10|11
    -----
    11|00
    11|01
    missing=(11|10)
    11|11
    

    Now let's make an array of 4 counters, one per 2-bit pattern, that holds the count of numbers with the corresponding 2 most significant bits, i.e.

    = [__,__,__,__] 
       00,01,10,11
    

    Scan the bag from left to right and fill the above array so that each bin contains the count of numbers with that prefix. The result is:

    = [ 3, 4, 3, 3] 
       00,01,10,11
    

    If all the numbers had been present, it would have looked like this:

    = [ 3, 4, 4, 4] 
       00,01,10,11
    

    Thus we know that there are two numbers missing: one whose 2 most significant bits are 10 and one whose 2 most significant bits are 11. Now scan the list again and fill out an array of 4 counters for the 2 least significant bits. This time, only consider elements whose 2 most significant bits are 10. We get:

    = [ 1, 0, 1, 1] 
       00,01,10,11
    

    If all numbers with MSD=10 were present, we would have 1 in every bin, but we see that one is missing. Thus the number with MSD=10 and LSD=01 is missing, which is 1001, i.e. 9.

    Similarly, if we scan again but consider only elements whose MSD=11, we find that the number with MSD=11 and LSD=10 is missing, which is 1110, i.e. 14:

    = [ 1, 1, 0, 1] 
       00,01,10,11
    

    Thus, we can find the missing numbers in a constant amount of space. We can generalize this for 100, 1000 or 10^9 or any set of numbers.
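
    The walkthrough above can be sketched in Python as follows. This is an illustrative generalization, not code from the referenced book: the name `find_missing_by_prefix` and the `chunk_bits` parameter are assumptions, and the bag is assumed to hold distinct values from 1..n. Each level uses 2**chunk_bits counters regardless of n.

```python
def find_missing_by_prefix(bag, n, chunk_bits=2):
    """Missing values of 1..n, found by counting entries per bit-prefix bucket."""
    width = max(n.bit_length(), 1)
    width += -width % chunk_bits  # round up to a whole number of chunks
    missing = []

    def expected(lo, hi):
        # How many of the numbers 1..n fall in [lo, hi].
        return max(0, min(hi, n) - max(lo, 1) + 1)

    def refine(prefix, bits_left):
        # prefix holds the fixed high bits; bits_left low bits are still free.
        if bits_left == 0:
            missing.append(prefix)
            return
        shift = bits_left - chunk_bits
        mask = (1 << chunk_bits) - 1
        counts = [0] * (1 << chunk_bits)  # size independent of n
        for v in bag:  # one pass over the bag per refinement
            if v >> bits_left == prefix:
                counts[(v >> shift) & mask] += 1
        for bucket, seen in enumerate(counts):
            child = (prefix << chunk_bits) | bucket
            lo = child << shift
            hi = lo + (1 << shift) - 1
            if seen < expected(lo, hi):  # something is missing under this prefix
                refine(child, shift)

    refine(0, width)
    return missing
```

    With n = 15 and the bag above, the first pass yields the counts [3, 4, 3, 3], and the two refinements under prefixes 10 and 11 recover 9 and 14.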

    References: Problem 1.6 in http://users.ece.utexas.edu/~adnan/afi-samples-new.pdf

  • 2020-11-22 08:05

    I haven't checked the maths, but I suspect that computing Σ(n^2) in the same pass as we compute Σ(n) would provide enough info to get two missing numbers. Do Σ(n^3) as well if there are three, and so on.
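
    For the two-number case the maths does work out. Below is a minimal Python sketch, with the hypothetical name `two_missing`, assuming the bag holds distinct values from 1..n with exactly two removed. If a and b are the missing numbers, the two sums give a + b and a^2 + b^2, and (a - b)^2 = 2(a^2 + b^2) - (a + b)^2.

```python
import math


def two_missing(bag, n):
    """Return the two numbers (a, b), a < b, missing from 1..n."""
    s1 = n * (n + 1) // 2                 # sum of 1..n
    s2 = n * (n + 1) * (2 * n + 1) // 6   # sum of squares of 1..n
    for v in bag:                         # single pass over the bag
        s1 -= v
        s2 -= v * v
    # Now s1 = a + b and s2 = a^2 + b^2.
    d = math.isqrt(2 * s2 - s1 * s1)      # d = b - a
    return (s1 - d) // 2, (s1 + d) // 2
```

    Extending this to three or more missing numbers requires solving a degree-k polynomial system from the higher power sums, which this sketch does not attempt.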
