I had an interesting job interview experience a while back. The question started really easy:
Q1: We have a bag containing numbers
You can motivate the solution by thinking about it in terms of symmetries (groups, in math language). No matter the order of the set of numbers, the answer should be the same. If you're going to use k
functions to help determine the missing elements, you should be thinking about what functions have that property: symmetric. The function s_1(x) = x_1 + x_2 + ... + x_n
is an example of a symmetric function, but there are others of higher degree. In particular, consider the elementary symmetric functions. The elementary symmetric function of degree 2 is s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n
, the sum of all products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher. They are obviously symmetric. Furthermore, it turns out they are the building blocks for all symmetric functions.
You can build the elementary symmetric functions as you go by noting that s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1))
. Further thought should convince you that s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1))
and so on, so they can be computed in one pass.
How do we tell which items were missing from the array? Think about the polynomial (z-x_1)(z-x_2)...(z-x_n)
. It evaluates to 0
if you put in any of the numbers x_i
. Expanding the polynomial, you get z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n
. The elementary symmetric functions appear here too, which is really no surprise, since the polynomial should stay the same if we apply any permutation to the roots.
So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as others have mentioned.
Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric polynomial will be of the order 100!
), we can do these calculations mod p
where p
is a prime bigger than 100. In that case we evaluate the polynomial mod p
and find that it again evaluates to 0
when the input is a number in the set, and it evaluates to a non-zero value when the input is a number not in the set. However, as others have pointed out, to get the values out of the polynomial in time that depends on k
, not N
, we have to factor the polynomial mod p
.