Here is one of my interview questions: given an array of N elements where one element appears exactly N/2 times and the remaining N/2 elements are unique, how do you find that element?
Similar to the explanation in https://stackoverflow.com/a/1191881/199556.
Take the elements 3 at a time and compare all pairs within each group (3 comparisons). In the worst case no pair matches, which means the "same" element appears at most once in the group (if it appeared twice we would already have found it). So each iteration shortens the tail by 3 elements while removing at most one copy of the "same" element.
After k iterations the tail has length n - 3k and still contains at least (n/2) - k copies of the "same" element. The scan can keep failing only while the tail still has unique elements left to pad the groups; the worst case is a tail consisting of the (n/2) - k remaining copies plus one last unique element, i.e. length (n/2) - k + 1. Equating the two expressions for the tail length:
n - 3k = (n/2) - k + 1
k = (n - 2)/4
So after k iterations we are guaranteed to find the repeated element, using about 3k = 3(n-2)/4 comparisons.
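A rough sketch of that scan in C (my own illustration of the idea above, with made-up names, not code from the linked answer): it walks the array in groups of three, spends at most 3 comparisons per group, and keeps a brute-force pass only as a fallback for the one or two leftover elements and for very small arrays.

/* Group-of-3 scan: at most 3 comparisons per group. Returns the index of
   one occurrence of the repeated value, or -1 if nothing repeats (which
   cannot happen for valid input). */
int find_repeated_index(const int a[], int n)
{
    int i, j;
    for (i = 0; i + 2 < n; i += 3)
    {
        if (a[i] == a[i + 1] || a[i] == a[i + 2]) return i;
        if (a[i + 1] == a[i + 2]) return i + 1;
    }
    /* The analysis above guarantees a match inside some group once n is
       reasonably large; this pass only matters for the 1-2 leftover
       elements and for tiny inputs such as n == 4. */
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (a[i] == a[j]) return i;
    return -1;
}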
This is a poor interview question, mostly because of the first part: what are you looking for? That the candidate should come up with an O(log n) solution you didn't know existed? If you have to ask Stack Overflow, is this something you can reasonably expect a candidate to come up with in an interview?
My answer was:
Runtime - O(N)
First off, it's past my bed time and I should know better than to post code in public without trying it first, yada, yada. I hope the criticism I'll get will at least be educational. :-)
I believe the problem can be restated as: "Find the number that occurs more than once."
In the absolute worst case, we would need to iterate through a little more than half the list (1 + N/2) before we found the 2nd instance of a non-unique number.
Worst-case example: array[] = { 1, 2, 3, 4, 5, 10, 10, 10, 10, 10 }
On average, though, we'd only need to iterate through 3 or 4 elements, since half of the elements are the non-unique number, i.e. roughly every other element.
Perfectly even distribution examples:
array[] = { 1, 10, 2, 10, 3, 10, 4, 10, 5, 10 }
array[] = { 10, 1, 10, 2, 10, 3, 10, 4, 10, 5 }
In other words, even if N = 1 million, you would still only need to search, on average, the first 3 or 4 elements before you discovered a duplicate.
What's the big O notation for a fixed/constant runtime that doesn't increase with N?
Code:
/* Scan for the first element that appears again later in the array. */
int foundAt = -1;
for (int i = 0; i < N && foundAt == -1; i++)
{
    for (int j = i + 1; j < N; j++)
    {
        if (array[i] == array[j])
        {
            foundAt = i;    /* array[i] is the repeated value */
            break;
        }
    }
}
/* foundAt is -1 only if the input had no duplicate at all. */
int repeatedNumber = (foundAt != -1) ? array[foundAt] : -1;
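As a quick sanity check of the "3 or 4 elements on average" claim, here is a small Monte Carlo sketch (my own experiment; the size, trial count and seed are arbitrary). It shuffles such an array repeatedly and averages how far a left-to-right scan gets before it has seen the repeated value twice; it should print a number close to 4.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int N = 1000000;    /* array size, chosen arbitrarily */
    const int TRIALS = 100;
    long long total = 0;
    int *a = malloc(N * sizeof *a);
    int t, i;

    srand(12345);
    for (t = 0; t < TRIALS; t++)
    {
        for (i = 0; i < N / 2; i++) a[i] = 0;   /* N/2 copies of the repeated value */
        for (i = N / 2; i < N; i++) a[i] = i;   /* N/2 unique values */
        for (i = N - 1; i > 0; i--)             /* Fisher-Yates shuffle */
        {
            int k = rand() % (i + 1);
            int tmp = a[i]; a[i] = a[k]; a[k] = tmp;
        }
        int seen = 0;
        for (i = 0; i < N; i++)
            if (a[i] == 0 && ++seen == 2) { total += i + 1; break; }
    }
    printf("average elements scanned: %.2f\n", (double)total / TRIALS);
    free(a);
    return 0;
}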
Here is my attempt at a proof of why this cannot be done in less than O(n) array accesses (for worst case, which surely is the only interesting case in this example):
Assume a worst case log(n) algorithm exists. This algorithm accesses the array at most log(n) times. Since it can make no assumptions about which elements are where, let me choose which log(n) elements it sees. I will choose to give it the first log(n) unique elements. It has not found the duplicate yet, and there still exist n/2 - log(n) unique elements for me to feed it if need be. In fact, I cannot be forced to feed it a duplicated number until it has read n/2 elements. Therefore such an algorithm cannot exist.
From a purely intuitive standpoint, this just seems impossible. Log(4 billion) is 32. So with an array of 4 billion numbers, 2 billion of which are unique, in no particular order, there is a way to find the duplicated element by only checking 32 elements?
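To make the adversary in that argument concrete, here is a sketch in C (my own names and representation). The caller initializes revealed[] to all -1 and *next_unique to 1, and the algorithm under test reads the array only through adversary_probe. The first n/2 distinct indices it probes return fresh unique values; only after those are exhausted does the adversary start showing the repeated value. So any algorithm that reads fewer than n/2 + 2 cells sees the repeated value at most once and cannot tell it apart from the unique ones.

/* Lazy adversary: values are assigned only when an index is first probed.
   revealed[index] == -1 means "not yet assigned". */
int adversary_probe(int revealed[], int n, int *next_unique, int index)
{
    if (revealed[index] != -1)
        return revealed[index];              /* stay consistent with earlier answers */
    if (*next_unique <= n / 2)
        revealed[index] = (*next_unique)++;  /* still have unique values to hand out */
    else
        revealed[index] = 0;                 /* forced to show the repeated value (0) */
    return revealed[index];
}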
Contrary to the answers above, there is a solution with the worst-case behavior as requested: O(log N) RUN TIME. The problem is not to find a solution with O(log N) comparisons in the worst case (which is impossible), but to do it in O(log N) time.
If you can do N comparisons in parallel, the solution is a trivial divide-and-conquer. Not very practical in the real world, but it's an interview question, not a real-world problem.
Update: I think you can do it in constant time with O(N) processors
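To make that update concrete, here is a sequential sketch of one way the constant-time idea could work (my own pairing scheme, not necessarily what was meant). Round 1 compares the disjoint pairs (a[0],a[1]), (a[2],a[3]), ...; those N/2 comparisons are independent, so with about N/2 processors they fit in one parallel step. If none of them match, every pair must contain exactly one copy of the repeated value (N/2 copies spread over N/2 pairs, at most one per pair), so cross-comparing just the first two pairs, 4 more comparisons, is guaranteed to find it.

/* Sequential simulation of the two-round scheme. Assumes n >= 4, n even,
   and exactly one value appearing n/2 times. */
int find_repeated_two_rounds(const int a[], int n)
{
    int i, j;
    /* Round 1: compare disjoint pairs; all independent, one parallel step. */
    for (i = 0; i + 1 < n; i += 2)
        if (a[i] == a[i + 1]) return a[i];
    /* Round 2: no pair matched, so each pair holds exactly one copy of the
       repeated value; cross-compare the first two pairs. */
    for (i = 0; i < 2; i++)
        for (j = 2; j < 4; j++)
            if (a[i] == a[j]) return a[i];
    return a[0];  /* unreachable for valid input */
}

Run sequentially this is still a decent answer: about N/2 + 4 comparisons in the worst case.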