I was recently given this interview question and I'm curious what a good solution to it would be.
Say I'm given a 2d array where all the numbers in the array are in increasing order from left to right and top to bottom. What is the best way to search for a target number in the array?
Here's a simple approach:

1. Start at the bottom-left corner.
2. If the target is less than the value there, it must be above us, so move up one.
3. Otherwise we know the target can't be in that column, so move right one.
4. Go to step 2.

For an NxM array, this runs in O(N+M). I think it would be difficult to do better. :)
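For reference, here is a minimal Python sketch of that approach (the function name is mine, and it assumes rows and columns both increase left-to-right and top-to-bottom):

```python
def saddleback_search(matrix, value):
    """Search a 2D array whose rows and columns are both sorted in increasing order."""
    if not matrix or not matrix[0]:
        return False
    row, col = len(matrix) - 1, 0              # start at the bottom-left corner
    while row >= 0 and col < len(matrix[0]):
        current = matrix[row][col]
        if current == value:
            return True
        elif current > value:
            row -= 1                           # everything to our right in this row is even larger
        else:
            col += 1                           # everything above us in this column is even smaller
    return False
```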
Edit: Lots of good discussion. I was talking about the general case above; clearly, if N or M is small, you could use a binary search approach to do this in something approaching logarithmic time.
Here are some details, for those who are curious:
This simple algorithm is called a Saddleback Search. It's been around for a while, and it is optimal when N == M.
However, when N < M, intuition suggests that binary search should be able to do better than O(N+M): for example, when N == 1, a pure binary search will run in logarithmic rather than linear time.
Richard Bird examined this intuition that binary search could improve the Saddleback algorithm in a 2006 paper. Using a rather unusual conversational technique, Bird shows us that for N <= M, this problem has a lower bound of Ω(N * log(M/N)). This bound makes sense, as it gives us linear performance when N == M and logarithmic performance when N == 1.
One approach that uses a row-by-row binary search looks like this:
1. Start with a rectangular array where N < M. Let's say N is rows and M is columns.
2. Do a binary search on the middle row for value. If we find it, we're done.
3. Otherwise, we've found an adjacent pair of numbers, s and g, where s < value < g.
4. The rectangle of numbers above and to the left of s is less than value, so we can eliminate it.
5. The rectangle below and to the right of g is greater than value, so we can eliminate it.
6. Go to step 2 for each of the two remaining rectangles.

In terms of worst-case complexity, this algorithm does log(M) work to eliminate half the possible solutions, and then recursively calls itself twice on two smaller problems. We do have to repeat a smaller version of that log(M) work for every row, but if the number of rows is small compared to the number of columns, then being able to eliminate all of those columns in logarithmic time starts to become worthwhile.
This gives the algorithm a complexity of T(N, M) = log(M) + 2 * T(N/2, M/2), which Bird shows to be O(N * log(M/N)).
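Here is a rough Python sketch of that recursive approach, to make the quadrant elimination concrete (this is my own illustration of the idea, not Bird's code; bisect_left does the per-row binary search):

```python
import bisect

def recursive_search(matrix, value):
    """Row-by-row binary search with quadrant elimination (rows and columns sorted increasing)."""
    if not matrix or not matrix[0]:
        return False

    def search(top, bottom, left, right):
        # An empty rectangle cannot contain the value.
        if top > bottom or left > right:
            return False
        mid = (top + bottom) // 2
        row = matrix[mid]
        # Binary search the middle row: first column in [left, right] holding an element >= value.
        col = bisect.bisect_left(row, value, left, right + 1)
        if col <= right and row[col] == value:
            return True
        # Everything above and to the left of the smaller neighbour s is < value, and
        # everything below and to the right of the larger neighbour g is > value,
        # so only the upper-right and lower-left rectangles remain.
        return (search(top, mid - 1, col, right) or
                search(mid + 1, bottom, left, col - 1))

    return search(0, len(matrix) - 1, 0, len(matrix[0]) - 1)
```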
Another approach posted by Craig Gidney describes an algorithm similar to the approach above: it examines a row at a time using a step size of M/N. His analysis shows that this results in O(N * log(M/N)) performance as well.
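A sketch in the spirit of that row-at-a-time idea might look like the following (my own illustration, not Gidney's code): walk from the bottom-left corner as in the saddleback search, but move right in jumps of roughly M/N columns and finish each jump with a binary search over the skipped window.

```python
import bisect

def stepped_search(matrix, value):
    """Saddleback-style search that advances across each row in steps of about M/N columns."""
    if not matrix or not matrix[0]:
        return False
    n, m = len(matrix), len(matrix[0])
    step = max(1, m // n)
    r, c = n - 1, 0                            # start at the bottom-left corner
    while r >= 0 and c < m:
        row = matrix[r]
        if row[c] == value:
            return True
        if row[c] > value:
            r -= 1                             # the rest of this row is even larger
            continue
        # row[c] < value: jump right in steps, then binary search the final window.
        hi = c + step
        while hi < m and row[hi] < value:
            c = hi                             # every skipped column's largest candidate is < value
            hi = c + step
        c = bisect.bisect_left(row, value, c + 1, min(hi + 1, m))
    return False
```

Each row triggers at most one binary search over a window of about M/N columns, and the column pointer only ever moves right, which is roughly where the O(N * log(M/N)) bound comes from.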
Big-O analysis is all well and good, but how well do these approaches work in practice? The chart below examines four algorithms for increasingly "square" arrays:
(The "naive" algorithm simply searches every element of the array. The "recursive" algorithm is described above. The "hybrid" algorithm is an implementation of Gidney's algorithm. For each array size, performance was measured by timing each algorithm over fixed set of 1,000,000 randomly-generated arrays.)
Clever use of binary search can provide O(N * log(M/N)) performance for both rectangular and square arrays. The O(N + M) "saddleback" algorithm is much simpler, but suffers from performance degradation as arrays become increasingly rectangular.