I was recently given this interview question and I\'m curious what a good solution to it would be.
Say I\'m given a 2d array where all the numbers i
Interesting question. Consider this idea - create one boundary where all the numbers are greater than your target and another where all the numbers are less than your target. If anything is left in between the two, that's your target.
If I'm looking for 3 in your example, I read across the first row until I hit 4, then look for the smallest adjacent number (including diagonals) greater than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I do the same for those numbers less than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I ask, is anything inside the two boundaries? If yes, it must be 3. If no, then there is no 3. Sort of indirect since I don't actually find the number, I just deduce that it must be there. This has the added bonus of counting ALL the 3's.
I tried this on some examples and it seems to work OK.
The optimal solution is to start at the top-left corner, that has minimal value. Move diagonally downwards to the right until you hit an element whose value >= value of the given element. If the element's value is equal to that of the given element, return found as true.
Otherwise, from here we can proceed in two ways.
Strategy 1:
Strategy 2: Let i denote the row index and j denote the column index of the diagonal element we have stopped at. (Here, we have i = j, BTW). Let k = 1.
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
public boolean searchSortedMatrix(int arr[][] , int key , int minX , int maxX , int minY , int maxY){
// base case for recursion
if(minX > maxX || minY > maxY)
return false ;
// early fails
// array not properly intialized
if(arr==null || arr.length==0)
return false ;
// arr[0][0]> key return false
if(arr[minX][minY]>key)
return false ;
// arr[maxX][maxY]<key return false
if(arr[maxX][maxY]<key)
return false ;
//int temp1 = minX ;
//int temp2 = minY ;
int midX = (minX+maxX)/2 ;
//if(temp1==midX){midX+=1 ;}
int midY = (minY+maxY)/2 ;
//if(temp2==midY){midY+=1 ;}
// arr[midX][midY] = key ? then value found
if(arr[midX][midY] == key)
return true ;
// alas ! i have to keep looking
// arr[midX][midY] < key ? search right quad and bottom matrix ;
if(arr[midX][midY] < key){
if( searchSortedMatrix(arr ,key , minX,maxX , midY+1 , maxY))
return true ;
// search bottom half of matrix
if( searchSortedMatrix(arr ,key , midX+1,maxX , minY , maxY))
return true ;
}
// arr[midX][midY] > key ? search left quad matrix ;
else {
return(searchSortedMatrix(arr , key , minX,midX-1,minY,midY-1));
}
return false ;
}
I would use the divide-and-conquer strategy for this problem, similar to what you suggested, but the details are a bit different.
This will be a recursive search on subranges of the matrix.
At each step, pick an element in the middle of the range. If the value found is what you are seeking, then you're done.
Otherwise, if the value found is less than the value that you are seeking, then you know that it is not in the quadrant above and to the left of your current position. So recursively search the two subranges: everything (exclusively) below the current position, and everything (exclusively) to the right that is at or above the current position.
Otherwise, (the value found is greater than the value that you are seeking) you know that it is not in the quadrant below and to the right of your current position. So recursively search the two subranges: everything (exclusively) to the left of the current position, and everything (exclusively) above the current position that is on the current column or a column to the right.
And ba-da-bing, you found it.
Note that each recursive call only deals with the current subrange only, not (for example) ALL rows above the current position. Just those in the current subrange.
Here's some pseudocode for you:
bool numberSearch(int[][] arr, int value, int minX, int maxX, int minY, int maxY)
if (minX == maxX and minY == maxY and arr[minX,minY] != value)
return false
if (arr[minX,minY] > value) return false; // Early exits if the value can't be in
if (arr[maxX,maxY] < value) return false; // this subrange at all.
int nextX = (minX + maxX) / 2
int nextY = (minY + maxY) / 2
if (arr[nextX,nextY] == value)
{
print nextX,nextY
return true
}
else if (arr[nextX,nextY] < value)
{
if (numberSearch(arr, value, minX, maxX, nextY + 1, maxY))
return true
return numberSearch(arr, value, nextX + 1, maxX, minY, nextY)
}
else
{
if (numberSearch(arr, value, minX, nextX - 1, minY, maxY))
return true
reutrn numberSearch(arr, value, nextX, maxX, minY, nextY)
}
This is a short proof of the lower bound on the problem.
You cannot do it better than linear time (in terms of array dimensions, not the number of elements). In the array below, each of the elements marked as *
can be either 5 or 6 (independently of other ones). So if your target value is 6 (or 5) the algorithm needs to examine all of them.
1 2 3 4 *
2 3 4 * 7
3 4 * 7 8
4 * 7 8 9
* 7 8 9 10
Of course this expands to bigger arrays as well. This means that this answer is optimal.
Update: As pointed out by Jeffrey L Whitledge, it is only optimal as the asymptotic lower bound on running time vs input data size (treated as a single variable). Running time treated as two-variable function on both array dimensions can be improved.
This problem takes Θ(b lg(t))
time, where b = min(w,h)
and t=b/max(w,h)
. I discuss the solution in this blog post.
Lower bound
An adversary can force an algorithm to make Ω(b lg(t))
queries, by restricting itself to the main diagonal:
Legend: white cells are smaller items, gray cells are larger items, yellow cells are smaller-or-equal items and orange cells are larger-or-equal items. The adversary forces the solution to be whichever yellow or orange cell the algorithm queries last.
Notice that there are b
independent sorted lists of size t
, requiring Ω(b lg(t))
queries to completely eliminate.
Algorithm
w >= h
)t
to the left of the top right corner of the valid area
t
cells in the row with a binary search. If a matching item is found while doing this, return with its position.t
short columns.Finding an item:
Determining an item doesn't exist:
Legend: white cells are smaller items, gray cells are larger items, and the green cell is an equal item.
Analysis
There are b*t
short columns to eliminate. There are b
long rows to eliminate. Eliminating a long row costs O(lg(t))
time. Eliminating t
short columns costs O(1)
time.
In the worst case we'll have to eliminate every column and every row, taking time O(lg(t)*b + b*t*1/t) = O(b lg(t))
.
Note that I'm assuming lg
clamps to a result above 1 (i.e. lg(x) = log_2(max(2,x))
). That's why when w=h
, meaning t=1
, we get the expected bound of O(b lg(1)) = O(b) = O(w+h)
.
Code
public static Tuple<int, int> TryFindItemInSortedMatrix<T>(this IReadOnlyList<IReadOnlyList<T>> grid, T item, IComparer<T> comparer = null) {
if (grid == null) throw new ArgumentNullException("grid");
comparer = comparer ?? Comparer<T>.Default;
// check size
var width = grid.Count;
if (width == 0) return null;
var height = grid[0].Count;
if (height < width) {
var result = grid.LazyTranspose().TryFindItemInSortedMatrix(item, comparer);
if (result == null) return null;
return Tuple.Create(result.Item2, result.Item1);
}
// search
var minCol = 0;
var maxRow = height - 1;
var t = height / width;
while (minCol < width && maxRow >= 0) {
// query the item in the minimum column, t above the maximum row
var luckyRow = Math.Max(maxRow - t, 0);
var cmpItemVsLucky = comparer.Compare(item, grid[minCol][luckyRow]);
if (cmpItemVsLucky == 0) return Tuple.Create(minCol, luckyRow);
// did we eliminate t rows from the bottom?
if (cmpItemVsLucky < 0) {
maxRow = luckyRow - 1;
continue;
}
// we eliminated most of the current minimum column
// spend lg(t) time eliminating rest of column
var minRowInCol = luckyRow + 1;
var maxRowInCol = maxRow;
while (minRowInCol <= maxRowInCol) {
var mid = minRowInCol + (maxRowInCol - minRowInCol + 1) / 2;
var cmpItemVsMid = comparer.Compare(item, grid[minCol][mid]);
if (cmpItemVsMid == 0) return Tuple.Create(minCol, mid);
if (cmpItemVsMid > 0) {
minRowInCol = mid + 1;
} else {
maxRowInCol = mid - 1;
maxRow = mid - 1;
}
}
minCol += 1;
}
return null;
}