If the two arrays are of similar length (say, A has N elements and B has M elements), then the best approach is a linear search for one array's elements in the other array. Of course, since the arrays are sorted, each search should begin where the previous one stopped. This is the classic principle behind the "sorted array merge" algorithm. The complexity is O(N + M).
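As a rough illustration, here is a minimal sketch of the merge-style scan, assuming the goal is to collect the values that occur in both arrays. The function name intersect_linear and the choice to return a list are mine, not something fixed by the approach.

```python
def intersect_linear(a, b):
    """Return the values present in both sorted lists, using a merge-style scan."""
    result = []
    i = j = 0
    # Each index only ever moves forward, so the loop runs at most N + M times.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a[i] is too small to match anything left in b
        elif a[i] > b[j]:
            j += 1          # b[j] is too small to match anything left in a
        else:
            result.append(a[i])
            i += 1
            j += 1
    return result


print(intersect_linear([1, 3, 4, 7, 9], [2, 3, 7, 8, 9, 10]))
```

Neither index is ever reset, which is exactly the "begin where the previous search stopped" rule.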
If the lengths differ significantly (say, M << N), then a much better approach is to iterate through the elements of the shorter array and use binary search to look for each value in the longer array. The complexity is O(M * log N) in that case.
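A possible sketch of the binary search variant, again assuming the goal is to collect common values. It uses bisect_left from Python's standard bisect module. The names intersect_binary, shorter and longer are illustrative, and restarting each search from the previous position is a small optional refinement rather than a requirement.

```python
from bisect import bisect_left


def intersect_binary(shorter, longer):
    """Return the values present in both sorted lists.

    Each element of `shorter` is looked up in `longer` by binary search.
    """
    result = []
    lo = 0  # nothing before this index in `longer` can match the remaining elements
    for x in shorter:
        lo = bisect_left(longer, x, lo)   # first index >= lo with longer[index] >= x
        if lo == len(longer):
            break                         # every remaining x is larger than all of `longer`
        if longer[lo] == x:
            result.append(x)
            lo += 1                       # consume the matched element
    return result


print(intersect_binary([3, 7, 9], [2, 3, 5, 7, 8, 9, 10, 11, 12]))
```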
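As you can see, O(M * log N) is better than O(N + M) if M is much smaller than N, and worse otherwise. For example, with N = 1,000,000 and M = 10, binary search needs roughly 10 * 20 = 200 comparisons, while the linear scan may need about a million.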
The size difference that should trigger the switch from one approach to the other depends on practical considerations. It should be chosen based on experiments with your data.
These two approaches (linear and binary search) can be "blended" into a single algorithm. Let's assume M <= N, so that B (with M elements) is the shorter array and A (with N elements) is the longer one. In that case, choose the step value S = [N / M] (the integer part of N / M). You take the first element of the shorter array B and perform a strided linear search for that element in the longer array A with step S, meaning that you check elements A[0], A[S], A[2*S], A[3*S], and so on. Once you find the index range [S*i, S*(i+1)] that potentially contains the element you are searching for, you switch to binary search inside that segment of A. Done. The strided linear search for the next element of B begins where the previous search left off. (As a side note, it might make sense to choose a value of S that is a power of 2.)
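The blended scheme could look something like the sketch below. It is only one way to read the description: the function and parameter names are made up for illustration, the two lists are swapped if needed so that the strided search always runs over the longer one, and the step is not rounded to a power of 2.

```python
from bisect import bisect_left


def intersect_blended(shorter, longer):
    """Return the values present in both sorted lists.

    Strided linear search over `longer`, then binary search inside one block.
    """
    if len(shorter) > len(longer):
        shorter, longer = longer, shorter
    result = []
    if not shorter:
        return result
    n = len(longer)
    step = max(1, n // len(shorter))      # S = [N / M], at least 1
    pos = 0                               # the search for each element resumes here
    for x in shorter:
        # Strided linear search: skip whole blocks whose last probe is still below x.
        while pos + step < n and longer[pos + step] < x:
            pos += step
        # If x is present at or after pos, it now lies in longer[pos : pos + step + 1].
        hi = min(pos + step + 1, n)
        k = bisect_left(longer, x, pos, hi)
        if k < n and longer[k] == x:
            result.append(x)
            pos = k + 1                   # consume the matched element
        else:
            pos = k
        if pos >= n:
            break                         # the longer list is exhausted
    return result


print(intersect_blended([5, 40, 41, 99, 200], list(range(100))))
```

Because pos never moves backwards, the strided part takes on the order of N / S = M jumps in total, and each binary search covers a block of about S elements.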
Asymptotically, this "blended" algorithm is the most efficient known search/merge algorithm for two sorted arrays. However, in practice, the simpler approach of choosing either binary or linear search depending on the relative sizes of the arrays works perfectly well.