GallopSearch Merge: O(log(n)*log(i)) rather than O(n)
I went ahead and implemented greybeard suggestion in the comments. Mostly because I needed a highly efficient mission critical version of this code.
- The code uses a gallopSearch which is O(log(i)) where i is the
distance from the current index the relevant index exists.
- The code uses a binarySearch for after the gallop search has
identified the proper,range. Since gallop limited this to a smaller
range the resulting binarySearch is also O(log(i))
- The gallop and merge are performed backwards. This doesn't seem
mission critical but it allows in place merging of arrays. If one of
your arrays has enough room to store the results values, you can
simply use it as the merging array and the results array. You must specify the valid range within the array in such a case.
- It does not require memory allocation in that case (big savings in critical operations). It simply makes sure it doesn't and cannot overwrite any unprocessed values (which can only be done backwards). In fact, you use the same array for both of the inputs and the results. It will suffer no ill effects.
- I consistently used Integer.compare() so this could be switched out for other purposes.
- There's some chance I might have goofed a little and not utilized information I have previously proven. Such as binary searching into a range of two values, for which one value was already checked. There might also be a better way to state the main loop, the flipping c value wouldn't be needed if they were combined into two operations in sequence. Since you know you will do one then the other everytime. There's room for for some polish.
This should be the most efficient way to do this, with time complexity of O(log(n)*log(i)) rather than O(n). And worst case time complexity of O(n). If your arrays are clumpy and have long strings of values together, this will dwarf any other way to do it, otherwise it'll just be better than them.
It has two read values at the ends of the merging array and the write value within the results array. After finding out which is end value is less, it does a gallop search into that array. 1, 2, 4, 8, 16, 32, etc. When it finds the range where the the other array's read value is bigger. It binary searches into that range (cuts the range in half, search the correct half, repeat until single value). Then it array copies those values into the write position. Keeping in mind that the copy is, by necessity, moved such that it cannot overwrite the same values from the either reading array (which means the write array and read array can be the same). It then performs the same operation for the other array which is now known to be less than the new read value of the other array.
static public int gallopSearch(int current, int[] array, int v) {
int d = 1;
int seek = current - d;
int prevIteration = seek;
while (seek > 0) {
if (Integer.compare(array[seek], v) <= 0) {
break;
}
prevIteration = seek;
d <<= 1;
seek = current - d;
if (seek < 0) {
seek = 0;
}
}
if (prevIteration != seek) {
seek = binarySearch(array, seek, prevIteration, v);
seek = seek >= 0 ? seek : ~seek;
}
return seek;
}
static public int binarySearch(int[] list, int fromIndex, int toIndex, int v) {
int low = fromIndex;
int high = toIndex - 1;
while (low <= high) {
int mid = (low + high) >>> 1;
int midVal = list[mid];
int cmp = Integer.compare(midVal, v);
if (cmp < 0) {
low = mid + 1;
} else if (cmp > 0) {
high = mid - 1;
} else {
return mid;// key found
}
}
return -(low + 1);// key not found.
}
static public int[] sortedArrayMerge(int[] a, int[] b) {
return sortedArrayMerge(null, a, a.length, b, b.length);
}
static public int[] sortedArrayMerge(int[] results, int[] a, int aRead, int b[], int bRead) {
int write = aRead + bRead, length, gallopPos;
if ((results == null) || (results.length < write)) {
results = new int[write];
}
if (aRead > 0 && bRead > 0) {
int c = Integer.compare(a[aRead - 1], b[bRead - 1]);
while (aRead > 0 && bRead > 0) {
switch (c) {
default:
gallopPos = gallopSearch(aRead, a, b[bRead-1]);
length = (aRead - gallopPos);
write -= length;
aRead = gallopPos;
System.arraycopy(a, gallopPos--, results, write, length);
c = -1;
break;
case -1:
gallopPos = gallopSearch(bRead, b, a[aRead-1]);
length = (bRead - gallopPos);
write -= length;
bRead = gallopPos;
System.arraycopy(b, gallopPos--, results, write, length);
c = 1;
break;
}
}
}
if (bRead > 0) {
if (b != results) {
System.arraycopy(b, 0, results, 0, bRead);
}
} else if (aRead > 0) {
if (a != results) {
System.arraycopy(a, 0, results, 0, aRead);
}
}
return results;
}
This should be the most efficient way to do it.
Some answers had a duplicate remove ability. That'll require an O(n) algorithm because you must actually compare each item. So here's a stand-alone for that, to be applied after the fact. You can't gallop through multiple entries all the way through if you need to look at all of them, though you could gallop through the duplicates, if you had a lot of them.
static public int removeDuplicates(int[] list, int size) {
int write = 1;
for (int read = 1; read < size; read++) {
if (list[read] == list[read - 1]) {
continue;
}
list[write++] = list[read];
}
return write;
}
Update: Previous answer, not horrible code but clearly inferior to the above.
Another needless hyper-optimization. It not only invokes arraycopy for the end bits, but also for the beginning. Processing any introductory non-overlap in O(log(n)) by a binarySearch into the data. O(log(n) + n) is O(n) and in some cases the effect will be pretty pronounced especially things like where there is no overlap between the merging arrays at all.
private static int binarySearch(int[] array, int low, int high, int v) {
high = high - 1;
while (low <= high) {
int mid = (low + high) >>> 1;
int midVal = array[mid];
if (midVal > v)
low = mid + 1;
else if (midVal < v)
high = mid - 1;
else
return mid; // key found
}
return low;//traditionally, -(low + 1); // key not found.
}
private static int[] sortedArrayMerge(int a[], int b[]) {
int result[] = new int[a.length + b.length];
int k, i = 0, j = 0;
if (a[0] > b[0]) {
k = i = binarySearch(b, 0, b.length, a[0]);
System.arraycopy(b, 0, result, 0, i);
} else {
k = j = binarySearch(a, 0, a.length, b[0]);
System.arraycopy(a, 0, result, 0, j);
}
while (i < a.length && j < b.length) {
result[k++] = (a[i] < b[j]) ? a[i++] : b[j++];
}
if (j < b.length) {
System.arraycopy(b, j, result, k, (b.length - j));
} else {
System.arraycopy(a, i, result, k, (a.length - i));
}
return result;
}