how would you sort n sorted lists with average length K in O(n*log K) time?
I believe it is possible to achieve O(N*log(K)), but not in the worst case.
Consider sorting these lists:
{0,1,2,3,4,5,6,7,8,9,10},
{10,11,12,13,14,15,16,17,18,19,20},
{20,21,22,23,24,25,26,27,28,29,30}
My human brain can easily sort those lists without reading every value, so there should be an algorithm that can do the same. We need to merge while using a modified binary search to look for ranges of values.
In the worst case, you get O(N*K), because every value must be compared. Example:
{0,2,4,6,8},
{1,3,5,7,9}
Here is my solution in Go, which I would only use if I knew the sorted lists usually have overlap regions that are small relative to K:
// variation of binary search that finds largest
// value up to and including max
func findNext(a []int, imin int, vmax int) int {
imax := len(a) - 1
best := -1
for imin <= imax {
imid := imin + ((imax - imin) / 2)
if a[imid] == vmax {
return imid
} else if a[imid] < vmax {
best = imid
imin = imid + 1
} else {
imax = imid - 1
}
}
return best
}
func sortNSortedLists(in [][]int) []int {
var out []int
cursors := make([]int, len(in))
for {
// Find the array indices that have the smallest
// and next to smallest value (may be same) at
// their current cursor.
minIdx1 := -1
minIdx2 := -1
minVal1 := math.MaxInt32
minVal2 := math.MaxInt32
for i, cursor := range cursors {
if cursor >= len(in[i]) {
continue
}
if in[i][cursor] < minVal1 {
minIdx2 = minIdx1
minVal2 = minVal1
minIdx1 = i
minVal1 = in[i][cursor]
} else if in[i][cursor] < minVal2 {
minIdx2 = i
minVal2 = in[i][cursor]
}
}
if minIdx1 == -1 {
// no values
break
}
if minIdx2 == -1 {
// only one array has values, so append the
// remainder of it to output
out = append(out, in[minIdx1][cursors[minIdx1]:]...)
break
}
// If inVal1 is smaller than inVal2,
// append to output all values from minVal1 to minVal2 found in
// the minIdx1 array, and update the cursor for the minIdx1 array.
if minVal1 < minVal2 {
firstCursor := cursors[minIdx1]
lastCursor := findNext(in[minIdx1], firstCursor, minVal2)
if lastCursor != -1 {
out = append(out, in[minIdx1][firstCursor:lastCursor+1]...)
cursors[minIdx1] = lastCursor+1
continue
}
}
// Append the single value to output
out = append(out, minVal1)
cursors[minIdx1]++
}
return out
}
As mentioned in the comments to your question, the O(nlog(k)) is not possible, but here are a couple of algorithms on this page that accomplish your task efficiently; here is one:
Take the first element of each list and create a heap (of size k). Pop the smallest element. Find the array from the element came (let's say it came from list number i). Take the next element from list i and push it in heap. For each element that go in the merged list, we spent log(k) time. So the time complexity is O(N*logk) where N is total number of the elements in all the K lists.
-written by: Abhishek Goyal
Merge sort is the key. Lets suppose N is the total number of elements to be united and K the number of containers containing them:
You append all the sorted sequences in a single vector but remembering where you appended them. Better is if you append them sorted by the value of their first element, would speed up next passage.
Then you merge in place pairs of sorted sequences (std::inplace_merge if you use C++). Every merge is Na + Nb so every step is N. You have to perform logK steps.
Hence NlogK.
You can adapt merge sort to do the job. Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list.