I have several sorted sequences of numbers of type long (ascending order) and want to generate one master sequence that contains all elements in the same order. I am looking for the most efficient algorithm to solve this problem. I am targeting C# and .NET 4.0, so ideas that use parallelism are also welcome.
Here is an example:
s1 = 1,2,3,5,7,13
s2 = 2,3,6
s3 = 4,5,6,7,8
resulting Sequence = 1,2,2,3,3,4,5,5,6,6,7,7,8,13
Edit: When there are two (or more) identical values then the order of those two (or more) does not matter.
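It turns out that, after trying all of these algorithms, the simple way is still faster:
    private static List<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedBunches)
    {
        // Concatenate every sequence into one list, then sort the whole list.
        var list = sortedBunches.SelectMany(bunch => bunch).ToList();
        list.Sort(); // uses the default comparer for T
        return list;
    }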
And for legacy purposes...
Here is the final version by prioritizing:
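    private static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedInts) where T : IComparable<T>
    {
        // Keep one enumerator per non-empty sequence, ordered by its current value.
        var enumerators = new List<IEnumerator<T>>(sortedInts.Select(ints => ints.GetEnumerator()).Where(e => e.MoveNext()));
        enumerators.Sort((e1, e2) => e1.Current.CompareTo(e2.Current));
        while (enumerators.Count > 1)
        {
            yield return enumerators[0].Current;
            if (!enumerators[0].MoveNext())
            {
                enumerators.RemoveAt(0); // this sequence is exhausted
                continue;
            }
            // Move the advanced enumerator back to its sorted position.
            for (int i = 0; i + 1 < enumerators.Count && enumerators[i].Current.CompareTo(enumerators[i + 1].Current) > 0; i++)
            {
                var tmp = enumerators[i];
                enumerators[i] = enumerators[i + 1];
                enumerators[i + 1] = tmp;
            }
        }
        if (enumerators.Count == 0) yield break; // every input was empty
        do
        {
            yield return enumerators[0].Current;
        } while (enumerators[0].MoveNext());
    }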
Just merge the sequences. You do not have to sort them again.
There is no .NET Framework method that I know of to do a K-way merge. Typically, it's done with a priority queue (often a heap). It's not difficult to do, and it's quite efficient. Given K sorted lists, together holding N items, the complexity is O(N log K).
I show a simple binary heap class in my article "A Generic Binary Heap Class". In "Sorting a Large Text File", I walk through creating multiple sorted sub-files and using the heap to do the K-way merge. Given an hour (perhaps less) of study, you can probably adapt that for use in your program.
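For illustration, here is a minimal sketch of a heap-based K-way merge. It is not the heap class from the articles mentioned above: the names `KWayMerge`, `Merge`, `SiftUp`, `SiftDown` and `Swap` are made up for this example, the heap is a plain `List` of enumerators ordered by their current value, and enumerators are not disposed, to keep it short.

```csharp
using System;
using System.Collections.Generic;

public static class KWayMerge
{
    // Merges already-sorted (ascending) sequences into one ascending sequence.
    public static IEnumerable<long> Merge(IEnumerable<IEnumerable<long>> sortedSequences)
    {
        // Min-heap with one enumerator per non-empty sequence, keyed by Current.
        var heap = new List<IEnumerator<long>>();
        foreach (var sequence in sortedSequences)
        {
            var e = sequence.GetEnumerator();
            if (e.MoveNext())
            {
                heap.Add(e);
                SiftUp(heap, heap.Count - 1);
            }
        }

        while (heap.Count > 0)
        {
            var smallest = heap[0];
            yield return smallest.Current;

            if (!smallest.MoveNext())
            {
                // Sequence exhausted: move the last entry to the root and shrink.
                heap[0] = heap[heap.Count - 1];
                heap.RemoveAt(heap.Count - 1);
            }
            // The root was advanced or replaced, so restore heap order.
            SiftDown(heap, 0);
        }
    }

    private static void SiftUp(List<IEnumerator<long>> heap, int i)
    {
        while (i > 0)
        {
            int parent = (i - 1) / 2;
            if (heap[parent].Current <= heap[i].Current) break;
            Swap(heap, parent, i);
            i = parent;
        }
    }

    private static void SiftDown(List<IEnumerator<long>> heap, int i)
    {
        while (true)
        {
            int left = 2 * i + 1, right = left + 1, min = i;
            if (left < heap.Count && heap[left].Current < heap[min].Current) min = left;
            if (right < heap.Count && heap[right].Current < heap[min].Current) min = right;
            if (min == i) return;
            Swap(heap, min, i);
            i = min;
        }
    }

    private static void Swap(List<IEnumerator<long>> heap, int a, int b)
    {
        var tmp = heap[a];
        heap[a] = heap[b];
        heap[b] = tmp;
    }
}
```

Each output item costs one sift through a heap of at most K entries, which is where the O(N log K) bound comes from. Because the method is an iterator, the result is produced lazily as the caller enumerates it.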
You just have to merge your sequences like in a merge sort.
And this is parallelizable:
- merge sequences (1 and 2 in 1/2), (3 and 4 in 3/4), …
- merge sequences (1/2 and 3/4 in 1/2/3/4), (5/6 and 7/8 in 5/6/7/8), …
- …
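Here is the merge function (it assumes size_merged_seq = size_seq1 + size_seq2):
    int j = 0; // next unread index in seq1
    int k = 0; // next unread index in seq2
    for (int i = 0; i < size_merged_seq; i++)
    {
        // Take from seq1 if seq2 is exhausted or seq1's head is not larger.
        if (j < size_seq1 && (k >= size_seq2 || seq1[j] <= seq2[k]))
            merged_seq[i] = seq1[j++];
        else
            merged_seq[i] = seq2[k++];
    }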
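As a rough sketch of how the parallel rounds could look in C# on .NET 4.0, the example below merges pairs with `Parallel.For` until one sequence is left. The names `ParallelPairwiseMerge`, `MergeAll` and `MergeTwo` are invented for this example, and it assumes the inputs are available as arrays; whether it beats a sequential merge depends on the sizes involved, so measure before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelPairwiseMerge
{
    // Standard two-way merge of two ascending arrays.
    private static long[] MergeTwo(long[] a, long[] b)
    {
        var merged = new long[a.Length + b.Length];
        int j = 0, k = 0;
        for (int i = 0; i < merged.Length; i++)
        {
            if (j < a.Length && (k >= b.Length || a[j] <= b[k]))
                merged[i] = a[j++];
            else
                merged[i] = b[k++];
        }
        return merged;
    }

    // Merges pairs in parallel, round after round, until one array remains.
    public static long[] MergeAll(IList<long[]> sequences)
    {
        if (sequences.Count == 0) return new long[0];

        IList<long[]> current = sequences;
        while (current.Count > 1)
        {
            IList<long[]> round = current;          // inputs of this round
            int pairs = round.Count / 2;
            var next = new long[(round.Count + 1) / 2][];

            // Each pair is independent and writes to its own slot in next.
            Parallel.For(0, pairs, p =>
            {
                next[p] = MergeTwo(round[2 * p], round[2 * p + 1]);
            });

            // With an odd count, the last sequence moves on unmerged.
            if (round.Count % 2 == 1) next[pairs] = round[round.Count - 1];
            current = next;
        }
        return current[0];
    }
}
```

Note that the last round is a single merge of two large sequences and runs on one thread, so the speed-up is limited by that final step.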
The easy way is to merge them with each other one by one. However, this requires O(n*k^2) time, where k is the number of sequences and n is the average number of items per sequence. Using a divide and conquer approach, you can lower this to O(n*k*log k). The algorithm is as follows:
- Divide the k sequences into k/2 groups of 2 sequences each (plus 1 group of 1 sequence if k is odd).
- Merge the sequences in each group. This leaves you with about k/2 new sequences.
- Repeat until you get a single sequence.
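To see where the two bounds come from, here is a rough count of the elements copied, assuming every sequence has about n items. Merging one by one, the i-th merge produces a sequence of i*n items, while each divide and conquer round copies at most all n*k items and there are about log2(k) rounds:

```latex
\begin{align*}
T_{\text{one by one}} &= 2n + 3n + \dots + kn
  = n\left(\frac{k(k+1)}{2} - 1\right) = O(n k^2) \\
T_{\text{divide and conquer}} &\le nk \cdot \lceil \log_2 k \rceil = O(n k \log k)
\end{align*}
```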