Most efficient sorting algorithm for sorted sub-sequences

Submitted by 你离开我真会死。 on 2019-12-11 04:57:33

Question


I have several sorted sequences of numbers of type long (ascending order) and want to generate one master sequence that contains all elements in the same order. I am looking for the most efficient sorting algorithm to solve this problem. I am targeting C# and .NET 4.0, so ideas that exploit parallelism are also welcome.

Here is an example:
s1 = 1,2,3,5,7,13
s2 = 2,3,6
s3 = 4,5,6,7,8
resulting Sequence = 1,2,2,3,3,4,5,5,6,6,7,7,8,13

Edit: When two or more values are identical, their relative order does not matter.


Answer 1:


UPDATE:

It turns out that, compared with all the algorithms, the simple way is still faster:

private static List<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedBunches)
{
    var list = sortedBunches.SelectMany(bunch => bunch).ToList();

    list.Sort();

    return list;
}

For reference, here is the final version of the prioritizing approach, which always yields from the enumerator with the smallest current element:

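    private static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedInts) where T : IComparable<T>
    {
        // Keep only the non-empty sequences, each positioned on its first element.
        var enumerators = new List<IEnumerator<T>>(sortedInts.Select(ints => ints.GetEnumerator()).Where(e => e.MoveNext()));

        // Order the enumerators by their current (smallest remaining) element.
        enumerators.Sort((e1, e2) => e1.Current.CompareTo(e2.Current));

        // Looping until the list is empty also covers the case of no input sequences.
        while (enumerators.Count > 0)
        {
            var head = enumerators[0];
            yield return head.Current;

            if (head.MoveNext())
            {
                // Slide the advanced enumerator back until the list is ordered again.
                // Swapping with the second entry only is not enough for three or more sequences.
                // CompareTo only guarantees the sign of its result, so test > 0 rather than == 1.
                int i = 1;
                while (i < enumerators.Count && head.Current.CompareTo(enumerators[i].Current) > 0)
                {
                    enumerators[i - 1] = enumerators[i];
                    i++;
                }
                enumerators[i - 1] = head;
            }
            else
            {
                // This sequence is exhausted.
                enumerators.RemoveAt(0);
            }
        }
    }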



Answer 2:


Just merge the sequences. You do not have to sort them again.




Answer 3:


There is no .NET Framework method that I know of to do a K-way merge. Typically, it's done with a priority queue (often a heap). It's not difficult to do, and it's quite efficient. Given K sorted lists, together holding N items, the complexity is O(N log K).

I show a simple binary heap class in my article "A Generic Binary Heap Class". In "Sorting a Large Text File", I walk through creating multiple sorted sub-files and using the heap to do the K-way merge. With an hour (perhaps less) of study, you can probably adapt that for use in your program.
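
The linked heap class is not reproduced here, so the following is only a minimal sketch of the K-way merge idea under a stated substitution: it uses SortedSet<T> (a balanced tree available in .NET 4.0) as a stand-in for the binary heap, which keeps the O(N log K) bound. The names KWayMerge, Merge and EntryComparer are hypothetical and do not come from the answer or the articles.

```csharp
using System;
using System.Collections.Generic;

public static class KWayMerge
{
    // Orders entries by value first, then by source index, so that equal
    // values coming from different sequences can coexist in the set.
    private sealed class EntryComparer : IComparer<Tuple<long, int>>
    {
        public int Compare(Tuple<long, int> a, Tuple<long, int> b)
        {
            int byValue = a.Item1.CompareTo(b.Item1);
            return byValue != 0 ? byValue : a.Item2.CompareTo(b.Item2);
        }
    }

    // Hypothetical helper: lazily merges K ascending sequences into one.
    // Enumerators are not disposed here to keep the sketch short.
    public static IEnumerable<long> Merge(IEnumerable<IEnumerable<long>> sortedSequences)
    {
        var enumerators = new List<IEnumerator<long>>();
        // Each entry is (current value, index of the enumerator it came from).
        var queue = new SortedSet<Tuple<long, int>>(new EntryComparer());

        foreach (var sequence in sortedSequences)
        {
            var e = sequence.GetEnumerator();
            if (e.MoveNext())
            {
                queue.Add(Tuple.Create(e.Current, enumerators.Count));
                enumerators.Add(e);
            }
        }

        while (queue.Count > 0)
        {
            // Take the smallest pending value, then refill from the same source.
            var smallest = queue.Min;
            queue.Remove(smallest);
            yield return smallest.Item1;

            var source = enumerators[smallest.Item2];
            if (source.MoveNext())
            {
                queue.Add(Tuple.Create(source.Current, smallest.Item2));
            }
        }
    }
}
```

The set never holds more than K entries, one per sequence that still has items, so each of the N output elements costs one removal and at most one insertion of O(log K) each.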




Answer 4:


You just have to merge your sequences like in a merge sort.

And this is parallelizable:

  1. merge sequences (1 and 2 in 1/2), (3 and 4 in 3/4), …
  2. merge sequences (1/2 and 3/4 in 1/2/3/4), (5/6 and 7/8 in 5/6/7/8), …

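Here is the merge function:

int j = 0; // next unread index in seq1
int k = 0; // next unread index in seq2
for(int i = 0; i < size_merged_seq; i++)
{
  // Take from seq1 while it still has items and either seq2 is exhausted
  // or seq1's next item is the smaller one. Without the k >= size_seq2
  // check, seq2[k] is read out of bounds once seq2 runs out.
  if (j < size_seq1 && (k >= size_seq2 || seq1[j] < seq2[k]))
  {
    merged_seq[i] = seq1[j];
    j++;
  }
  else
  {
    merged_seq[i] = seq2[k];
    k++;
  }
}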
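
The rounds described above could be run in parallel with Parallel.For from the Task Parallel Library that ships with .NET 4.0. This is a sketch under assumptions, not the answerer's code: the class PairwiseMerge and the methods MergeTwo and MergeAll are hypothetical names, and the inputs are assumed to be arrays of long already sorted in ascending order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PairwiseMerge
{
    // Two-way merge of two ascending arrays into a new array.
    public static long[] MergeTwo(long[] a, long[] b)
    {
        var merged = new long[a.Length + b.Length];
        int j = 0, k = 0;
        for (int i = 0; i < merged.Length; i++)
        {
            // Take from a while it has items and b is exhausted or a's head is not larger.
            if (j < a.Length && (k >= b.Length || a[j] <= b[k]))
                merged[i] = a[j++];
            else
                merged[i] = b[k++];
        }
        return merged;
    }

    // Merges neighbouring pairs round after round until one sequence remains.
    // The pairs within a round are independent, so they run in parallel.
    public static long[] MergeAll(IList<long[]> sequences)
    {
        if (sequences.Count == 0) return new long[0];

        var current = new List<long[]>(sequences);
        while (current.Count > 1)
        {
            var round = current; // stable reference for the lambda below
            var next = new long[(round.Count + 1) / 2][];
            Parallel.For(0, next.Length, p =>
            {
                int left = 2 * p, right = left + 1;
                // With an odd count, the last sequence is carried over unmerged.
                next[p] = right < round.Count ? MergeTwo(round[left], round[right]) : round[left];
            });
            current = new List<long[]>(next);
        }
        return current[0]; // for a single input, this is the input array itself
    }
}
```

Each task writes to its own slot of next, so no locking is needed. Whether this beats a sequential merge depends on the sizes involved, because the later rounds have fewer and larger merges to spread across cores.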



Answer 5:


The easy way is to merge them with each other one by one. However, this requires O(n*k^2) time, where k is the number of sequences and n is the average number of items per sequence. Using a divide-and-conquer approach, you can lower this to O(n*k*log k). The algorithm is as follows:

  1. Divide the k sequences into k/2 groups of 2 sequences each (plus one group of 1 sequence if k is odd).
  2. Merge the sequences in each group. This gives you about k/2 new sequences.
  3. Repeat until you get a single sequence.
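
To see where the two bounds come from, here is a rough count under simplifying assumptions that are not in the answer: every sequence has exactly n items, and merging two sequences of lengths a and b costs about a + b steps. When merging one by one, adding the i-th sequence to the (i-1)n items merged so far costs about i*n steps. When merging pairwise, every round touches all n*k items once, and the number of sequences halves each round.

```latex
\begin{align*}
T_{\text{one-by-one}} &\approx \sum_{i=2}^{k} i\,n
  = n\left(\frac{k(k+1)}{2} - 1\right) = O(n k^2), \\
T_{\text{pairwise}} &\approx n k \cdot \lceil \log_2 k \rceil = O(n k \log k).
\end{align*}
```

For example, with k = 64 sequences of n = 1000 items, the first count gives about 2,079,000 steps and the second about 384,000.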


Source: https://stackoverflow.com/questions/10450138/most-efficient-sorting-algorithm-for-sorted-sub-sequences
