C# merge sort performance

Submitted by 核能气质少年 on 2019-12-12 08:29:24

Question


Just a quick note: this is not homework, I'm just trying to brush up on my algorithms. I'm playing around with merge sort in C# and I've written a recursive method that sorts arrays of any comparable type using generics:

using System.Linq; // needed for Skip() and ToArray() in Merge

class SortAlgorithms
{

    public T[] MergeSort<T> (T[] unsortedArray) where T : System.IComparable<T>
    {
        T[] left, right;
        int middle = unsortedArray.Length / 2;

        left = new T[middle];
        right = new T[unsortedArray.Length - middle];

        if (unsortedArray.Length <= 1)
            return unsortedArray;

        for (int i = 0; i < middle; i++)
        {
            left[i] = unsortedArray[i];
        }

        for (int i = middle; i < unsortedArray.Length; i++)
        {
            right[i - middle] = unsortedArray[i];
        }

        left = MergeSort(left);

        right = MergeSort(right);


        return Merge<T>(left, right);
    }

    private T[] Merge<T> (T[] left, T[] right) where T : System.IComparable<T>
    {
        T[] result = new T[left.Length + right.Length];

        int currentElement = 0;

        while (left.Length > 0 || right.Length > 0)
        {
            if (left.Length > 0 && right.Length > 0)
            {
                if (left[0].CompareTo(right[0]) < 0)
                {
                    result[currentElement] = left[0];
                    left = left.Skip(1).ToArray();
                    currentElement++;
                }
                else
                {
                    result[currentElement] = right[0];
                    right = right.Skip(1).ToArray();
                    currentElement++;
                }
            }
            else if (left.Length > 0)
            {
                result[currentElement] = left[0];
                left = left.Skip(1).ToArray();
                currentElement++;
            }
            else if (right.Length > 0)
            {
                result[currentElement] = right[0];
                right = right.Skip(1).ToArray();
                currentElement++;
            }
        }

        return result;
    }
}

This works, but it is painfully slow. I've used System.Diagnostics.Stopwatch to compare my MergeSort against Array.Sort (which uses a QuickSort algorithm), and the difference is so significant that I'm wondering whether I'm implementing this wrong. Any comments?


Answer 1:


I am not a C# programmer, but could the problem be the use of statements like this one?

left = left.Skip(1).ToArray();

This might be implemented in a way that copies the remaining elements into a new array on every call. If so, this would degrade the merge step from O(n) to O(n^2), which in turn degrades the resulting merge sort from O(n log n) to O(n^2).

(This is because the recurrence changes from

T(1) = O(1)

T(n) ≤ 2T(n / 2) + O(n)

which has solution T(n) = O(n log n), to

T(1) = O(1)

T(n) ≤ 2T(n / 2) + O(n^2)

which has solution T(n) = O(n^2).)
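For comparison, a merge that stays at O(n) can advance two read indices instead of shrinking the input arrays. This is only a minimal sketch, assuming the same IComparable<T> constraint as the question; the class name MergeSketch is made up for illustration, and the "<=" comparison (which keeps equal elements in their original order) is a choice of this sketch rather than something taken from the question's code.

```csharp
using System;

static class MergeSketch
{
    // Merges two already-sorted arrays in O(left.Length + right.Length) time.
    // No intermediate arrays are created: i and j mark the next unread
    // element of each input, k marks the next free slot in the result.
    public static T[] Merge<T>(T[] left, T[] right) where T : IComparable<T>
    {
        T[] result = new T[left.Length + right.Length];
        int i = 0, j = 0, k = 0;

        // While both inputs have elements, take the smaller head element.
        while (i < left.Length && j < right.Length)
        {
            result[k++] = left[i].CompareTo(right[j]) <= 0 ? left[i++] : right[j++];
        }

        // At most one of these loops does any work: copy the leftovers.
        while (i < left.Length) result[k++] = left[i++];
        while (j < right.Length) result[k++] = right[j++];

        return result;
    }
}
```

Each element is read and written exactly once, so the O(n) term in the first recurrence above holds.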




Answer 2:


You are constantly allocating memory in the form of intermediate arrays. Think in the direction of reusing the original array.
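One way to read that suggestion is to sort index ranges of the original array and share a single scratch buffer across all merges. The following is a sketch under that assumption, not the answerer's own code; InPlaceMergeSort and its parameter names are hypothetical.

```csharp
using System;

static class InPlaceMergeSort
{
    // Sorts the caller's array directly. One scratch buffer is allocated
    // up front and reused by every merge step.
    public static void Sort<T>(T[] items) where T : IComparable<T>
    {
        if (items.Length < 2) return;
        T[] scratch = new T[items.Length];
        Sort(items, scratch, 0, items.Length);
    }

    // Sorts the half-open range [start, end) of items.
    private static void Sort<T>(T[] items, T[] scratch, int start, int end)
        where T : IComparable<T>
    {
        if (end - start < 2) return;

        int middle = start + (end - start) / 2;
        Sort(items, scratch, start, middle);
        Sort(items, scratch, middle, end);

        // Merge the two sorted halves into the same range of scratch...
        int i = start, j = middle, k = start;
        while (i < middle && j < end)
        {
            scratch[k++] = items[i].CompareTo(items[j]) <= 0 ? items[i++] : items[j++];
        }
        while (i < middle) scratch[k++] = items[i++];
        while (j < end) scratch[k++] = items[j++];

        // ...then copy the merged range back into the original array.
        Array.Copy(scratch, start, items, start, end - start);
    }
}
```

Usage would be InPlaceMergeSort.Sort(myArray); the method returns nothing because the input array itself ends up sorted.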




Answer 3:


As the other two answers have said, you're creating new arrays all over the place, spending lots of time and memory on that (I'd guess, most of your time and almost all of your memory use).

On top of that, I'd add that, all else being equal, recursion tends to be slower than iteration and uses more stack space (perhaps even causing a stack overflow with a big enough problem, where iteration would not).

However, merge sort lends itself well to a multi-threaded approach, because you can have different threads handle different parts of the first batch of partitioning.

Hence, if it were I playing with this, my next two experiments would be:

  1. For the first bit of the partitioning, instead of calling MergeSort recursively, I'd launch a new thread until such a time as I had a thread per core running (whether I should do it per physical core or virtual core in the case of hyperthreading, is itself something I'd experiment with).
  2. That done, I'd try re-writing the recursive method to do the same thing without recursive calls.
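For the second experiment, a non-recursive version could look roughly like the bottom-up sketch below. It assumes arrays well under 2^30 elements (so doubling the run width cannot overflow an int), and BottomUpMergeSort is a hypothetical name. It merges runs of width 1, then 2, then 4, and so on, alternating between the original array and one buffer.

```csharp
using System;

static class BottomUpMergeSort
{
    // Iterative merge sort: no recursive calls, one extra buffer.
    public static void Sort<T>(T[] items) where T : IComparable<T>
    {
        int n = items.Length;
        T[] source = items;
        T[] target = new T[n];

        // width is the length of the runs that are already sorted.
        for (int width = 1; width < n; width *= 2)
        {
            // Merge each pair of adjacent runs from source into target.
            for (int start = 0; start < n; start += 2 * width)
            {
                int middle = Math.Min(start + width, n);
                int end = Math.Min(start + 2 * width, n);

                int i = start, j = middle, k = start;
                while (i < middle && j < end)
                {
                    target[k++] = source[i].CompareTo(source[j]) <= 0 ? source[i++] : source[j++];
                }
                while (i < middle) target[k++] = source[i++];
                while (j < end) target[k++] = source[j++];
            }

            // The freshly merged data becomes the input of the next pass.
            T[] swap = source;
            source = target;
            target = swap;
        }

        // After an odd number of passes the sorted data sits in the buffer.
        if (!ReferenceEquals(source, items)) Array.Copy(source, items, n);
    }
}
```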

After the ToArray() matter was dealt with, seeing how a multi-threaded approach that first split the work among an optimal number of cores, and then had each core do its work iteratively, could be quite interesting indeed.
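The first experiment might be sketched as follows. Note the substitutions: instead of launching raw threads, this uses Parallel.Invoke from the Task Parallel Library, which schedules work on the thread pool, and it picks the split depth from Environment.ProcessorCount (logical cores). The SequentialThreshold value is an arbitrary assumption, and the work below the parallel levels is still recursive here rather than iterative.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelMergeSort
{
    // Assumed cut-off: ranges smaller than this are not worth splitting
    // across threads. The value is a guess and would need measuring.
    private const int SequentialThreshold = 4096;

    public static void Sort<T>(T[] items) where T : IComparable<T>
    {
        if (items.Length < 2) return;
        T[] scratch = new T[items.Length];

        // depth levels of parallel splitting give about 2^depth branches,
        // roughly one per logical core.
        int depth = (int)Math.Ceiling(Math.Log(Environment.ProcessorCount, 2));
        Sort(items, scratch, 0, items.Length, depth);
    }

    private static void Sort<T>(T[] items, T[] scratch, int start, int end, int depth)
        where T : IComparable<T>
    {
        if (end - start < 2) return;
        int middle = start + (end - start) / 2;

        if (depth > 0 && end - start >= SequentialThreshold)
        {
            // The two halves touch disjoint ranges of items and scratch,
            // so they can safely run at the same time.
            Parallel.Invoke(
                () => Sort(items, scratch, start, middle, depth - 1),
                () => Sort(items, scratch, middle, end, depth - 1));
        }
        else
        {
            Sort(items, scratch, start, middle, 0);
            Sort(items, scratch, middle, end, 0);
        }

        // Merge the sorted halves through scratch and copy back.
        int i = start, j = middle, k = start;
        while (i < middle && j < end)
        {
            scratch[k++] = items[i].CompareTo(items[j]) <= 0 ? items[i++] : items[j++];
        }
        while (i < middle) scratch[k++] = items[i++];
        while (j < end) scratch[k++] = items[j++];
        Array.Copy(scratch, start, items, start, end - start);
    }
}
```

Whether this beats a single-threaded version depends on the input size and the machine, so it is worth timing both.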




Answer 4:


First off, here's a link to a streamlined solution to a similar question: Java mergesort, should the "merge" step be done with queues or arrays?

Your solution is slow because you're repeatedly allocating new subarrays. Memory allocation is way more expensive than most other operations (you have allocation cost, collection cost, and loss of cache locality). Normally it isn't an issue, but if you're trying to code a tight sorting routine, then it matters. For merge sort, you only require one destination array and one temporary array.

Forking threads to parallelise is still orders of magnitude more expensive than that. So don't fork unless you have a massive amount of data to sort.

As I mention in the answer linked above, one way to speed up your merge sort is to take advantage of existing order in the input array.
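One common way to exploit existing order (not necessarily the technique used in the linked answer) is to skip the merge whenever the two sorted halves are already in order relative to each other. A sketch under that assumption, with the hypothetical name OrderAwareMergeSort:

```csharp
using System;

static class OrderAwareMergeSort
{
    public static void Sort<T>(T[] items) where T : IComparable<T>
    {
        if (items.Length < 2) return;
        Sort(items, new T[items.Length], 0, items.Length);
    }

    // Sorts the half-open range [start, end) of items.
    private static void Sort<T>(T[] items, T[] scratch, int start, int end)
        where T : IComparable<T>
    {
        if (end - start < 2) return;
        int middle = start + (end - start) / 2;
        Sort(items, scratch, start, middle);
        Sort(items, scratch, middle, end);

        // Existing order: if the largest element of the left half is not
        // greater than the smallest of the right half, the whole range is
        // already sorted and no merge (and no copying) is needed.
        if (items[middle - 1].CompareTo(items[middle]) <= 0) return;

        int i = start, j = middle, k = start;
        while (i < middle && j < end)
        {
            scratch[k++] = items[i].CompareTo(items[j]) <= 0 ? items[i++] : items[j++];
        }
        while (i < middle) scratch[k++] = items[i++];
        while (j < end) scratch[k++] = items[j++];
        Array.Copy(scratch, start, items, start, end - start);
    }
}
```

On input that is already sorted, every merge is skipped, so the sort performs only about n - 1 comparisons.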



Source: https://stackoverflow.com/questions/12148712/c-sharp-merge-sort-performance
