Why does Haskell use mergesort instead of quicksort?

前端 未结 6 771
执念已碎
执念已碎 2021-01-31 01:38

In Wikibooks\' Haskell, there is the following claim:

Data.List offers a sort function for sorting lists. It does not use quicksort; rather, it uses an ef

6条回答
  •  一生所求
    2021-01-31 01:54

    I think @comingstorm's answer is pretty much on the nose, but here's some more info on the history of GHC's sort function.

    In the source code for Data.OldList, you can find the implementation of sort and verify for yourself that it's a merge sort. Just below the definition in that file is the following comment:

    Quicksort replaced by mergesort, 14/5/2002.
    
    From: Ian Lynagh 
    
    I am curious as to why the List.sort implementation in GHC is a
    quicksort algorithm rather than an algorithm that guarantees n log n
    time in the worst case? I have attached a mergesort implementation along
    with a few scripts to time it's performance...
    

    So, originally a functional quicksort was used (and the function qsort is still there, but commented out). Ian's benchmarks showed that his mergesort was competitive with quicksort in the "random list" case and massively outperformed it in the case of already sorted data. Later, Ian's version was replaced by another implementation that was about twice as fast, according to additional comments in that file.

    The main issue with the original qsort was that it didn't use a random pivot. Instead it pivoted on the first value in the list. This is obviously pretty bad because it implies performance will be worst case (or close) for sorted (or nearly sorted) input. Unfortunately, there are a couple of challenges in switching from "pivot on first" to an alternative (either random, or -- as in your implementation -- somewhere in "the middle"). In a functional language without side effects, managing a pseudorandom input is a bit of a problem, but let's say you solve that (maybe by building a random number generator into your sort function). You still have the problem that, when sorting an immutable linked list, locating an arbitrary pivot and then partitioning based on it will involve multiple list traversals and sublist copies.

    I think the only way to realize the supposed benefits of quicksort would be to write the list out to a vector, sort it in place (and sacrifice sort stability), and write it back out to a list. I don't see that that could ever be an overall win. On the other hand, if you already have data in a vector, then an in-place quicksort would definitely be a reasonable option.

提交回复
热议问题