Why is F#'s Seq.sortBy much slower than LINQ's IEnumerable.OrderBy extension method?

清酒与你 2021-02-10 12:34

I've recently written a piece of code to read some data from a file, store it in a tuple and sort all the collected data by the first element of the tuple. After some tests I'…

2 Answers
  •  有刺的猬
    2021-02-10 13:19

    The F# implementation uses a structural comparison of the resulting key.

    let sortBy keyf seq =
        let comparer = ComparisonIdentity.Structural
        mkDelayedSeq (fun () -> 
            (seq 
            |> to_list 
            |> List.sortWith (fun x y -> comparer.Compare(keyf x,keyf y)) 
            |> to_array) :> seq<_>)
    

    (and similarly for sort:)

    let sort seq =
        mkDelayedSeq (fun () -> 
            (seq 
            |> to_list 
            |> List.sortWith Operators.compare 
            |> to_array) :> seq<_>)
    

    Both Operators.compare and ComparisonIdentity.Structural.Compare eventually become:

    let inline GenericComparisonFast<'T> (x:'T) (y:'T) : int = 
        GenericComparisonIntrinsic x y
            // lots of other types elided
            when 'T : float = if (# "clt" x y : bool #) 
                              then (-1) 
                              else (# "cgt" x y : int #)
    

    But the route to this for the operator is entirely inline, so the JIT compiler will end up emitting a direct double-comparison instruction, with no additional method-invocation overhead beyond the delegate invocation (which is required in both cases anyway).
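    To make the two routes concrete, here is a small illustrative snippet (mine, not from the original answer) showing that the inlined `compare` operator and the structural comparer agree on the result; they differ only in how the call is dispatched:

    ```fsharp
    // Illustrative only: both paths yield the same ordering for floats;
    // the difference is inlined IL vs. a virtual call through IComparer<'T>.
    let structural = ComparisonIdentity.Structural<float>

    printfn "%d" (compare 1.5 2.5)              // inlined generic comparison: -1
    printfn "%d" (structural.Compare(1.5, 2.5)) // virtual call, same answer: -1
    ```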

    The sortBy goes through a comparer, and so pays an additional virtual method call, but is otherwise about the same.

    By comparison, the OrderBy implementation must also go through virtual method calls for the comparison (it uses Comparer&lt;T&gt;.Default), but the significant difference is that it sorts in place and uses the buffer it created as the result. If you look at sortBy you will see that it sorts the list (not in place; it uses the StableSortImplementation, which appears to be merge sort) and then creates a copy of it as a new array. This additional copy (given the size of your input data) is likely the principal cause of the slowdown, though the differing sort implementations may also have an effect.
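    A rough way to observe the difference yourself is a micro-benchmark along these lines (a sketch with arbitrary data shapes and sizes, not a rigorous benchmark):

    ```fsharp
    open System.Diagnostics
    open System.Linq

    // Arbitrary test data: a million tuples, keyed by the first element.
    let data = Array.init 1000000 (fun i -> (float (i % 1000), i))

    let time label (f: unit -> seq<'a>) =
        let sw = Stopwatch.StartNew()
        let n = f () |> Seq.length      // force full enumeration
        printfn "%s: %d ms (%d items)" label sw.ElapsedMilliseconds n

    time "Seq.sortBy" (fun () -> Seq.sortBy fst data)
    time "OrderBy"    (fun () -> data.OrderBy(fun t -> fst t) :> seq<_>)
    ```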

    That said, this is all guessing. If this area is a performance concern for you, you should simply profile to find out what is taking the time.

    If you wish to see what effect the sorting/copying change would have, try this alternative:

    // these are taken from the f# source so as to be consistent
    // beware doing this, the compiler may know about such methods
    open System.Collections.Generic
    let mkSeq f = 
        { new IEnumerable<'b> with 
            member x.GetEnumerator() = f()
          interface System.Collections.IEnumerable with 
            member x.GetEnumerator() = (f() :> System.Collections.IEnumerator) }
    
    let mkDelayedSeq (f: unit -> IEnumerable<'T>) = 
        mkSeq (fun () -> f().GetEnumerator())
    
    // the function
    let sortByFaster keyf seq =
        let comparer = ComparisonIdentity.Structural
        mkDelayedSeq (fun () -> 
            let buffer = Seq.to_array seq
            Array.sortInPlaceWith (fun x y -> comparer.Compare(keyf x, keyf y)) buffer
            buffer :> seq<_>)
    

    I get some reasonable percentage speed-ups in the REPL with very large (> 1 million element) input sequences, but nothing like an order of magnitude. Your mileage, as always, may vary.
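    For reference, the same copy-once-and-sort-in-place idea can be written self-contained against the current core library, where Seq.to_array and friends have since been renamed Seq.toArray etc. (the function name here is my own):

    ```fsharp
    // Sketch using modern F# core names: sort a private copy in place
    // and expose it as a seq, avoiding the extra post-sort copy.
    let sortByInPlace (keyf: 'T -> 'Key) (source: seq<'T>) : seq<'T> =
        let buffer = Seq.toArray source    // one copy of the input
        Array.sortInPlaceBy keyf buffer    // in-place sort by key
        buffer :> seq<'T>

    sortByInPlace fst [ (3, "c"); (1, "a"); (2, "b") ]
    |> List.ofSeq
    |> printfn "%A"   // [(1, "a"); (2, "b"); (3, "c")]
    ```

    Note that this trades away stability: Array.sortInPlaceBy is not a stable sort, whereas the merge-sort-based Seq.sortBy is.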
