Why is F#'s Seq.sortBy much slower than LINQ's IEnumerable.OrderBy extension method?

清酒与你 2021-02-10 12:34

I've recently written a piece of code to read some data from a file, store it in a tuple and sort all the collected data by the first element of the tuple. After some tests I'

2 Answers
  •  有刺的猬
    2021-02-10 13:19

    The F# implementation uses a structural comparison of the resulting key.

    let sortBy keyf seq =
        let comparer = ComparisonIdentity.Structural
        mkDelayedSeq (fun () -> 
            (seq 
            |> to_list 
            |> List.sortWith (fun x y -> comparer.Compare(keyf x,keyf y)) 
            |> to_array) :> seq<_>)
    

    (The same applies to sort.)

    let sort seq =
        mkDelayedSeq (fun () -> 
            (seq 
            |> to_list 
            |> List.sortWith Operators.compare 
            |> to_array) :> seq<_>)
    

    Both Operators.compare and ComparisonIdentity.Structural.Compare eventually become:

    let inline GenericComparisonFast<'T> (x:'T) (y:'T) : int = 
        GenericComparisonIntrinsic x y
            // lots of other types elided
            when 'T : float = if (# "clt" x y : bool #) 
                              then (-1) 
                              else (# "cgt" x y : int #)
    

    However, the route to this for the operator is entirely inline, so the JIT compiler ends up emitting a direct double comparison instruction with no method invocation overhead beyond the delegate invocation (which is required in both cases anyway).

    sortBy uses a comparer, so it goes through an additional virtual method call, but the cost is basically about the same.
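
    As a small illustration of the two routes, the sketch below sorts the same list once through the inline compare operator and once through a comparer obtained from ComparisonIdentity.Structural. The sample data and the names byOperator and byComparer are made up for this example; it only shows that both routes produce the same ordering and says nothing about their relative speed.

```fsharp
// Hypothetical sample data: (key, payload) tuples.
let items = [ (3.0, "c"); (1.0, "a"); (2.0, "b") ]

// Route 1: the inline operator, as in the `sort` source quoted above.
let byOperator =
    items |> List.sortWith (fun x y -> compare (fst x) (fst y))

// Route 2: an IComparer fetched once, as in the `sortBy` source quoted above.
let comparer = ComparisonIdentity.Structural<float>
let byComparer =
    items |> List.sortWith (fun x y -> comparer.Compare(fst x, fst y))

// Both routes should agree on the resulting order.
printfn "same order: %b" (byOperator = byComparer)
```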

    By comparison, OrderBy must also go through virtual method calls for the comparison (using Comparer<TKey>.Default), but the significant difference is that it sorts in place and uses the buffer created for this as the result. If you look at sortBy, you will see that it sorts the list (not in place; it uses the StableSortImplementation, which appears to be a merge sort) and then creates a copy of it as a new array. This additional copy (given the size of your input data) is likely the principal cause of the slowdown, though the differing sort implementations may also have an effect.

    That said, this is all guesswork. If performance in this area is a concern for you, you should simply profile to find out what is taking the time.
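
    Short of a real profiler, a rough first measurement could look like the sketch below. It assumes input shaped like the question's (tuples sorted by their first element); the generated data, the time helper and the names viaSeq and viaLinq are all hypothetical. A single Stopwatch run includes JIT and GC noise, so treat the numbers as indicative only.

```fsharp
open System
open System.Diagnostics
open System.Linq

// Hypothetical test data: one million (float, string) tuples keyed on the float.
let rng = Random(42)
let data = Array.init 1000000 (fun i -> (rng.NextDouble(), string i))

// Run a thunk once, print the elapsed wall-clock time, and return its result.
let time label (f: unit -> 'a) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    printfn "%s: %d ms" label sw.ElapsedMilliseconds
    result

// Both sorts are deferred, so force them by materialising an array.
let viaSeq =
    time "Seq.sortBy" (fun () -> data |> Seq.sortBy fst |> Seq.toArray)

let viaLinq =
    time "OrderBy" (fun () -> data.OrderBy(fun t -> fst t).ToArray())
```

    Running each measurement several times and discarding the first run gives a fairer comparison than a single pass.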

    If you wish to see what effect the sorting/copying change would have, try this alternative:

    // these are taken from the f# source so as to be consistent
    // beware doing this, the compiler may know about such methods
    open System.Collections.Generic
    let mkSeq f = 
        { new IEnumerable<'b> with 
            member x.GetEnumerator() = f()
          interface System.Collections.IEnumerable with 
            member x.GetEnumerator() = (f() :> System.Collections.IEnumerator) }
    
    let mkDelayedSeq (f: unit -> IEnumerable<'T>) = 
        mkSeq (fun () -> f().GetEnumerator())
    
    // the function
    let sortByFaster keyf seq =
        let comparer = ComparisonIdentity.Structural
        mkDelayedSeq (fun () -> 
            // copy the input once, then sort that buffer in place with a comparison function
            let buffer = Seq.toArray seq
            Array.sortInPlaceWith (fun x y -> comparer.Compare(keyf x,keyf y)) buffer
            buffer :> seq<_>)
    

    I get a reasonable percentage speedup in the REPL with very large input sequences (more than a million elements), but nothing like an order of magnitude. Your mileage, as always, may vary.
