F#: removing duplicates from a seq is slow

半阙折子戏 2021-01-12 03:11

I am attempting to write a function that weeds out consecutive duplicates, as determined by a given equality function, from a seq<'a> but with a twist:

9 Answers
  •  终归单人心
    2021-01-12 04:10

    To make efficient use of the input type Seq, one should iterate through each element only once and avoid creating additional sequences.

    On the other hand, to make efficient use of the output type List, one should make liberal use of cons (::) and tail, both of which are constant-time operations on an F# list.

    Combining the two requirements leads me to this solution:

    // dedupeTakingLast2 : ('a -> 'a -> bool) -> seq<'a> -> 'a list
    // Folds over the input once; the accumulator holds the kept elements, newest first.
    let dedupeTakingLast2 equalityFn = 
      Seq.fold 
      <| fun deduped elem ->     
           match deduped with
           | [] -> [ elem ]
           | x :: xs -> if equalityFn x elem 
                          then elem :: xs
                          else elem :: deduped
      <| []
    

    Note, however, that the output list will be in reverse order, because each element is prepended to the accumulator. I hope this isn't a dealbreaker, since restoring the original order with List.rev costs an additional O(n) pass over the result.
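If the input order does matter, one option is to accept that extra pass. The following is only a sketch under that assumption, not part of the answer above: dedupeTakingLastInOrder is a hypothetical name, and the fold is the same idea as dedupeTakingLast2 with a single List.rev appended.

```fsharp
// Hypothetical variant (not from the original answer): same single fold over the
// input, followed by one List.rev so the result keeps the input order.
let dedupeTakingLastInOrder (equalityFn: 'a -> 'a -> bool) (source: seq<'a>) : 'a list =
    let step deduped elem =
        match deduped with
        // Same run as the most recently kept element: replace it with the newer one.
        | x :: xs when equalityFn x elem -> elem :: xs
        // Empty accumulator or a new run: prepend.
        | _ -> elem :: deduped
    source
    |> Seq.fold step []
    |> List.rev

// Integers are treated as equal when they fall in the same block of ten.
let result =
    dedupeTakingLastInOrder (fun x y -> x / 10 = y / 10) [ 1; 2; 11; 12; 13; 25; 3 ]
// result = [2; 13; 25; 3]
```

Only consecutive duplicates are collapsed, so the trailing 3 survives even though 2 is in the same block of ten. Whether the additional pass is acceptable depends on how long the deduplicated list tends to be.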

    Test:

    List.init 1000 (id) 
    |> dedupeTakingLast2 (fun x y -> x - (x % 10) = y - (y % 10))
    |> List.iter (printfn "%i ")
    
    // 999 989 979 969 etc...
    
