I am attempting to write a function that weeds out consecutive duplicates, as determined by a given equality function, from a `seq<'a>`, but with a twist:
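Here is a fairly fast approach that uses library functions rather than sequence expressions. Your test runs in 0.007 seconds on my PC.

It relies on a rather nasty hack for the first element that doesn't work brilliantly: the caller has to pass a dummy `prev` value, and that dummy value ends up at the head of the output (see the `189234784` in the result below). This could certainly be improved.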
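```fsharp
// Keeps the last element of each run of consecutive "equal" elements.
// `prev` is the element the current run is compared against.
let rec dedupe equalityfn prev (s: 'a seq) : 'a seq =
    if Seq.isEmpty s then
        Seq.empty
    else
        // everything after the current run
        let rest = Seq.skipWhile (equalityfn prev) s
        // the current run itself
        let valid = Seq.takeWhile (equalityfn prev) s
        // last element of the run, or prev itself if the run is empty (the first-element hack)
        let valid2 = if Seq.isEmpty valid then Seq.singleton prev else (Seq.last valid) |> Seq.singleton
        let filtered = if Seq.isEmpty rest then Seq.empty else dedupe equalityfn (Seq.head rest) rest
        Seq.append valid2 filtered
```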
```fsharp
let t =
    [("a", 1); ("b", 2); ("b", 3); ("b", 4); ("c", 5)]
    |> dedupe (fun (x1, _) (x2, _) -> x1 = x2) ("asdfasdf", 1)
    |> List.ofSeq;;

#time
List.init 1000 (fun _ -> 1)
|> dedupe (fun x y -> x = y) 189234784
|> List.ofSeq
#time;;
```
```
--> Timing now on
Real: 00:00:00.007, CPU: 00:00:00.006, GC gen0: 0, gen1: 0
val it : int list = [189234784; 1]
--> Timing now off
```
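One possible way around the first-element hack is to seed the recursion with the first element of the input instead of a caller-supplied dummy. Because `prev` then equals the head of `s`, the first run is picked up by `takeWhile` and no dummy value is emitted. This is a sketch, not a tested drop-in: `dedupeNoSentinel` is a hypothetical name introduced here, `dedupe` is repeated from above so the block runs on its own, and it assumes `equalityfn` is reflexive (an element compares equal to itself).

```fsharp
// `dedupe` from the answer above, repeated so this block is self-contained.
let rec dedupe equalityfn prev (s: 'a seq) : 'a seq =
    if Seq.isEmpty s then
        Seq.empty
    else
        let rest = Seq.skipWhile (equalityfn prev) s
        let valid = Seq.takeWhile (equalityfn prev) s
        let valid2 =
            if Seq.isEmpty valid then Seq.singleton prev
            else Seq.last valid |> Seq.singleton
        let filtered =
            if Seq.isEmpty rest then Seq.empty
            else dedupe equalityfn (Seq.head rest) rest
        Seq.append valid2 filtered

// Hypothetical wrapper: seeds `prev` with the first element, so no sentinel is needed.
// Assumes equalityfn is reflexive, so the first run is never empty.
let dedupeNoSentinel equalityfn (s: 'a seq) : 'a seq =
    if Seq.isEmpty s then Seq.empty
    else dedupe equalityfn (Seq.head s) s

let t =
    [("a", 1); ("b", 2); ("b", 3); ("b", 4); ("c", 5)]
    |> dedupeNoSentinel (fun (x1, _) (x2, _) -> x1 = x2)
    |> List.ofSeq
// t = [("a", 1); ("b", 4); ("c", 5)]
```

This keeps the original recursion untouched; it only changes how the first call is made.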
As the other answers have said, `seq`s are really slow. However, the real question is: why do you want to use a `seq` here at all? In particular, you start with a list, you traverse the entire list, and you want to create a new list at the end. There doesn't appear to be any reason to use a sequence unless you need sequence-specific features. In fact, the docs state:
> A sequence is a logical series of elements all of one type. Sequences are particularly useful when you have a large, ordered collection of data but do not necessarily expect to use all the elements. Individual sequence elements are computed only as required, so a sequence can provide better performance than a list in situations in which not all the elements are used.
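To make the "computed only as required" point concrete, here is a small sketch that counts how many elements of a large sequence actually get computed when only the first few are used. The names `evaluated`, `squares` and `firstThree` are illustrative only.

```fsharp
// Counts how many elements of the sequence are actually computed.
let evaluated = ref 0

let squares =
    seq {
        for i in 0 .. 999999 do
            // this side effect runs only when element i is requested
            evaluated.Value <- evaluated.Value + 1
            yield i * i
    }

// Only the first three elements are requested, so only three are computed.
let firstThree = squares |> Seq.take 3 |> List.ofSeq
printfn "%A computed %d elements" firstThree evaluated.Value
```

In the question's situation every element is consumed anyway, so this laziness buys nothing and only adds enumerator overhead.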
To make efficient use of the input type `seq<'a>`, one should iterate through each element only once and avoid creating additional sequences. On the other hand, to make efficient use of the output type `'a list`, one should make liberal use of cons (`::`) and tail, both of which are basically free: they run in constant time.
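Cons and tail are constant-time because F# lists are immutable singly linked lists: consing allocates one new cell that points at the existing list, and taking the tail just follows that pointer. A quick sketch showing that no copying happens (the names `xs`, `ys` and `tl` are illustrative):

```fsharp
let xs = [2; 3; 4]

// Cons allocates a single new cell whose tail *is* xs; nothing is copied.
let ys = 1 :: xs

// Taking the tail follows that pointer and returns the very same list object.
let tl = List.tail ys

printfn "%b" (obj.ReferenceEquals(tl, xs))   // true: xs is shared, not copied
```

Appending to the *end* of a list, by contrast, has to rebuild the whole list, which is why the solution below prepends.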
Combining the two requirements leads me to this solution:
```fsharp
// dedupeTakingLast2 : ('a -> 'a -> bool) -> seq<'a> -> 'a list
let dedupeTakingLast2 equalityFn =
    Seq.fold
        (fun deduped elem ->
            match deduped with
            | [] -> [ elem ]
            | x :: xs ->
                // same run: replace the previous element with this one (keeps the last)
                if equalityFn x elem then elem :: xs
                // new run: prepend the element
                else elem :: deduped)
        []
```
Note, however, that the output list will be in reverse order, because each element is prepended. I hope this isn't a dealbreaker, since `List.rev` is a relatively expensive operation: it makes a second full pass over the result and allocates a new list.
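If the original order does matter, one alternative (a different technique from the fold above, sketched here rather than benchmarked) is a single pass over the input with an enumerator that yields each element as soon as it is known to end its run. An element is kept when it is the last one or when `equalityFn` says it differs from the element right after it, which is the same adjacent comparison `dedupeTakingLast2` makes, so the result is the fold's output without the reversal. `dedupeTakingLastInOrder` is a hypothetical name, and it returns a `seq`, so `List.ofSeq` is used to get a list.

```fsharp
// Hypothetical single-pass, order-preserving variant.
let dedupeTakingLastInOrder (equalityFn: 'a -> 'a -> bool) (source: seq<'a>) : seq<'a> =
    seq {
        use e = source.GetEnumerator()
        if e.MoveNext() then
            // element whose fate depends on the next one
            let prev = ref e.Current
            while e.MoveNext() do
                let curr = e.Current
                if not (equalityFn prev.Value curr) then
                    yield prev.Value      // prev ends its run: keep it
                prev.Value <- curr
            yield prev.Value              // the final element always ends a run
    }

let result =
    List.init 1000 id
    |> dedupeTakingLastInOrder (fun x y -> x - (x % 10) = y - (y % 10))
    |> List.ofSeq
// result starts 9, 19, 29, ... and ends with 999
```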
Test:
```fsharp
List.init 1000 id
|> dedupeTakingLast2 (fun x y -> x - (x % 10) = y - (y % 10))
|> List.iter (printf "%i ")
// 999 989 979 969 ... 19 9
```