I have a sorted sequence and want to go through it and return the unique entries in the sequence. I can do it using the following function, but it uses reference variables a
`distinct` and `distinctBy` both use `Dictionary`, so they require hashing and some memory to store every unique item seen so far. If your sequence is already sorted (strictly speaking, if equal items are always adjacent), you can use the following approach, which is similar to yours. In the test below it runs in about 0.62 s versus 1.04 s for `Seq.distinct` (roughly 1.7 times as fast), and it uses constant memory, so it works on sequences of any size.
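```fsharp
let distinctWithoutHash (items: seq<_>) =
    seq {
        use e = items.GetEnumerator()
        if e.MoveNext() then
            // The first item is always unique; remember it as the previous item.
            let prev = ref e.Current
            yield prev.Value
            while e.MoveNext() do
                // In sorted input duplicates are adjacent, so comparing
                // against the previous item is enough.
                if e.Current <> prev.Value then
                    yield e.Current
                    prev.Value <- e.Current
    }
```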
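```fsharp
// One million items, each value appearing twice: 0, 0, 1, 1, 2, 2, ...
let items = Seq.init 1000000 (fun i -> i / 2)
let test f = items |> f |> (Seq.length >> printfn "%d")

// Timings as reported by F# Interactive with #time "on"
test Seq.distinct        // Real: 00:00:01.038, CPU: 00:00:01.435, GC gen0: 47, gen1: 1, gen2: 1
test distinctWithoutHash // Real: 00:00:00.622, CPU: 00:00:00.624, GC gen0: 44, gen1: 0, gen2: 0
```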
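For contrast, a hash-based distinct has to remember every unique item it has already produced, which is where the extra memory goes. The following is a conceptual sketch only, not FSharp.Core's actual source; `distinctWithHashSet` is a hypothetical name, and it uses `HashSet` with F#'s structural equality comparer rather than the `Dictionary` mentioned above. Unlike the adjacent-comparison version, it also works on unsorted input.

```fsharp
open System.Collections.Generic

// Hypothetical hash-based distinct (a sketch, not the library implementation).
// Memory grows with the number of unique items, because all of them stay in `seen`.
let distinctWithHashSet<'T when 'T : equality> (items: seq<'T>) : seq<'T> =
    seq {
        let seen = HashSet<'T>(HashIdentity.Structural<'T>)
        for x in items do
            // HashSet.Add returns false when the item is already present.
            if seen.Add x then
                yield x
    }

// Works on unsorted input because every item ever seen is remembered.
distinctWithHashSet [3; 1; 3; 2; 1] |> List.ofSeq |> printfn "%A" // [3; 1; 2]
```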
I couldn't figure out a way to use `mutable`s instead of `ref`s (short of hand-coding an enumerator). I expected that to speed it up considerably, but I tried it and it makes no difference.
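The "hand-coding an enumerator" route can be sketched roughly as follows: the previous item lives in a `mutable` field of a class instead of a `ref` cell. `DistinctAdjacentEnumerator` and `distinctHandCoded` are hypothetical names, this is one possible shape rather than the code that was actually timed, and nothing here claims it is faster than the `ref`-based version. Like that version, it assumes equal items are adjacent.

```fsharp
open System.Collections
open System.Collections.Generic

// Hypothetical hand-written enumerator: `prev` is a mutable field, not a ref cell.
// Assumes equal items are adjacent (e.g. the source is sorted).
type DistinctAdjacentEnumerator<'T when 'T : equality>(source: IEnumerator<'T>) =
    let mutable prev = Unchecked.defaultof<'T>
    let mutable started = false

    interface IEnumerator<'T> with
        member _.Current = prev

    interface IEnumerator with
        member _.Current = box prev
        member _.MoveNext() =
            if not started then
                // First call: the first item (if any) is always unique.
                started <- true
                if source.MoveNext() then
                    prev <- source.Current
                    true
                else
                    false
            else
                // Skip items equal to the previous one until a new value appears.
                let mutable found = false
                while not found && source.MoveNext() do
                    if source.Current <> prev then
                        prev <- source.Current
                        found <- true
                found
        member _.Reset() = raise (System.NotSupportedException())

    interface System.IDisposable with
        member _.Dispose() = source.Dispose()

// Wraps the enumerator so each enumeration of the result starts fresh.
let distinctHandCoded<'T when 'T : equality> (items: seq<'T>) : seq<'T> =
    { new IEnumerable<'T> with
        member _.GetEnumerator() =
            new DistinctAdjacentEnumerator<'T>(items.GetEnumerator()) :> IEnumerator<'T>
      interface IEnumerable with
        member _.GetEnumerator() =
            new DistinctAdjacentEnumerator<'T>(items.GetEnumerator()) :> IEnumerator }

distinctHandCoded [1; 1; 2; 3; 3; 3] |> List.ofSeq |> printfn "%A" // [1; 2; 3]
```

Because it only compares neighbours, unsorted input such as `[1; 2; 1]` comes back unchanged, which is the same trade-off as `distinctWithoutHash`.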