How can I remove duplicates in an F# sequence without using references

前端 未结 6 407
南方客
南方客 2021-01-20 09:07

I have a sorted sequence and want to go through it and return the unique entries in the sequence. I can do it using the following function, but it uses reference variables a

6条回答
  •  臣服心动
    2021-01-20 09:42

    distinct and distinctBy both use Dictionary and therefore require hashing and a bit of memory for storing unique items. If your sequence is already sorted, you can use the following approach (similar to yours). It's nearly twice as fast and has constant memory use, making it usable for sequences of any size.

    let distinctWithoutHash (items:seq<_>) =
      seq {
        use e = items.GetEnumerator()
        if e.MoveNext() then
          let prev = ref e.Current
          yield !prev
          while e.MoveNext() do
            if e.Current <> !prev then 
              yield e.Current
              prev := e.Current
      }
    
    let items = Seq.init 1000000 (fun i -> i / 2)
    let test f = items |> f |> (Seq.length >> printfn "%d")
    
    test Seq.distinct        //Real: 00:00:01.038, CPU: 00:00:01.435, GC gen0: 47, gen1: 1, gen2: 1
    test distinctWithoutHash //Real: 00:00:00.622, CPU: 00:00:00.624, GC gen0: 44, gen1: 0, gen2: 0
    

    I couldn't figure out a way to use mutables instead of refs (short of hand-coding an enumerator), which I'm sure would speed it up considerably (I tried it--it makes no difference).

提交回复
热议问题