F#: removing duplicates from a seq is slow


I am attempting to write a function that weeds out consecutive duplicates, as determined by a given equality function, from a seq<'a> but with a twist:

9 answers

北荒

2021-01-12 04:09

Here is a pretty fast approach which uses library functions rather than Seq expressions.

Your test runs in 0.007 seconds on my PC.

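It relies on a pretty nasty hack for the first element: the caller has to pass a dummy prev value, and that value ends up as the first element of the result. This part doesn't work brilliantly and could be improved.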

```
let rec dedupe equalityfn prev (s:'a seq) : 'a seq =
    if Seq.isEmpty s then
        Seq.empty
    else
        let rest = Seq.skipWhile (equalityfn prev) s
        let valid = Seq.takeWhile (equalityfn prev) s
        let valid2 = if Seq.isEmpty valid then Seq.singleton prev else (Seq.last valid) |> Seq.singleton
        let filtered = if Seq.isEmpty rest then Seq.empty else dedupe equalityfn (Seq.head rest) (rest)
        Seq.append valid2 filtered

let t = [("a", 1); ("b", 2); ("b", 3); ("b", 4); ("c", 5)]
        |> dedupe (fun (x1, y1) (x2, y2) -> x1=x2) ("asdfasdf",1)
        |> List.ofSeq;;
```

```
#time
List.init 1000 (fun _ -> 1)
|> dedupe (fun x y -> x = y) (189234784)
|> List.ofSeq
#time;;
--> Timing now on

Real: 00:00:00.007, CPU: 00:00:00.006, GC gen0: 0, gen1: 0
val it : int list = [189234784; 1]

--> Timing now off
```
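
The dummy seed shows up in the timing output above: 189234784 is the first element of the result. One possible way around this, sketched here on the assumption that the equality function is reflexive (every element equals itself), is to seed dedupe with the sequence's own first element. The wrapper name dedupeFromHead is hypothetical and not part of the answer; dedupe is repeated only so the snippet runs on its own.

```fsharp
// dedupe as given in the answer above, repeated for self-containment.
let rec dedupe equalityfn prev (s: 'a seq) : 'a seq =
    if Seq.isEmpty s then
        Seq.empty
    else
        let rest = Seq.skipWhile (equalityfn prev) s
        let valid = Seq.takeWhile (equalityfn prev) s
        let valid2 =
            if Seq.isEmpty valid then Seq.singleton prev
            else Seq.last valid |> Seq.singleton
        let filtered =
            if Seq.isEmpty rest then Seq.empty
            else dedupe equalityfn (Seq.head rest) rest
        Seq.append valid2 filtered

// Hypothetical wrapper: uses the first element as the seed, so no dummy
// value is needed. Assumes equalityfn x x is true for every x; then the
// first run is never empty and the seed itself is not emitted separately.
let dedupeFromHead equalityfn (s: 'a seq) : 'a seq =
    if Seq.isEmpty s then Seq.empty
    else dedupe equalityfn (Seq.head s) s

let t2 =
    [("a", 1); ("b", 2); ("b", 3); ("b", 4); ("c", 5)]
    |> dedupeFromHead (fun (x1, _) (x2, _) -> x1 = x2)
    |> List.ofSeq
// t2 = [("a", 1); ("b", 4); ("c", 5)]
```

The recursion and the repeated skipWhile/takeWhile passes are unchanged, so this only removes the seed from the output; it does not change the performance characteristics.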

庸人自扰

2021-01-12 04:10

As the other answers have said, seq is really slow. The real question, however, is why you want to use a seq here at all. You start with a list, you traverse the entire list, and you create a new list at the end. There doesn't appear to be any reason to use a sequence unless you need sequence-specific features. In fact, the docs state (emphasis mine):

A sequence is a logical series of elements all of one type. Sequences are particularly useful when you have a large, ordered collection of data but do not necessarily expect to use all the elements. Individual sequence elements are computed only as required, so a sequence can provide better performance than a list in situations in which not all the elements are used.
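
If the data is a list from start to finish, the same idea can be written against lists directly. The following is only a sketch: dedupeList is a hypothetical name, and it assumes the equality function behaves like an equivalence relation, so that comparing against the kept element of a run gives the same answer as comparing neighbours.

```fsharp
// Hypothetical list-only helper: keeps the last element of each run of
// consecutive "equal" elements and preserves the input order.
let dedupeList (equalityFn: 'a -> 'a -> bool) (xs: 'a list) : 'a list =
    // List.foldBack walks the list from the right, so the first element
    // seen from each run is that run's last element; earlier members of
    // the same run are then skipped.
    List.foldBack
        (fun elem acc ->
            match acc with
            | next :: _ when equalityFn elem next -> acc // same run: drop elem
            | _ -> elem :: acc)                          // new run: keep elem
        xs
        []

let deduped =
    [("a", 1); ("b", 2); ("b", 3); ("b", 4); ("c", 5)]
    |> dedupeList (fun (x1, _) (x2, _) -> x1 = x2)
// deduped = [("a", 1); ("b", 4); ("c", 5)]
```

No intermediate sequences are created here, and every element is visited exactly once.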

终归单人心

2021-01-12 04:10
To make efficient use of the input type Seq, one should iterate through each element only once and avoid creating additional sequences.

On the other hand, to make efficient use of the output type List, one should make liberal use of the cons and tail operations, both of which are basically free.

Combining the two requirements leads me to this solution:
```
// dedupeTakingLast2 : ('a -> 'a -> bool) -> seq<'a> -> 'a list
let dedupeTakingLast2 equalityFn = 
  Seq.fold 
  <| fun deduped elem ->     
       match deduped with
       | [] -> [ elem ]
       | x :: xs -> if equalityFn x elem 
                      then elem :: xs
                      else elem :: deduped
  <| []
```
Note, however, that the resulting list will be in reverse order, because each element is prepended. I hope this isn't a dealbreaker, since List.rev needs an extra pass over the whole result.

Test:
```
List.init 1000 (id) 
|> dedupeTakingLast2 (fun x y -> x - (x % 10) = y - (y % 10))
|> List.iter (printfn "%i ")

// 999 989 979 969 etc...
```
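
If the original order does matter, the reverse-order caveat above can be handled by paying for one List.rev at the end. This is a minimal sketch, not part of the answer; the name dedupeTakingLastInOrder is hypothetical, and the fold is the same one as in dedupeTakingLast2, written with a guard instead of if/then/else.

```fsharp
// Hypothetical variant: same single-pass fold, followed by List.rev
// so that the result comes out in input order.
let dedupeTakingLastInOrder (equalityFn: 'a -> 'a -> bool) (s: seq<'a>) : 'a list =
    s
    |> Seq.fold
        (fun deduped elem ->
            match deduped with
            | x :: xs when equalityFn x elem -> elem :: xs // same run: replace the kept element
            | _ -> elem :: deduped)                        // first element or new run: prepend
        []
    |> List.rev // one extra O(n) pass over the deduplicated list

let inOrder =
    List.init 1000 id
    |> dedupeTakingLastInOrder (fun x y -> x - (x % 10) = y - (y % 10))
// inOrder starts with 9, 19, 29 and ends with 999
```

The reversal runs over the deduplicated result rather than the input, so its cost shrinks as more duplicates are removed.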
