Finding unique vector elements in a list efficiently

心在旅途 2021-02-15 08:09

I have a list of numerical vectors, and I need to create a list containing only one copy of each vector. There isn't a list method for the identical function, so I wrote a func

2 Answers

我在风中等你 (original poster)

2021-02-15 08:56

As per @JoshuaUlrich and @thelatemail, ll[!duplicated(ll)] works just fine.
And thus, so should unique(ll). I previously suggested a method using sapply, with the idea of not checking every element in the list (I deleted that answer, as I think using unique makes more sense).
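To make the point concrete, here is a minimal sketch on a tiny list. The list `vecs` is made up for illustration and is not part of the original question; the sketch only assumes base R, where duplicated() and unique() both accept lists.

```r
# Toy data (hypothetical): three vectors, the first and third are the same
vecs <- list(c(1, 2, 3), c(4, 5), c(1, 2, 3))

# duplicated() flags every element that already appeared earlier in the list
dup <- duplicated(vecs)
print(dup)

# Both forms keep the first occurrence of each distinct vector
kept_by_index <- vecs[!dup]
kept_by_unique <- unique(vecs)
print(identical(kept_by_index, kept_by_unique))
```

Here the third element is flagged as a duplicate, so both results contain the two distinct vectors in their original order.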

Since efficiency is a goal, we should benchmark these.

# Let's create some sample data
xx <- lapply(rep(100,15), sample)
ll <- as.list(sample(xx, 1000, replace = TRUE))
ll

Putting it up against some benchmarks

fun1 <- function(ll) {
  ll[c(TRUE, !sapply(2:length(ll), function(i) ll[i] %in% ll[1:(i-1)]))]
}

# fun2 hashes each vector, so it needs the digest package
library(digest)
fun2 <- function(ll) {
  ll[!duplicated(sapply(ll, digest))]
}

fun3 <- function(ll)  {
  ll[!duplicated(ll)]
}

fun4 <- function(ll)  {
  unique(ll)
}

# Make sure all four functions return the same result
all(identical(fun1(ll), fun2(ll)), identical(fun2(ll), fun3(ll)), 
    identical(fun3(ll), fun4(ll)), identical(fun4(ll), fun1(ll)))
# [1] TRUE


library(rbenchmark)

benchmark(digest=fun2(ll), duplicated=fun3(ll), unique=fun4(ll), replications=100, order="relative")[, c(1, 3:6)]

        test elapsed relative user.self sys.self
3     unique   0.048    1.000     0.049    0.000
2 duplicated   0.050    1.042     0.050    0.000
1     digest   8.427  175.563     8.415    0.038
# I took out fun1, since it ran extremely slowly when ll is large
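
If rbenchmark is not installed, a rough version of the same comparison can be sketched with base R's system.time(). This is not the benchmark used above, just an approximation of it: the seed and the loop of 100 repetitions are assumptions chosen to mirror replications=100, and the timings will differ from machine to machine.

```r
set.seed(1)  # arbitrary seed (an assumption) so the sample data is reproducible
xx <- lapply(rep(100, 15), sample)
ll <- sample(xx, 1000, replace = TRUE)

# Time 100 repetitions of each approach
t_unique <- system.time(for (i in 1:100) unique(ll))
t_dup <- system.time(for (i in 1:100) ll[!duplicated(ll)])

# Collect the elapsed wall-clock seconds for a side-by-side look
elapsed <- c(unique = t_unique[["elapsed"]], duplicated = t_dup[["elapsed"]])
print(elapsed)
```

The digest-based variant can be timed the same way once the digest package is loaded.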

Fastest Option:

unique(ll)
