What is the fastest way to compute the number of occurrences for each unique element in a vector in R?
So far, I've tried the following five functions:
Almost nothing will beat tabulate(), provided your data meet its requirements: it only counts positive integers.
x <- sample(1:100, size = 1e7, replace = TRUE)
system.time(tabulate(x))
# user system elapsed
# 0.071 0.000 0.072
@dickoa adds a few more notes in the comments on how to get the output into the appropriate shape, but tabulate() as the workhorse function is the way to go.
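To make the "appropriate output" step concrete, here is a minimal sketch of one way to turn the raw tabulate() result into a value/count table. The helper name count_tabulate and the column names are hypothetical, and this is not necessarily the conversion @dickoa suggested. It assumes the input contains only positive whole numbers; the up-front check costs some time, so drop it when you already trust the data.

```r
# Hypothetical helper: count occurrences of each value in a vector of
# positive integers with tabulate(), keeping only values that occur.
count_tabulate <- function(x) {
  # tabulate() silently ignores values below 1, so fail loudly instead
  stopifnot(is.numeric(x), all(x >= 1), all(x == floor(x)))
  counts <- tabulate(x)           # counts[i] is how often the value i occurs
  present <- which(counts > 0L)   # drop values that never appear
  data.frame(value = present, n = counts[present])
}

count_tabulate(c(3L, 1L, 3L, 7L, 3L))
```

For the small vector above, the result has one row per distinct value (1, 3, and 7), with counts 1, 3, and 1.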
The data.table approach is a little slower than tabulate, but it is more universal (it will work with characters, factors, basically whatever you throw at it) and much easier to read, maintain, and expand.
library(data.table)
# Count rows per distinct value; keyby also sorts the result by x
f6 = function(x) {
  data.table(x)[, .N, keyby = x]
}
x <- sample(1:1000, size = 1e7, replace = TRUE)
system.time(f6(x))
# user system elapsed
# 0.80 0.07 0.86
system.time(f8(x)) # tabulate + dickoa's conversion to data.frame
# user system elapsed
# 0.56 0.04 0.60
UPDATE: As of data.table version 1.9.3, the data.table version is actually about 2x faster than the tabulate + data.frame conversion.
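To illustrate the point that the data.table approach is not limited to positive integers, here is a small sketch that applies the same grouping idiom to a character vector. The function name count_dt and the sample data are made up for illustration, and the data.table package is assumed to be installed.

```r
library(data.table)

# Same idiom as f6: one row per distinct value, with .N as the count.
# The name count_dt is hypothetical.
count_dt <- function(x) {
  data.table(x)[, .N, keyby = x]
}

fruit <- c("pear", "apple", "pear", "fig", "pear")
count_dt(fruit)
```

Because keyby sorts by the grouping column, the rows come back in the order apple, fig, pear, with counts 1, 1, and 3. tabulate() cannot take this input directly; the strings would first have to be mapped to integer codes, for example via factor().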