Fastest way to count occurrences of each unique element

醉酒成梦 2020-12-05 19:45

What is the fastest way to compute the number of occurrences for each unique element in a vector in R?

So far, I've tried the following five functions:



2 Answers
  • 2020-12-05 20:02

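    There's almost nothing that will beat tabulate(), provided your data meet its preconditions: it counts positive integers (a factor is counted by its integer codes), and zero, negative, or NA values are silently ignored.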

    x <- sample(1:100, size=1e7, TRUE)
    system.time(tabulate(x))
    #  user  system elapsed 
    # 0.071   0.000   0.072 
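    To make those preconditions concrete, here is a small base-R illustration. The input vectors are made up for the example. Note that the result is indexed by value (element k holds the count of k), so absent values show up as zeros and the original values are not returned.

    ```r
    # tabulate() counts the integers 1..nbins (default nbins = max(bin)).
    # Element k of the result is the number of times k occurs.
    tabulate(c(2L, 3L, 3L, 5L))
    # [1] 0 1 2 0 1

    # Zero, negative and NA entries are dropped without warning.
    tabulate(c(-1L, 0L, 2L, NA))
    # [1] 0 1
    ```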
    

    @dickoa adds a few more notes in the comments on how to turn the counts into the appropriate output, but tabulate() as the workhorse function is the way to go.
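    dickoa's comments are not reproduced here, so the following is only a sketch of one common way to get a value/count table out of tabulate(). The helper names `count_unique` and `count_pos_int` are hypothetical. The first maps arbitrary values to positive integers with match() so tabulate() can count them; the second assumes x already holds positive integers and skips that step.

    ```r
    # Hypothetical helper (not dickoa's code): counts for any atomic vector.
    # match() maps each element to its position in the sorted unique values,
    # producing positive integers that tabulate() can count. NAs are dropped:
    # sort() removes NA from the lookup table, so match() returns NA for them,
    # and tabulate() ignores NA indices.
    count_unique <- function(x) {
      ux <- sort(unique(x))
      counts <- tabulate(match(x, ux), nbins = length(ux))
      data.frame(value = ux, N = counts)
    }

    # Hypothetical fast path, assuming x contains only positive integers:
    # tabulate directly and keep only the values that actually occur.
    count_pos_int <- function(x) {
      counts <- tabulate(x)
      keep <- which(counts > 0)
      data.frame(value = keep, N = counts[keep])
    }

    count_unique(c("b", "a", "b", "c"))
    count_pos_int(c(5L, 5L, 100L))
    ```

    The extra match() call is what makes the first version work on characters or doubles; when the data are already small positive integers, the second version avoids it.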

  • 2020-12-05 20:11

    This is a little slower than tabulate, but is more universal (it will work with characters, factors, basically whatever you throw at it) and much easier to read/maintain/expand.

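    library(data.table)
    
    # Count each unique value of x: .N is the number of rows in each group,
    # and keyby = x groups by x and returns the result sorted by x.
    f6 = function(x) {
      data.table(x)[, .N, keyby = x]
    }
    
    x <- sample(1:1000, size = 1e7, replace = TRUE)
    system.time(f6(x))
    #   user  system elapsed 
    #   0.80    0.07    0.86 
    
    # f8 (tabulate + dickoa's conversion to a data.frame) is not reproduced here
    system.time(f8(x))
    #   user  system elapsed 
    #   0.56    0.04    0.60 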
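    To illustrate the "works with characters" point, here is the same `.N`/`keyby` idiom applied to a small character vector. This assumes the data.table package is installed; the input vector is made up for the example.

    ```r
    library(data.table)

    # Same idiom as f6, on a character vector: one row per distinct value,
    # with .N holding the number of occurrences, sorted by x because of keyby.
    chars <- c("b", "a", "b", "c")
    res <- data.table(x = chars)[, .N, keyby = x]
    print(res)
    ```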
    

    UPDATE: As of data.table version 1.9.3, the data.table approach is actually about 2x faster than tabulate plus the data.frame conversion.
