Why is mean() so slow?

情话喂你 2020-12-08 14:37

Everything is in the question! While doing a bit of optimization and tracking down bottlenecks, I tried the following out of curiosity:

t1 <- rnorm(10         


        
2 answers
  • 2020-12-08 14:58

    mean is slower than computing "by hand" for several reasons:

    1. S3 Method dispatch
    2. NA handling
    3. Error correction

    Points 1 and 2 have already been covered. Point 3 is discussed in "What algorithm is R using to calculate mean?". Basically, mean makes two passes over the vector to correct for floating-point rounding error, whereas sum makes only one pass.

    Notice that identical(sum(t1)/length(t1), mean(t1)) may be FALSE, due to these precision issues.

    > set.seed(21); t1 <- rnorm(1e7,,21)
    > identical(sum(t1)/length(t1), mean(t1))
    [1] FALSE
    > sum(t1)/length(t1) - mean(t1)
    [1] 2.539201e-16
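
    The two-pass idea can be sketched in plain R. This is only a rough illustration under my own assumptions, not R's actual C code: two_pass_mean is a hypothetical helper name, and the vector is shorter than the one above so it runs quickly.

    ```r
    # Hypothetical helper: a two-pass mean written in plain R.
    two_pass_mean <- function(x) {
      n <- length(x)
      m <- sum(x) / n        # first pass: naive estimate of the mean
      m + sum(x - m) / n     # second pass: add back the mean of the residuals
    }

    set.seed(21)
    t1 <- rnorm(1e5, sd = 21)

    naive   <- sum(t1) / length(t1)
    refined <- two_pass_mean(t1)

    # The correction is tiny, but it can be enough to make identical() fail.
    refined - naive
    refined - mean(t1)
    ```

    The second pass costs another full sweep over the data, which is consistent with mean being slower than sum/length on long vectors.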
    
  • 2020-12-08 15:04

    It is due to the S3 lookup for the method, and then the necessary parsing of arguments in mean.default (plus the other code in mean).

    sum and length are both primitive functions, so they will be fast (but how are you handling NA values?).

    library(microbenchmark)

    t1 <- rnorm(10)
    microbenchmark(
      mean(t1),
      sum(t1)/length(t1),
      mean.default(t1),
      .Internal(mean(t1)),
      times = 10000)
    
    Unit: nanoseconds
                    expr   min    lq median    uq     max neval
                mean(t1) 10266 10951  11293 11635 1470714 10000
      sum(t1)/length(t1)   684  1027   1369  1711  104367 10000
        mean.default(t1)  2053  2396   2738  2739 1167195 10000
     .Internal(mean(t1))   342   343    685   685   86574 10000
    

    The internal bit of mean is faster even than sum/length.

    See http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table#method_dispatch_takes_time (mirror) for more details (and a data.table solution that avoids .Internal).
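
    The difference between the dispatching generic and the primitives can be checked directly from the console. A small sketch using only base R:

    ```r
    # sum and length are primitives: they go straight to C code.
    is.primitive(sum)      # TRUE
    is.primitive(length)   # TRUE

    # mean is an ordinary R closure whose only job is to dispatch.
    is.primitive(mean)     # FALSE
    body(mean)             # the call UseMethod("mean")

    # For a plain numeric vector, dispatch ends up in mean.default,
    # which is itself R code that checks its arguments before calling .Internal.
    is.function(getS3method("mean", "default"))
    ```

    Each of those R-level steps adds a fixed overhead per call, which is why it dominates on a 10-element vector.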

    Note that if we increase the length of the vector, the primitive approach is fastest:

    t1 <- rnorm(1e7)
    microbenchmark(
      mean(t1),
      sum(t1)/length(t1),
      mean.default(t1),
      .Internal(mean(t1)),
      times = 100)
    
    Unit: milliseconds
                    expr      min       lq   median       uq      max neval
                mean(t1) 25.79873 26.39242 26.56608 26.85523 33.36137   100
      sum(t1)/length(t1) 15.02399 15.22948 15.31383 15.43239 19.20824   100
        mean.default(t1) 25.69402 26.21466 26.44683 26.84257 33.62896   100
     .Internal(mean(t1)) 25.70497 26.16247 26.39396 26.63982 35.21054   100
    

    Now method dispatch is only a small fraction of the total time required.
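
    On the NA question raised earlier: sum(t1)/length(t1) returns NA as soon as the vector contains one. If you go the hand-rolled route, something like the following might do; fast_mean is a hypothetical name, and this sketch skips the extra precision pass that mean performs.

    ```r
    # Hypothetical helper: hand-rolled mean with optional NA removal.
    fast_mean <- function(x, na.rm = FALSE) {
      if (na.rm) {
        x <- x[!is.na(x)]   # drop NA (and NaN) before summing
      }
      sum(x) / length(x)
    }

    x <- c(1, NA, 3)
    fast_mean(x)                 # NA, like mean(x)
    fast_mean(x, na.rm = TRUE)   # same value as mean(x, na.rm = TRUE)
    ```

    Note that subsetting allocates a copy of the vector, so with na.rm = TRUE some of the speed advantage on long vectors is given back.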
