Why is using dplyr pipe (%>%) slower than an equivalent non-pipe expression, for high-cardinality group-by?

Asked by 孤城傲影 on 2020-12-24 14:20

I thought that, generally speaking, using %>% wouldn't have a noticeable effect on speed. But in this case it runs about 4x slower.

library(dplyr)
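
The snippet above is truncated. Below is a minimal sketch of the kind of pipe vs. non-pipe group-by comparison the question describes; the data frame d and the columns id and val are hypothetical stand-ins, not the original code:

    library(dplyr)
    library(microbenchmark)

    set.seed(0)
    # High-cardinality grouping: ids drawn from 1:10000 over 10000 rows,
    # so most groups contain only a handful of values.
    d <- tibble(id = sample(1e4, 1e4, replace = TRUE), val = runif(1e4))

    microbenchmark(
      pipe    = d %>% group_by(id) %>% summarise(m = mean(val)),
      no_pipe = summarise(group_by(d, id), m = mean(val))
    )
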
4 Answers
  •  生来不讨喜
    2020-12-24 14:53

    Here is something I learnt today. I am using R 3.5.0.

    Code with x = 100 (1e2)

    library(microbenchmark)
    library(dplyr)
    
    set.seed(99)
    x <- 1e2
    z <- sample(x, x / 2, TRUE)  # x/2 draws with replacement from 1:x
    timings <- microbenchmark(
      dp = z %>% unique %>% list,  # piped version
      bs = list(unique(z)))        # plain nested-call version
    
    print(timings)
    
    Unit: microseconds
     expr    min      lq      mean   median       uq     max neval
       dp 99.055 101.025 112.84144 102.7890 109.2165 312.359   100
       bs  6.590   7.653   9.94989   8.1625   8.9850  63.790   100
    

    However, with x = 1e6 (the same code, just a larger input), the two versions are essentially indistinguishable:

    Unit: milliseconds
     expr      min       lq     mean   median       uq      max neval
       dp 27.77045 31.78353 35.09774 33.89216 38.26898  52.8760   100
       bs 27.85490 31.70471 36.55641 34.75976 39.12192 138.7977   100
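
    So the pipe appears to add a small, roughly constant cost per call (about 100 µs in the first table): it dominates when the piped work is trivial, and vanishes once the real work is large. One way to see that fixed cost in isolation is to pipe a no-op; this is a sketch, not part of the original answer:

    library(microbenchmark)
    library(magrittr)  # provides %>% (dplyr re-exports the same operator)

    # identity() is essentially free, so any timing difference
    # between these two expressions is pure pipe overhead.
    microbenchmark(
      pipe = 1 %>% identity,
      call = identity(1)
    )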
    
