quick/elegant way to construct mean/variance summary table

甜味超标 2020-12-13 19:45

I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ...

8 Answers
  • 2020-12-13 20:26

    (I voted for Joshua's.) Here's an Hmisc::summary.formula solution. The advantage of this for me is that it is well integrated with the Hmisc::latex output "channel".

    library(Hmisc)  # provides the summary() method for formulas used here
    summary(y ~ interaction(f3, f2, f1), data = d, method = "response",
            fun = function(y) c(mean.y = mean(y), var.y = var(y)))
    #-----output----------
    y    N=108
    
    +-----------------------+-------+---+---------+-----------+
    |                       |       |N  |mean.y   |var.y      |
    +-----------------------+-------+---+---------+-----------+
    |interaction(f3, f2, f1)|I.a.A  |  4|0.6502307|0.095379578|
    |                       |II.a.A |  4|0.4876630|0.110796695|
    

    Output snipped. The original post showed the latex -> PDF -> png rendering of this table as an image here.
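
    As a rough sketch of the LaTeX "channel" mentioned above: the question's data is not shown, so `d` below is a hypothetical stand-in (three 3-level factors, 4 replicates per cell, chosen only to match the N=108 in the output). It assumes Hmisc is installed and that `latex()` accepts `file = ""` for this summary object, which prints the LaTeX source to the console rather than writing a .tex file.

    ```r
    # Requires the Hmisc package. `d` is hypothetical stand-in data:
    # the question's real data is not shown.
    library(Hmisc)

    set.seed(1)
    d <- expand.grid(f1 = c("A", "B", "C"),
                     f2 = c("a", "b", "c"),
                     f3 = c("I", "II", "III"),
                     rep = 1:4)   # 27 cells x 4 replicates = 108 rows
    d$y <- runif(nrow(d))

    # Same call as in the answer: one row per factor combination,
    # with the mean and variance of y computed by the supplied function.
    s <- summary(y ~ interaction(f3, f2, f1), data = d, method = "response",
                 fun = function(y) c(mean.y = mean(y), var.y = var(y)))

    # file = "" sends the LaTeX source to the console instead of a file
    latex(s, file = "")
    ```

    Turning that LaTeX into a PDF or PNG is a separate step outside R and depends on the local TeX setup.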

  • 2020-12-13 20:30

    I'm slightly addicted to speed comparisons even though they're largely irrelevant for me in this situation ...

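    library(plyr)        # ddply(), summarise(), .()
    library(data.table)
    library(Hmisc)       # summary() method for formulas
    library(rbenchmark)  # benchmark()

    # plyr: split by the three factors, then summarise each piece
    joran_ddply <- function(d) {
      ddply(d, .(f1, f2, f3), summarise, y.mean = mean(y), y.var = var(y))
    }

    # base aggregate(), data-frame interface
    joshulrich_aggregate <- function(d) {
      aggregate(d$y, d[, c("f1", "f2", "f3")],
                FUN = function(x) c(mean = mean(x), var = var(x)))
    }

    # base aggregate(), formula interface
    formula_aggregate <- function(d) {
      aggregate(y ~ f1 * f2 * f3, data = d,
                FUN = function(x) c(mean = mean(x), var = var(x)))
    }

    # data.table: expects a data.table (d2 below), not a data.frame;
    # grouping columns are given as a character vector
    d2 <- data.table(d)
    ramnath_datatable <- function(d) {
      d[, list(avg_y = mean(y), var_y = var(y)), by = c("f1", "f2", "f3")]
    }

    # Hmisc summary with a user-supplied function
    dwin_hmisc <- function(d) {
      summary(y ~ interaction(f3, f2, f1), data = d, method = "response",
              fun = function(y) c(mean.y = mean(y), var.y = var(y)))
    }

    benchmark(joran_ddply(d),
              joshulrich_aggregate(d),
              ramnath_datatable(d2),
              formula_aggregate(d),
              dwin_hmisc(d))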
    

    aggregate is fastest, even when using the formula interface. It is even faster than data.table, which is a surprise to me, although things might be different with a bigger table to aggregate.

                         test replications elapsed relative user.self sys.self
    5           dwin_hmisc(d)          100   1.235 2.125645     1.168    0.044
    4    formula_aggregate(d)          100   0.703 1.209983     0.656    0.036
    1          joran_ddply(d)          100   3.345 5.757315     3.152    0.144
    2 joshulrich_aggregate(d)          100   0.581 1.000000     0.596    0.000
    3   ramnath_datatable(d2)          100   0.750 1.290878     0.708    0.000
    

    (Now I just need Dirk to step up and post an Rcpp solution that is 1000 times faster than anything else ...)
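
    The caveat about a bigger table could be checked with something like the sketch below. It uses base R only; `make_data()` and its sizes are hypothetical (the question's `d` is not shown), and the formula uses `+` instead of `*`, which should give the same grouping in `aggregate()`. It also shows one way to flatten the result, since `aggregate()` returns the two statistics as a single matrix column.

    ```r
    # Hypothetical stand-in for the question's data: three 3-level factors,
    # n_per_cell observations in each of the 27 cells.
    make_data <- function(n_per_cell) {
      d <- expand.grid(f1 = c("A", "B", "C"),
                       f2 = c("a", "b", "c"),
                       f3 = c("I", "II", "III"),
                       rep = seq_len(n_per_cell))
      d$rep <- NULL
      d$y <- runif(nrow(d))
      d
    }

    formula_aggregate <- function(d) {
      aggregate(y ~ f1 + f2 + f3, data = d,
                FUN = function(x) c(mean = mean(x), var = var(x)))
    }

    set.seed(1)
    small <- make_data(4)      # 108 rows, like the N=108 case
    large <- make_data(4000)   # 108,000 rows

    # Timings depend on the machine; no particular numbers are implied here.
    print(system.time(for (i in 1:100) formula_aggregate(small)))
    print(system.time(res <- formula_aggregate(large)))

    # aggregate() stores mean and var together in one matrix column named y;
    # do.call(data.frame, ...) spreads it into separate y.mean and y.var columns.
    flat <- do.call(data.frame, res)
    print(head(flat))
    ```

    The other functions from the benchmark can be timed on `large` the same way to see whether the ranking changes with size.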
