I know how I can do all that for individual variables but I need to report this information for a large number of variables and would like to know if there is an efficient w
With a data object like the one Alexander offered:
aggregate( . ~ Group, FUN=function(x) c(mn=mean(x), sd=sd(x)), data=Data[-1])
# Output
  Group       V1.mn      V1.sd       V2.mn      V2.sd
1     1  0.05336901 0.85468837  0.06833691 0.94459083
2     2 -0.01658412 0.97583110 -0.02940477 1.11880398
       V3.mn     V3.sd       V4.mn      V4.sd
1 -0.2096497 1.1732246  0.08850199 0.98906102
2  0.0674267 0.8848818 -0.11485148 0.90554914
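One detail worth knowing when reporting this for many variables: because FUN returns a two-element vector, aggregate() stores each variable as a matrix column, so the printed names such as V1.mn are not real column names yet. The sketch below shows one way to flatten the result. The Data object here is a hypothetical stand-in (an ID column, a Group column, then V1 to V4 filled with random numbers), not the original data, so the values will differ from those shown above.

```r
# Hypothetical stand-in for Data: ID, Group, then four numeric variables
set.seed(1)
Data <- data.frame(ID = 1:100, Group = rep(1:2, each = 50),
                   V1 = rnorm(100), V2 = rnorm(100),
                   V3 = rnorm(100), V4 = rnorm(100))

agg <- aggregate(. ~ Group, FUN = function(x) c(mn = mean(x), sd = sd(x)),
                 data = Data[-1])

# Each of V1..V4 is a two-column matrix inside the data frame
names(agg)

# Flatten the matrix columns into ordinary columns (V1.mn, V1.sd, ...)
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, columns can be selected by name in the usual way, for example flat$V1.mn.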
The data argument omits the ID column because you only want the results on the data columns. The request for a collection of p-values can be accomplished with:
sapply(names(Data)[-(1:2)], function(x) c(
    Mean.Grp1 = mean(Data[Data$Group==1, x]),
    Mean.Grp2 = mean(Data[Data$Group==2, x]),
    `p-value` = t.test(Data[Data$Group==1, x],
                       Data[Data$Group==2, x])$p.value )
)
#---------------------------
                   V1          V2         V3          V4
Mean.Grp1  0.05336901  0.06833691 -0.2096497  0.08850199
Mean.Grp2 -0.01658412 -0.02940477  0.0674267 -0.11485148
p-value    0.70380932  0.63799544  0.1857743  0.28624585
If you want to add the SDs to that output, the same strategy extends directly: put sd() calls next to the mean() calls inside the c(...). Note the back-quoting of the "p-value" name. The minus sign is an operator in R, so an unquoted p-value would be parsed as the expression p minus value rather than as a name.
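A minimal sketch of the SD extension follows. The Data object is a hypothetical stand-in (ID, Group, then V1 to V4 filled with random numbers), and the names SD.Grp1 and SD.Grp2 are chosen here for illustration only. The group subsets are pulled out once per variable so each is not recomputed for every statistic.

```r
# Hypothetical stand-in for Data: ID, Group, then four numeric variables
set.seed(1)
Data <- data.frame(ID = 1:100, Group = rep(1:2, each = 50),
                   V1 = rnorm(100), V2 = rnorm(100),
                   V3 = rnorm(100), V4 = rnorm(100))

res <- sapply(names(Data)[-(1:2)], function(x) {
  g1 <- Data[Data$Group == 1, x]   # values of variable x in group 1
  g2 <- Data[Data$Group == 2, x]   # values of variable x in group 2
  c(Mean.Grp1 = mean(g1), SD.Grp1 = sd(g1),
    Mean.Grp2 = mean(g2), SD.Grp2 = sd(g2),
    `p-value` = t.test(g1, g2)$p.value)   # back-quoted because of the minus sign
})
res
```

The result is a matrix with one column per variable and one row per statistic; t(res) gives one row per variable, which may be easier to read when there are many variables.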