Question
I am trying to run a two-way ANOVA on multiple subsets of a data frame without having to subset the data explicitly, as this is inefficient.
Example data:
DF<-structure(list(Sample = c(666L, 676L, 686L, 667L, 677L, 687L,
822L, 832L, 842L, 824L, 834L, 844L), Time = c(300L, 300L, 300L,
300L, 300L, 300L, 400L, 400L, 400L, 400L, 400L, 400L), Ploidy = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2n",
"3n"), class = "factor"), Tissue = c("muscle", "muscle", "muscle",
"liver", "liver", "liver", "intestine", "intestine", "intestine",
"gill", "gill", "gill"), X.lipid = c(1.1, 0.8, 1.3, 3.7, 3.9,
3.8, 5.2, 3.4, 6, 7.6, 10.4, 6.7), l.dec = c(0.011, 0.008, 0.013,
0.037, 0.039, 0.038, 0.052, 0.034, 0.06, 0.076, 0.104, 0.067),
l.arc = c(0.105074124512229, 0.0895624074394449, 0.114266036973812,
0.193560218793138, 0.19879088899975, 0.196192082631721, 0.230059118691331,
0.185452088760136, 0.247467063170448, 0.279298057669285,
0.328359182374352, 0.261824790465914)), .Names = c("Sample",
"Time", "Ploidy", "Tissue", "X.lipid", "l.dec", "l.arc"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 69L, 70L, 71L, 72L, 73L, 74L), class = "data.frame")
I have come across similar questions ("Anova, for loop to apply function" and "ANOVA on multiple responses, by multiple groups NOT part of formula").
I can get close with the code below, but I do not believe it is correct, because it uses aov rather than anova:
x <- unique(DF$Tissue)
sapply(x, function(my) {
  # note: `my` is never used, so every element is the same fit on all of DF
  f <- as.formula(paste("l.dec~Time*Ploidy"))
  aov(f, data = DF)
}, simplify = FALSE)
If I replace aov with anova, I get this error:
Error in UseMethod("anova") :
no applicable method for 'anova' applied to an object of class "formula"
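The error arises because anova() is an S3 generic that dispatches on the class of its first argument (for example to anova.lm for an lm fit), and stats has no method for a bare formula, so it has to be given a fitted model. Using aov() is not wrong in itself: aov() fits the model through lm(), and summary() of an aov fit reports the same sequential (Type I) sums of squares as anova() of the equivalent lm() fit. A minimal sketch, assuming R's built-in CO2 data as a stand-in for DF (uptake ~ Treatment * conc plays the role of l.dec ~ Time * Ploidy):

```r
# Built-in CO2 data stands in for DF here (an assumption for illustration):
# uptake ~ Treatment * conc mirrors l.dec ~ Time * Ploidy
f <- uptake ~ Treatment * conc

# anova() needs a fitted model; a bare formula has no anova method
err <- tryCatch(anova(f), error = function(e) conditionMessage(e))
print(err)

# Fit the same model once with lm() and once with aov()
fit_lm  <- lm(f, data = CO2)
fit_aov <- aov(f, data = CO2)

tab_lm  <- anova(fit_lm)           # data frame of class "anova"
tab_aov <- summary(fit_aov)[[1]]   # single-stratum table from summary.aov

# Both report the same sequential (Type I) sums of squares and F values
print(tab_lm)
print(all.equal(tab_lm[["Sum Sq"]], tab_aov[["Sum Sq"]]))
```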
The long way round, which does give the correct result, is:
# Subset by each Tissue type (just one here, as an example)
muscle <- subset(DF, Tissue == "muscle")
# Perform the ANOVA
anova(lm(l.dec ~ Ploidy * Time, data = muscle))
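The long way and the answer below differ mainly in where the row selection happens. lm() has a subset argument, evaluated inside data, so it can be handed a condition instead of a pre-subsetted data frame. Internally model.frame() still selects the matching rows, so this mostly tidies the code rather than avoiding the subsetting work. A small check, assuming the built-in CO2 data as a stand-in for DF (Type in place of Tissue):

```r
# CO2 (built-in) stands in for DF; Type plays the role of Tissue
quebec <- subset(CO2, Type == "Quebec")   # the "long way": explicit subset
fit_a  <- lm(uptake ~ Treatment * conc, data = quebec)

# Same model, letting lm() pick the rows through its subset argument
fit_b  <- lm(uptake ~ Treatment * conc, data = CO2, subset = Type == "Quebec")

# Coefficients and ANOVA tables agree
print(all.equal(coef(fit_a), coef(fit_b)))
print(anova(fit_b))
```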
However, the main data frame has many tissue types, and I want to avoid subsetting for each one.
I believe the sapply approach is close, but I need help with the final step.
Answer 1:
Building on @user20650's comments and mine above, I would suggest first using sapply with lm to generate your list of models, and then using sapply again on that list to generate your ANOVA tables. That way the list of models remains available, so you can also get coefficients, fitted values, residuals, etc.
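x <- unique(DF$Tissue)
# one lm() per tissue; the subset argument selects that tissue's rows
models <- sapply(x, function(my) {
  lm(l.dec ~ Time * Ploidy, data = DF, subset = Tissue == my)
}, simplify = FALSE)
# one ANOVA table per stored model
ANOVA.tables <- sapply(models, anova, simplify = FALSE)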
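One caveat with the 12-row sample DF: each tissue there has a single Ploidy level and a single Time value, so lm() fails with "contrasts can be applied only to factors with 2 or more levels"; the full data presumably has both levels within each tissue. The sketch below runs the same two-step pattern on R's built-in CO2 data (an assumption for illustration: Type in place of Tissue, uptake ~ Treatment * conc in place of l.dec ~ Time * Ploidy) and then pulls coefficients and residuals out of the stored models:

```r
# CO2 (built-in) stands in for DF: Type ~ Tissue, Treatment ~ Ploidy, conc ~ Time
x <- unique(as.character(CO2$Type))

# Step 1: one lm() per group, rows chosen through the subset argument
models <- sapply(x, function(my) {
  lm(uptake ~ Treatment * conc, data = CO2, subset = Type == my)
}, simplify = FALSE)

# Step 2: an ANOVA table for each stored model (names are kept)
ANOVA.tables <- lapply(models, anova)

# The fitted models stay available for further inspection
coefs      <- sapply(models, coef)        # one column per group
resid_list <- lapply(models, residuals)   # residuals per group

print(ANOVA.tables[["Quebec"]])
print(coefs)
```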
Source: https://stackoverflow.com/questions/23961929/correct-use-of-sapply-with-anova-on-multiple-subsets-in-r