问题
I would like to run a KW-test over certain numerical variables from a data frame, using one grouping variable. I'd prefer to do this in a loop, instead of typing out all the tests, as they are many variables (more than in the example below).
Simulated data:
library(dplyr)
set.seed(123)
Data <- tbl_df(
data.frame(
muttype = as.factor(rep(c("missense", "frameshift", "nonsense"), each = 80)),
ados.tsc = runif(240, 0, 10),
ados.sa = runif(240, 0, 10),
ados.rrb = runif(240, 0, 10))
) %>%
group_by(muttype)
ados.sim <- as.data.frame(Data)
The following code works just fine outside of the loop.
kruskal.test(formula (paste((colnames(ados.sim)[2]), "~ muttype")), data =
ados.sim)
But it doesn't inside the loop:
for(i in names(ados.sim[,2:4])){
ados.mtp <- kruskal.test(formula (paste((colnames(ados.sim)[i]), "~ muttype")),
data = ados.sim)
}
I get the error:
Error in terms.formula(formula, data = data) : invalid term in model formula
Anybody who knows how to solve this? Much appreciated!!
回答1:
Try:
results <- list()
for(i in names(ados.sim[,2:4])){
results[[i]] <- kruskal.test(formula(paste(i, "~ muttype")), data = ados.sim)
}
This also saves your results in a list and avoids overwriting your results as ados.mtp
in every iteration, which I think is not what you intended to do.
Note the following:
for(i in names(ados.sim[,2:4])){
print(i)
}
[1] "ados.tsc"
[1] "ados.sa"
[1] "ados.rrb"
That is, i
already gives you the name of the column. The problem in your code was that you tried to use it like an integer for subsetting, which turned the outcome into NA
.
for(i in names(ados.sim[,2:4])){
print(paste((colnames(ados.sim)[i]), "~ muttype"))
}
[1] "NA ~ muttype"
[1] "NA ~ muttype"
[1] "NA ~ muttype"
And just for reference, all of this could also be done in the following two ways that I often prefer since it makes subsequent analysis slightly easier:
First, store all test objects in a dataframe:
library(tidyr)
df <- ados.sim %>% gather(key, value, -muttype) %>%
group_by(key) %>%
do(test = kruskal.test(x= .$value, g = .$muttype))
You can then subset the dataframe to get the test outcomes:
df[df$key == "ados.rrb",]$test
[[1]]
Kruskal-Wallis rank sum test
data: .$value and .$muttype
Kruskal-Wallis chi-squared = 2.2205, df = 2, p-value = 0.3295
Alternatively, get all results directly in a dataframe, without storing the test objects:
library(broom)
df2 <- ados.sim %>% gather(key, value, -muttype) %>%
group_by(key) %>%
do(tidy(kruskal.test(x= .$value, g = .$muttype)))
df2
# A tibble: 3 x 5
# Groups: key [3]
key statistic p.value parameter method
<chr> <dbl> <dbl> <int> <fctr>
1 ados.rrb 2.2205031 0.3294761 2 Kruskal-Wallis rank sum test
2 ados.sa 0.1319554 0.9361517 2 Kruskal-Wallis rank sum test
3 ados.tsc 0.3618102 0.8345146 2 Kruskal-Wallis rank sum test
来源:https://stackoverflow.com/questions/50194609/r-kruskal-wallis-test-in-loop-over-specified-columns-in-data-frame