Question
I would like to know how I can use t.test or pairwise.t.test to make multiple comparisons between gene combinations. First, how can I compare all combinations (Gene 1 vs. Gene 3, Gene 3 vs. Gene 4, etc.)? Second, how can I compare only the combinations of Gene 1 with each of the other genes?
Do I need to make a function for this?
Assuming I have the dataset below, what can I do when I get "arguments are not the same length"?
Thanks.
Gene S1 S2 S3 S4 S5 S6 S7
1 20000 12032 23948 2794 5870 782 699
3 15051 17543 18590 21005 22996 26448
4 35023 43092 41858 39637 40933 38865
Answer 1:
I think that @akrun has a great answer for the programming side of this, but since the question is also a statistical one, it seems important to mention that running many t-tests inflates the chance of false positives, so it may not be considered a statistically sound method of analysis, depending on the number of comparisons in your full dataset. So please keep that in mind. At the very least, applying a Bonferroni correction, or similar, would be recommended here, so I've added that to @akrun's code.
Prior to running the t-tests, it may also be best to run an ANOVA to see if there are any differences overall. Columbia University has a helpful explanation of this approach on their stats page.
That said, I'll show you how to do both for the sake of answering the programming aspect of the question, but for those looking up the same question, please carefully review your methods before using this answer.
I've displayed the following results without scientific notation for the benefit of those less familiar with it, via options(scipen=999) in R.
Pre-t-test ANOVA:
summary(aov(val ~ as.factor(Gene), data=gather(df, key, val, -Gene)))
                Df     Sum Sq    Mean Sq F value     Pr(>F)
as.factor(Gene)  2 2627772989 1313886494   34.49 0.00000245 ***
Residuals       15  571374752   38091650
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
T-test:
library(broom)
library(dplyr)
library(tidyr)
# Reshape to long format (one row per Gene/sample value), then run all pairwise t-tests
gather(df, key, val, -Gene) %>%
  do(data.frame(tidy(pairwise.t.test(.$val, .$Gene, p.adjust.method = "bonferroni"))))
  group1 group2       p.value
1      3      1 0.05691493022
2      4      1 0.00000209244
4      4      3 0.00018020669
EDIT:
For these tests, it doesn't particularly matter if the genes don't all have the same number of observations. The code I've outlined above will still run. However, it's generally good practice in R to set blank or null values to NA. See this SO answer for a way to change values to NA.
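As a minimal sketch of the NA point in base R only: the data frame below is typed in from the question, and it assumes that the values missing for genes 3 and 4 belong to S7. The name `long` is just an illustrative choice. The p-values it prints need not match the output shown above, since the data frame used there is not shown.

```r
# Data from the question; the missing S7 entries for genes 3 and 4
# are an assumption and are entered explicitly as NA.
df <- data.frame(
  Gene = c(1, 3, 4),
  S1 = c(20000, 15051, 35023),
  S2 = c(12032, 17543, 43092),
  S3 = c(23948, 18590, 41858),
  S4 = c(2794, 21005, 39637),
  S5 = c(5870, 22996, 40933),
  S6 = c(782, 26448, 38865),
  S7 = c(699, NA, NA)
)

# Long format: one row per (Gene, sample) value.
# unlist() walks the sample columns one after another, so the Gene
# labels repeat once per sample column.
long <- data.frame(
  Gene = rep(df$Gene, times = ncol(df) - 1),
  val  = unlist(df[, -1], use.names = FALSE)
)

# pairwise.t.test() ignores NA values when computing group means and SDs,
# so groups of unequal size are not a problem.
res <- pairwise.t.test(long$val, long$Gene, p.adjust.method = "bonferroni")
print(res$p.value)
```

The result is a lower-triangular matrix of adjusted p-values, with rows labelled 3 and 4 and columns labelled 1 and 3.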
If you'd like to limit your t-tests to only a few gene comparisons, for example, gene 1 vs. gene 3 and gene 1 vs. gene 4, but not gene 3 vs. gene 4, the simplest way is to still use the code above. Instead of applying the p-value correction inside the pairwise.t.test function, however, apply it afterward to only the p-values you want to assess. Try this:
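# Run all pairwise t-tests with the default p-value adjustment
res <- gather(df, key, val, -Gene) %>%
  do(data.frame(tidy(pairwise.t.test(.$val, .$Gene))))
# Keep only the comparisons that involve gene 1
res <- res[res$group1 == 1 | res$group2 == 1, ]
# Apply the Bonferroni correction to the remaining p-values
res$p.value <- p.adjust(res$p.value, method = "bonferroni")
print(res)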
  group1 group2        p.value
1      3      1 0.015989134399
2      4      1 0.000001458475
Note that the above applies the p-value correction only to the tests that we've subset and want to assess, which for this example is every combination that involves gene 1, excluding combinations that don't involve gene 1.
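The same idea can be sketched without the tidyverse, which may be easier to adapt. This is only an illustration under a few assumptions: the values are typed in from the question, the names `vals`, `long`, `raw`, `p_gene1` and `p_adj` are made up for the example, and the raw p-values are requested with p.adjust.method = "none" so that the only correction applied is the one over the gene 1 comparisons.

```r
# Values from the question, one vector per gene (unequal lengths are fine).
vals <- list(
  "1" = c(20000, 12032, 23948, 2794, 5870, 782, 699),
  "3" = c(15051, 17543, 18590, 21005, 22996, 26448),
  "4" = c(35023, 43092, 41858, 39637, 40933, 38865)
)

# stack() gives a long data frame with columns `values` and `ind`
# (`ind` is a factor holding the gene labels).
long <- stack(vals)

# Unadjusted pairwise p-values as a lower-triangular matrix.
raw <- pairwise.t.test(long$values, long$ind, p.adjust.method = "none")$p.value

# Gene 1 is the first factor level, so column "1" holds every
# comparison of another gene against gene 1.
p_gene1 <- raw[, "1"]

# Bonferroni correction over the gene 1 comparisons only.
p_adj <- p.adjust(p_gene1, method = "bonferroni")
print(p_adj)
```

Picking the column works here because gene 1 is the first level. For a reference gene that is not the first level, its comparisons are split between a row and a column of the matrix and both would need to be collected before adjusting.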
Answer 2:
OK, here is another piece of statistical advice. You might want to take a look at Hotelling's T-squared test, which generalizes the t-test to multivariate data.
Packages: ICSNP with tutorial here, or Hotelling
Source: https://stackoverflow.com/questions/46305742/multiple-t-test-comparisons