I tried to run a t-test on all columns of my data frame (two at a time) and extract only the p-value. Here is what I have come up with:
for (i in c(5:525) ) {
Assuming your data frame looks something like this:
df = data.frame(a=runif(100),
b=runif(100),
c=runif(100),
d=runif(100),
e=runif(100),
f=runif(100))
then the following
tests = lapply(seq(1,length(df),by=2),function(x){t.test(df[,x],df[,x+1])})
will give you a test for each pair of columns. Note that this will only give you a t.test for a & b, c & d, and e & f. If you wanted a & b, b & c, c & d, d & e, and e & f, then you would have to do:
tests = lapply(seq(1,(length(df)-1)),function(x){t.test(df[,x],df[,x+1])})
Finally, if you only want the p-values from your tests, you can do this:
pvals = sapply(tests, function(x){x$p.value})
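With many columns it can be hard to tell which p-value belongs to which pair. A minimal sketch of one way to label them, assuming a small made-up data frame (the names starts and "_vs_" are just choices made for this example):

```r
set.seed(42)  # arbitrary seed so the example is reproducible
df <- data.frame(a = runif(100), b = runif(100), c = runif(100), d = runif(100))

# Starting index of each non-overlapping pair: 1, 3, ...
starts <- seq(1, ncol(df) - 1, by = 2)

# vapply() checks that every result is a single numeric value
pvals <- vapply(starts,
                function(i) t.test(df[[i]], df[[i + 1]])$p.value,
                numeric(1))

# Label each p-value with the two columns it compares, e.g. "a_vs_b"
names(pvals) <- paste(names(df)[starts], names(df)[starts + 1], sep = "_vs_")
pvals
```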
If you are not sure how to work with an object, try typing summary(tests) and str(tests[[1]]). In this case tests is a list of htest objects, and you want to know the structure of the htest object, not necessarily the list.
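To illustrate that inspection step, here is a small sketch on a single test with made-up data (tt is just a name picked for this example):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
tt <- t.test(runif(100), runif(100))

class(tt)   # "htest"
names(tt)   # the components you can extract, including "statistic" and "p.value"
tt$p.value  # the p-value on its own, as a plain number
```

Anything listed by names(tt) can be pulled out with $ in the same way as the p-value.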
Hope this helped!
I would recommend converting your data frame to long format and using pairwise.t.test with an appropriate p.adjust:
> library(reshape2)
>
> df <- data.frame(a=runif(100),
+ b=runif(100),
+ c=runif(100)+0.5,
+ d=runif(100)+0.5,
+ e=runif(100)+1,
+ f=runif(100)+1)
>
> d <- melt(df)
Using as id variables
>
> pairwise.t.test(d$value, d$variable, p.adjust = "none")
        Pairwise comparisons using t tests with pooled SD

data:  d$value and d$variable

  a      b      c      d      e
b 0.86   -      -      -      -
c <2e-16 <2e-16 -      -      -
d <2e-16 <2e-16 0.73   -      -
e <2e-16 <2e-16 <2e-16 <2e-16 -
f <2e-16 <2e-16 <2e-16 <2e-16 0.63

P value adjustment method: none
> pairwise.t.test(d$value, d$variable, p.adjust = "bon")
        Pairwise comparisons using t tests with pooled SD

data:  d$value and d$variable

  a      b      c      d      e
b 1      -      -      -      -
c <2e-16 <2e-16 -      -      -
d <2e-16 <2e-16 1      -      -
e <2e-16 <2e-16 <2e-16 <2e-16 -
f <2e-16 <2e-16 <2e-16 <2e-16 1

P value adjustment method: bonferroni
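Two things may be worth knowing if you go this route: the printed table is only a view of the p.value matrix stored in the result, and the default pooled SD differs from what a plain t.test call on two columns does. A rough sketch, assuming a smaller made-up data frame and using base R's stack() in place of melt (long and res are names chosen here):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(a = runif(100), b = runif(100), c = runif(100) + 0.5)

# stack() gives long format in base R: columns `values` and `ind`
long <- stack(df)

# pool.sd = FALSE runs a separate t.test() per pair instead of pooling the SD
res <- pairwise.t.test(long$values, long$ind,
                       p.adjust.method = "bonferroni", pool.sd = FALSE)

# The p-values live in a lower-triangular matrix (NA where no test was run)
res$p.value
res$p.value["c", "a"]  # p-value for c vs a
```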
Try this one
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
Data <- data.frame(X, Y, Z)
library(plyr)
combos <- combn(ncol(Data),2)
adply(combos, 2, function(x) {
  test <- t.test(Data[, x[1]], Data[, x[2]])
  out <- data.frame("var1" = colnames(Data)[x[1]]
                    , "var2" = colnames(Data)[x[2]]
                    , "t.value" = sprintf("%.3f", test$statistic)
                    , "df" = test$parameter
                    , "p.value" = sprintf("%.3f", test$p.value)
                    )
  return(out)
})
  X1 var1 var2 t.value       df p.value
1  1    X    Y  -5.598 92.74744   0.000
2  2    X    Z  -9.361 90.12561   0.000
3  3    Y    Z  -3.601 97.62511   0.000
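If you would rather not depend on plyr, the same idea can be sketched in base R. This variant is an alternative, not the code above: it keeps the p-values numeric (instead of formatting them with sprintf) so they can be sorted or filtered later. The names pairs and results are just for this example, and the data are simulated:

```r
set.seed(123)  # arbitrary seed, only for reproducibility
Data <- data.frame(X = rnorm(50, mean = 10, sd = 5),
                   Y = rnorm(50, mean = 15, sd = 6),
                   Z = rnorm(50, mean = 20, sd = 5))

# Each column of `pairs` holds the names of one pair of variables
pairs <- combn(names(Data), 2)

# One row per pair, with the p-value of the corresponding t.test
results <- data.frame(
  var1    = pairs[1, ],
  var2    = pairs[2, ],
  p.value = apply(pairs, 2,
                  function(p) t.test(Data[[p[1]]], Data[[p[2]]])$p.value)
)
results
```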
Here is another solution, with outer.
outer(
1:ncol(Data), 1:ncol(Data),
Vectorize(
function (i,j) t.test(Data[,i], Data[,j])$p.value
)
)
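The result of outer is a symmetric matrix without labels, and every pair is tested twice. A small sketch of how one might label it and read off each comparison once, assuming simulated data like the example above (pmat is a name chosen here):

```r
set.seed(123)  # arbitrary seed, only for reproducibility
Data <- data.frame(X = rnorm(50, mean = 10, sd = 5),
                   Y = rnorm(50, mean = 15, sd = 6),
                   Z = rnorm(50, mean = 20, sd = 5))

pmat <- outer(
  seq_along(Data), seq_along(Data),
  Vectorize(function(i, j) t.test(Data[[i]], Data[[j]])$p.value)
)

# Label rows and columns with the variable names
dimnames(pmat) <- list(names(Data), names(Data))
pmat

# Each comparison once, in the order X-Y, X-Z, Y-Z
pmat[lower.tri(pmat)]
```

The diagonal compares each column with itself, so those entries carry no information.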
I run this:
tres<-apply(x,1,t.test)
pval<-vapply(tres, "[[", 0, i = "p.value")
It took me a while to divine the "vapply" trick to pull the pvals out of the t.test result object list. (I edited this from 'sapply' because of Henrik's comment below)
If it's a paired t-test, you can just subtract and test whether the mean difference is 0, which gives exactly the same result (that's all a paired t-test is):
tres<-apply(y-x,1,t.test)
pval<-vapply(tres, "[[", 0, i = "p.value")
Again this is a per-row t-test over all columns.
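A quick way to convince yourself of the paired equivalence is to compare both routes on made-up matrices. This is only a sketch; x and y here are simulated, and p_diff and p_paired are names chosen for the example:

```r
set.seed(7)  # arbitrary seed, only for reproducibility
x <- matrix(rnorm(40), nrow = 4)                 # 4 rows, 10 values each
y <- x + matrix(rnorm(40, mean = 0.5), nrow = 4)

# Route 1: one-sample test on the row-wise differences
p_diff <- vapply(apply(y - x, 1, t.test), "[[", 0, i = "p.value")

# Route 2: explicit paired test, row by row
p_paired <- vapply(seq_len(nrow(x)),
                   function(i) t.test(y[i, ], x[i, ], paired = TRUE)$p.value,
                   numeric(1))

all.equal(p_diff, p_paired)  # checks that both routes agree
```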