Question
I have a data frame that looks like this:
> head(data)
LH3003 LH3004 LH3005 LH3006 LH3007 LH3008 LH3009 LH3010 LH3011
cg18478105 0.02329879 0.08103364 0.01611778 0.01691191 0.01886975 0.01885553 0.01647439 0.02120779 0.01168622
cg14361672 0.09479536 0.07821380 0.02522833 0.06467310 0.05387729 0.05866673 0.08121820 0.10920162 0.04413263
cg01763666 0.03625680 0.04633759 0.04401555 0.08371531 0.09866403 0.17611284 0.07306743 0.12422579 0.11125146
cg02115394 0.10014794 0.09274320 0.08743445 0.08906313 0.09934032 0.18164115 0.06526380 0.08158144 0.08862067
cg13417420 0.01811630 0.02221060 0.01314041 0.01964530 0.02367295 0.01209913 0.01612864 0.01306061 0.04421938
cg26724186 0.32776266 0.31386294 0.24167480 0.29036142 0.24751268 0.26894756 0.20927278 0.28070790 0.33188921
LH3012 LH3013 LH3014
cg18478105 0.02466508 0.01909706 0.02054417
cg14361672 0.09172160 0.06170230 0.07752691
cg01763666 0.04328518 0.13693868 0.04288165
cg02115394 0.08682942 0.08601880 0.12413149
cg13417420 0.01980470 0.02241745 0.02038114
cg26724186 0.30832389 0.27644816 0.37630038
It has almost 850,000 rows. A second data frame holds the information behind the sample names:
> variables
Sample_ID Name Group01
3 LH3003 pair1 0
4 LH3004 pair1 1
5 LH3005 pair2 0
6 LH3006 pair2 1
7 LH3007 pair3 0
8 LH3008 pair3 1
9 LH3009 pair4 0
10 LH3010 pair4 1
11 LH3011 pair5 0
12 LH3012 pair5 1
13 LH3013 pair6 0
14 LH3014 pair6 1
Is it possible to do a paired t-test by defining the pairs and the group annotation of the samples based on another dataframe?
Thank you for your help!
Answer 1:
Here is an lapply method that stores the result of each test in a list. It assumes that the two samples of each pair are adjacent in the second data frame, df2, and that the first data frame is named df1.
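# One paired test per pair: compares the two sample columns across all rows of df1.
# If Sample_ID is a factor (the read.table default before R 4.0), wrap it in as.character().
myTestList <- lapply(seq(1, nrow(df2), 2), function(i)
  t.test(df1[[df2$Sample_ID[i]]], df1[[df2$Sample_ID[i + 1]]], paired = TRUE))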
which returns
myTestList
[[1]]
Paired t-test
data: df1[[df2$Sample_ID[i]]] and df1[[df2$Sample_ID[i + 1]]]
t = -0.50507, df = 5, p-value = 0.635
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.03453201 0.02319070
sample estimates:
mean of the differences
-0.005670653
[[2]]
Paired t-test
data: df1[[df2$Sample_ID[i]]] and df1[[df2$Sample_ID[i + 1]]]
t = -2.5322, df = 5, p-value = 0.05239
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.0459320947 0.0003458114
sample estimates:
mean of the differences
-0.02279314
data
df1 <- read.table(header=TRUE, text="LH3003 LH3004 LH3005 LH3006 LH3007 LH3008 LH3009 LH3010 LH3011
cg18478105 0.02329879 0.08103364 0.01611778 0.01691191 0.01886975 0.01885553 0.01647439 0.02120779 0.01168622
cg14361672 0.09479536 0.07821380 0.02522833 0.06467310 0.05387729 0.05866673 0.08121820 0.10920162 0.04413263
cg01763666 0.03625680 0.04633759 0.04401555 0.08371531 0.09866403 0.17611284 0.07306743 0.12422579 0.11125146
cg02115394 0.10014794 0.09274320 0.08743445 0.08906313 0.09934032 0.18164115 0.06526380 0.08158144 0.08862067
cg13417420 0.01811630 0.02221060 0.01314041 0.01964530 0.02367295 0.01209913 0.01612864 0.01306061 0.04421938
cg26724186 0.32776266 0.31386294 0.24167480 0.29036142 0.24751268 0.26894756 0.20927278 0.28070790 0.33188921")[1:4]
df2 <- read.table(header=TRUE, text=" Sample_ID Name Group01
3 LH3003 pair1 0
4 LH3004 pair1 1
5 LH3005 pair2 0
6 LH3006 pair2 1")
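The list above prints one full report per pair. As a minimal sketch, assuming the same df1/df2 layout as in the data block (pairs adjacent in df2), the key numbers can be collected into a single data frame. The name `results` and its column names are chosen here for illustration only.

```r
# First four samples from the question, as in the data block above.
df1 <- data.frame(
  LH3003 = c(0.02329879, 0.09479536, 0.03625680, 0.10014794, 0.01811630, 0.32776266),
  LH3004 = c(0.08103364, 0.07821380, 0.04633759, 0.09274320, 0.02221060, 0.31386294),
  LH3005 = c(0.01611778, 0.02522833, 0.04401555, 0.08743445, 0.01314041, 0.24167480),
  LH3006 = c(0.01691191, 0.06467310, 0.08371531, 0.08906313, 0.01964530, 0.29036142)
)
df2 <- data.frame(
  Sample_ID = c("LH3003", "LH3004", "LH3005", "LH3006"),
  Name      = c("pair1", "pair1", "pair2", "pair2"),
  Group01   = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# One paired test per pair of adjacent rows in df2.
myTestList <- lapply(seq(1, nrow(df2), 2), function(i)
  t.test(df1[[df2$Sample_ID[i]]], df1[[df2$Sample_ID[i + 1]]], paired = TRUE))

# Collect statistic, degrees of freedom, and p-value into one row per pair.
results <- data.frame(
  pair    = df2$Name[seq(1, nrow(df2), 2)],
  t       = sapply(myTestList, function(tt) unname(tt$statistic)),
  df      = sapply(myTestList, function(tt) unname(tt$parameter)),
  p.value = sapply(myTestList, function(tt) tt$p.value)
)
print(results)
```

With many pairs, a table like this is easier to filter or sort than the printed list.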
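Answer 2:
You need to stack your data, define a pair column (0 or 1 for the two samples), and then run t.test. This is for one of the six tests:
data2 <- data.frame(x = c(data$LH3003, data$LH3004), pair = rep(c(0, 1), each = nrow(data)))
# t.test(x ~ pair, data2) on its own would run an unpaired Welch test, so request the paired test explicitly
with(data2, t.test(x[pair == 0], x[pair == 1], paired = TRUE))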
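The stacking idea can be extended to every pair by joining the stacked values to the annotation data frame. The following is only a sketch, assuming the df1/df2 layout from Answer 1; the names `long`, `probe`, and `tests` are hypothetical, and probes are identified by row index here rather than by their cg IDs.

```r
# First four samples from the question (same values as df1 in Answer 1).
df1 <- data.frame(
  LH3003 = c(0.02329879, 0.09479536, 0.03625680, 0.10014794, 0.01811630, 0.32776266),
  LH3004 = c(0.08103364, 0.07821380, 0.04633759, 0.09274320, 0.02221060, 0.31386294),
  LH3005 = c(0.01611778, 0.02522833, 0.04401555, 0.08743445, 0.01314041, 0.24167480),
  LH3006 = c(0.01691191, 0.06467310, 0.08371531, 0.08906313, 0.01964530, 0.29036142)
)
df2 <- data.frame(
  Sample_ID = c("LH3003", "LH3004", "LH3005", "LH3006"),
  Name      = c("pair1", "pair1", "pair2", "pair2"),
  Group01   = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Stack df1 into long format: one row per (probe, sample) value.
long <- data.frame(
  probe     = rep(seq_len(nrow(df1)), times = ncol(df1)),
  Sample_ID = rep(names(df1), each = nrow(df1)),
  x         = unlist(df1, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Attach the pair name and group label from the annotation frame.
long <- merge(long, df2, by = "Sample_ID")

# Sort so that, within each pair, both groups list the probes in the same order.
long <- long[order(long$Name, long$Group01, long$probe), ]

# One paired test per pair name: group 0 values against group 1 values.
tests <- lapply(split(long, long$Name), function(d)
  t.test(d$x[d$Group01 == 0], d$x[d$Group01 == 1], paired = TRUE))

print(sapply(tests, function(tt) tt$p.value))
```

Because the pairing comes from the Name and Group01 columns, this version does not depend on the order of rows in df2.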
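Answer 3:
Here's a variation on @Imo's answer that looks up the two samples of each pair by name instead of assuming they are adjacent:
lapply(unique(df2$Name), function(x) {
  # Column names of the two samples in this pair; as.character() guards against factors
  samples <- as.character(df2$Sample_ID[df2$Name == x])
  t.test(df1[, samples[1]], df1[, samples[2]], paired = TRUE)
})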
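The answers above run one test per pair of columns. If the goal is instead one paired test per row (per CpG) across all pairs, which is an assumption since the question does not say, a vectorised sketch may scale better to roughly 850,000 rows than calling t.test in a loop. Everything below uses synthetic data; `ann`, `mat`, `g0`, `g1`, `t_stat`, and `p_val` are hypothetical names, and a data frame like `data` would first need as.matrix().

```r
set.seed(1)  # synthetic data, for illustration only

# Hypothetical annotation frame with the same columns as `variables`,
# shuffled so that paired samples are not adjacent.
ann <- data.frame(
  Sample_ID = paste0("S", 1:12),
  Name      = rep(paste0("pair", 1:6), each = 2),
  Group01   = rep(c(0, 1), times = 6),
  stringsAsFactors = FALSE
)
ann <- ann[sample(nrow(ann)), ]

# Hypothetical value matrix: 5 probes (rows) x 12 samples (columns).
mat <- matrix(runif(5 * 12), nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("S", 1:12)))

# Sort by pair name so column k of g0 and column k of g1 belong to the same pair.
ann <- ann[order(ann$Name, ann$Group01), ]
g0 <- mat[, ann$Sample_ID[ann$Group01 == 0], drop = FALSE]
g1 <- mat[, ann$Sample_ID[ann$Group01 == 1], drop = FALSE]

# Paired t statistic per row, computed from the within-pair differences.
d <- g0 - g1
n <- ncol(d)
s <- sqrt(rowSums((d - rowMeans(d))^2) / (n - 1))  # row-wise standard deviation
t_stat <- rowMeans(d) / (s / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)         # two-sided p-value

# Spot check against t.test() for the first probe.
check <- t.test(g0[1, ], g1[1, ], paired = TRUE)
print(all.equal(unname(p_val[1]), check$p.value))
```

The sign of each statistic follows group 0 minus group 1, as defined by the Group01 column.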
Source: https://stackoverflow.com/questions/40611064/paired-t-test-with-pairs-and-groups-defined-in-another-dataframe