Question
I am new to R. For my project, I need to run a chi-squared test on a hundred thousand entries.
I taught myself for a few days and wrote some code that runs chisq.test in a loop. The code:
the.data = read.table ("test_chisq_allelefrq.txt", header=T, sep="\t",row.names=1)
p=c()
ID=c()
for (i in 1:nrow(the.data)) {
data.row = the.data [i,]
data.matrix = matrix ( c(data.row$cohort_1_AA, data.row$cohort_1_AB, data.row$cohort_1_BB, data.row$cohort_2_AA, data.row$cohort_2_AB, data.row$cohort_2_BB,data.row$cohort_3_AA,data.row$cohort_3_AB,data.row$cohort_3_BB), byrow=T, nrow=3)
chisq = chisq.test(data.matrix)
pvalue=chisq$p.value
p=c(p, pvalue)
## collect the row name (the ID) of the current row
ID=c(ID, row.names(the.data)[i])
}
results=data.frame(ID,p)
write.table (results, file = "chisq-test_output.txt", append=F, quote = F, sep = "\t ",eol = "\n", na = "NA", dec = ".", row.names = F, col.names = T)
This code may have several problems, but it works.
However, it runs very slowly.
I tried to improve it by using apply.
My plan was to call apply twice instead of using a for loop:
datarow= apply (the.data,1, matrix(the.data, byrow=T, nrow=3))
result=apply(datarow,1,chisq.test)
However, I get an error saying that matrix is not a function. And since the chisq.test output is a list, I cannot use write.table to write out the data.
the.data looks like this: each row is an ID such as SN0001 followed by 9 numbers.
cohort_1_AA cohort_1_AB cohort_1_BB cohort_2_AA cohort_2_AB cohort_2_BB cohort_3_AA cohort_3_AB cohort_3_BB
SN0001 197 964 1088 877 858 168 351 435 20
....
....
I have been trying for days and nights. Hope someone can help me. Thank you very much.
Answer 1:
With the apply family of functions, it is easiest to define your own function first and then apply it. Let's do that.
##first define the function to apply
Chsq <- function(x){
## input is a row of your data
## creating a table from each row
x <- matrix(x,byrow =TRUE,nrow=3)
### this will return the p value
return(chisq.test(x)$p.value)
}
## Now apply this function
data = read.table ("test_chisq_allelefrq.txt", header=T, sep="\t",row.names=1)
## by using as.vector convert the output into a vector
P_Values <- as.vector(apply(data,1,Chsq))
result <- cbind(rownames(data),P_Values)
write.table (result, file = "chisq-test_output.txt", append=F, quote = F, sep = "\t ",eol = "\n", na = "NA", dec = ".", row.names = F, col.names = T)
Try this code; hopefully it works. Please accept the answer if it works for you. Thanks.
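As a quick check that does not need the input file, the same idea can be run on a small in-memory data frame. This is only a sketch: the SN0001 row is the sample row from the question, while SN0002 is made up for illustration, and the result is built with data.frame rather than cbind so that the p-values stay numeric.

```r
# Column names as in the question's input file
cols <- paste0("cohort_", rep(1:3, each = 3), "_", c("AA", "AB", "BB"))

# SN0001 is the sample row from the question; SN0002 is invented for illustration
counts <- rbind(
  SN0001 = c(197, 964, 1088, 877, 858, 168, 351, 435, 20),
  SN0002 = c(300, 500, 200, 310, 490, 200, 295, 505, 200)
)
colnames(counts) <- cols
the.data <- as.data.frame(counts)

# One row in (9 counts), one p-value out
Chsq <- function(x) {
  tab <- matrix(as.numeric(x), byrow = TRUE, nrow = 3)  # 3 cohorts x 3 genotypes
  chisq.test(tab)$p.value
}

# apply() passes each row to Chsq and returns a vector named by row
P_Values <- apply(the.data, 1, Chsq)

# Keep IDs and p-values in a data frame, ready for write.table()
result <- data.frame(ID = rownames(the.data), P_Value = unname(P_Values))
print(result)
```

The resulting data frame has one row per input row, so it can be passed to write.table in the same way as in the code above.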
Answer 2:
One for loop implies one apply, not two.
Something like this:
result=apply(the.data, 1, function(data.row) {
## Your code using data.row
})
If the result is more readable than the for loop, go with it. Otherwise stick with what you have. apply won't be noticeably different in speed (faster or slower).
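One way the skeleton above might be filled in with the body of the question's loop is sketched here. The third argument of apply has to be a function; the attempt in the question passed the result of a matrix(...) call there instead, which is the likely cause of the "not a function" error. The data frame below is an assumption for illustration (only the SN0001 row comes from the question), and the preallocated for loop is included only to show that both forms give the same p-values.

```r
# Made-up input: SN0001 is the question's sample row, SN0002 is hypothetical
the.data <- data.frame(
  cohort_1_AA = c(197, 300), cohort_1_AB = c(964, 500), cohort_1_BB = c(1088, 200),
  cohort_2_AA = c(877, 310), cohort_2_AB = c(858, 490), cohort_2_BB = c(168, 200),
  cohort_3_AA = c(351, 295), cohort_3_AB = c(435, 505), cohort_3_BB = c(20, 200),
  row.names = c("SN0001", "SN0002")
)

# FUN must be a function; apply() hands it one row at a time as a vector
p <- apply(the.data, 1, function(data.row) {
  data.matrix <- matrix(data.row, byrow = TRUE, nrow = 3)
  chisq.test(data.matrix)$p.value
})

# apply() keeps the row names, so the IDs need not be collected separately
results <- data.frame(ID = names(p), p = unname(p))
print(results)

# The equivalent for loop, with the result vector allocated up front
p.loop <- numeric(nrow(the.data))
for (i in seq_len(nrow(the.data))) {
  row.counts <- unlist(the.data[i, ])
  p.loop[i] <- chisq.test(matrix(row.counts, byrow = TRUE, nrow = 3))$p.value
}

# Both approaches compute the same p-values
stopifnot(isTRUE(all.equal(unname(p), p.loop)))
```

Because the function returns a single number per row rather than the whole chisq.test list, the result is a plain vector that write.table can handle once it is wrapped in a data frame.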
Source: https://stackoverflow.com/questions/24359060/how-to-run-chisq-test-in-loops-using-apply