I\'m a PhD student of genetics and I am trying do association analysis of some genetic data using linear regression. In the table below I\'m regressing each \'trait\' agains
Personally I don't find the problem so easy. Specially for an R novice.
Here a solution based on creating dynamically the regression formula.
The idea is to use paste
function to create different formula terms, y~ x + var + x * var
then coercing the result string tp a formula using as.formula
. Here y
and x
are the formula dynamic terms: y in c(trait1,trai2,..) and x in c(SNP1,SNP2,...). Of course here I use lapply
to loop.
lapply(1:3,function(i){
y <- paste0('trait',i)
x <- paste0('SNP',i)
factor1 <- x
factor2 <- 'var'
factor3 <- paste(x,'var',sep='*')
listfactor <- c(factor1,factor2,factor3)
form <- as.formula(paste(y, "~",paste(listfactor,collapse="+")))
lm(formula = form, data = dat)
})
I hope someone come with easier solution, ore more R-ish one:)
EDIT
Thanks to @DWin comment , we can simplify the formula to just y~x*var
since it means y
is modeled by x
,var
and x*var
So the code above will be simplified to :
lapply(1:3,function(i){
y <- paste0('trait',i)
x <- paste0('SNP',i)
LHS <- paste(x,'var',sep='*')
form <- as.formula(paste(y, "~",LHS)
lm(formula = form, data = dat)
})