Question
What is the most efficient way to run regression models for a list of 20 independent variables (e.g. genetic variants, each of which will be tested alone) and 40 dependent variables? I am a beginner to R! I found a solution, but it would only work if I had 1 independent variable. I'm not sure how I would go about it if I had many (http://techxhum.dk/loop-multiple-variables/)
Thanks for your time.
Answer 1:
Here's a somewhat dense solution that uses the mfastLmCpp() function from the MESS package. It runs a simple linear regression for each of several independent variables at once, and we just wrap it in an apply() call to get it to work with multiple dependent variables.
N <- 1000 # Number of observations
Nx <- 20 # Number of independent variables
Ny <- 80 # Number of dependent variables
# Simulate outcomes that are all standard Gaussians
Y <- matrix(rnorm(N*Ny), ncol=Ny)
X <- matrix(rnorm(N*Nx), ncol=Nx)
# Now loop over each dependent variable and get a list of t test statistics
# for each independent variable
apply(Y, 2, FUN=function(y) { MESS::mfastLmCpp(y=y, x=X) })
With the above setup it takes less than a second on my laptop.
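As a cross-check that needs no extra package, the same t statistics can be sketched in base R. This is not how MESS computes them; it relies on the standard identity that, in a simple regression with an intercept, the slope's t statistic equals r * sqrt((n - 2) / (1 - r^2)), where r is the Pearson correlation. The names R, tstat, pval and fit are illustrative, and Ny is set to 40 here only to match the question.

```r
set.seed(1)
N  <- 1000  # Number of observations
Nx <- 20    # Number of independent variables
Ny <- 40    # Number of dependent variables (as in the question)

X <- matrix(rnorm(N * Nx), ncol = Nx)
Y <- matrix(rnorm(N * Ny), ncol = Ny)

# Pearson correlation of every independent variable with every
# dependent variable: an Nx-by-Ny matrix
R <- cor(X, Y)

# t statistic of the slope in each simple regression y ~ x (with intercept)
tstat <- R * sqrt((N - 2) / (1 - R^2))

# Two-sided p-values on N - 2 degrees of freedom
pval <- 2 * pt(abs(tstat), df = N - 2, lower.tail = FALSE)

# Compare one pair against lm() to confirm the identity
fit <- summary(lm(Y[, 3] ~ X[, 5]))
all.equal(unname(fit$coefficients[2, "t value"]), tstat[5, 3])
```

Row i, column j of tstat corresponds to regressing dependent variable j on independent variable i. This shortcut assumes complete data; with missing values, cor() would need a use= argument and N would differ per pair.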
Update: Added the functionality to the plr function in the MESS package.
devtools::install_github('ekstroem/MESS')
MESS::plr(Y, X)
et voila!
Source: https://stackoverflow.com/questions/59337879/most-efficient-way-to-run-regression-models-for-multiple-independent-variables-o