Question
I wonder whether I can use a for loop or an apply function to fit linear regressions in R. I have a data frame containing variables such as crim, rm, ad, and wd, and I want to fit a simple linear regression of crim on each of the other variables.
Thank you!
Answer 1:
If you really want to do this, it's pretty trivial with lapply(), where we use it to "loop" over the other columns of df. A custom function takes each variable in turn as x and fits a model for that covariate.
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))
mods <- lapply(df[, -1], function(x, dat) lm(crim ~ x, data = dat), dat = df)
mods is now a list of lm objects. The names of mods contain the names of the covariates used to fit each model. The main drawback is that all the models are fitted using a variable named x, so every slope coefficient is labelled x. More effort could probably solve this, but I doubt that effort is worth the time.
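One possible way to avoid the generic x term is to build each formula from the column name with reformulate(). This is only a sketch: the object names covars and mods_named are illustrative, and it assumes crim is the response and every other column is a covariate.

```r
# Dummy data with the same columns as in the answer
set.seed(1)
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

# Names of every column except the response
covars <- setdiff(names(df), "crim")

# Build crim ~ <covariate> from each name, so the slope coefficient
# is labelled with the real variable name rather than "x"
mods_named <- lapply(covars, function(v) {
  lm(reformulate(v, response = "crim"), data = df)
})
names(mods_named) <- covars

# Slope estimate from each fit, named by covariate
sapply(mods_named, function(m) coef(m)[[2]])
```

With this approach coef(mods_named$rm) has a coefficient named rm, which makes summaries and predictions easier to read.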
If you are just selecting models, which may be dubious, there are other ways to achieve this, for example via the leaps package and its regsubsets() function:
library("leaps")
a <- regsubsets(crim ~ ., data = df, nvmax = 1, nbest = ncol(df) - 1)
summa <- summary(a)
Then plot(a) will show which of the models is "best", for example.
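For comparison, a similar ranking of the single-covariate models can be sketched in base R without leaps. This is an assumption-laden illustration rather than what regsubsets() does internally: it simply orders the candidate models by R-squared, and the names covars and rsq are made up for the example.

```r
# Dummy data with the same columns as in the answer
set.seed(1)
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

covars <- setdiff(names(df), "crim")

# R-squared of crim ~ <covariate> for each candidate covariate
rsq <- sapply(covars, function(v) {
  summary(lm(reformulate(v, response = "crim"), data = df))$r.squared
})

# Covariates ordered from best to worst single-predictor fit
sort(rsq, decreasing = TRUE)
```

Because every model has the same response and the same number of predictors, ordering by R-squared is equivalent to ordering by residual sum of squares.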
Original
If I understand what you want (crim is a covariate and the other variables are the responses you want to predict/model using crim), then you don't need a loop. You can do this using a matrix response in a standard lm().
Using some dummy data:
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))
we create a matrix or multivariate response via cbind(), passing it the three response variables we're interested in. The remaining parts of the call to lm() are entirely the same as for a univariate response:
mods <- lm(cbind(rm, ad, wd) ~ crim, data = df)
> mods
Call:
lm(formula = cbind(rm, ad, wd) ~ crim, data = df)
Coefficients:
             rm        ad        wd
(Intercept)  -0.12026  -0.47653  -0.26419
crim         -0.26548   0.07145   0.68426
The summary() method produces a standard summary.lm output for each of the responses.
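As a rough sketch of working with such a multivariate fit, the coefficients come back as a matrix with one column per response, and summary() returns a list with one element per response. The seed and the object name s are assumptions added for illustration.

```r
# Dummy data with the same columns as in the answer
set.seed(1)
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

# One call fits rm ~ crim, ad ~ crim and wd ~ crim together
mods <- lm(cbind(rm, ad, wd) ~ crim, data = df)

# Rows are (Intercept) and crim; columns are the three responses
coef(mods)

# One summary per response, in the order given to cbind()
s <- summary(mods)
length(s)

# Coefficient table (estimate, std. error, t value, p value) for the first response
s[[1]]$coefficients
```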
Answer 2:
Suppose you want to keep the first column of your data frame fixed as the response variable and fit a separate simple linear regression on each of the other variables in turn.
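h <- iris[, -5]  # drop the Species factor, keeping the four numeric columns
for (j in 2:ncol(h)) {
  # regress column 1 on column j and store the fit as a2, a3, ...
  assign(paste0("a", j), lm(h[, 1] ~ h[, j]))
}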
The code above fits one regression per remaining column and stores the resulting lm objects in the variables a2, a3, and so on.
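If separate variables are not required, the same fits could be collected in a single named list instead of using assign(). This is a sketch of that alternative, not part of the original answer; the name fits is illustrative.

```r
h <- iris[, -5]  # the four numeric columns of iris

# Regress column 1 on each remaining column and keep the fits in one list
fits <- lapply(2:ncol(h), function(j) lm(h[, 1] ~ h[, j]))

# Label each fit with the name of the predictor column it used
names(fits) <- names(h)[-1]

# Access a single model by predictor name
coef(fits$Petal.Length)
```

A list keeps the models together, so they can be passed to lapply() or sapply() for further processing.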
Source: https://stackoverflow.com/questions/37314006/how-to-use-loop-to-do-linear-regression-in-r