Question
It seems simple, but I don't know how to code it in R. I have a data frame (df) with ~100 variables, and I would like to fit a multiple regression with my first variable (Y) as the response and variables 25 to 60 as regressors. The problem is that I don't want to write out each variable name, like this:
lm(Y~var25+var26+.......var60, data=df)
I would like to use something like [, 25:60] to select the whole range. I tried the following, but it doesn't work:
test <- lm(Y~df[, 25:60], data=df)
summary(test)
Any ideas?
Answer 1:
You could subset the dataset by selecting only those columns (plus the response in column 1), and then call lm:
lm(Y~., data=df1[c(1,25:60)])
If you need var25 to var60 specifically, and the columns are ordered by name as in the example data (Y in column 1, var1 in column 2), the positions shift by one:
lm(Y~., data=df1[c(1,26:61)])
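Hard-coded positions such as 26:61 are easy to get wrong by one. As a minimal sketch, assuming the column names follow the var1..var100 pattern of the example data, the positions can be looked up by name instead. The names first, last and fit_by_name are illustrative only and not part of the original answer.

```r
# Recreate the example data from this answer (Y plus var1..var100).
set.seed(24)
df1 <- as.data.frame(matrix(sample(1:80, 20*101, replace=TRUE),
  ncol=101, dimnames=list(NULL, c('Y', paste0('var', 1:100)))))

# Find the positions of the first and last regressor by name,
# rather than counting columns by hand.
first <- match("var25", names(df1))
last  <- match("var60", names(df1))

# Keep the response plus the contiguous block of regressors, then fit.
keep <- c("Y", names(df1)[first:last])
fit_by_name <- lm(Y ~ ., data = df1[keep])
```

This only helps when the wanted columns are contiguous in the data frame; otherwise a vector of names can be passed directly.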
Another option would be to use paste to create the formula:
lm(paste("Y ~", paste(paste0('var', 25:60), collapse="+")), data=df1)
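As a possible alternative to pasting the formula string by hand, base R's reformulate() builds a formula object from a character vector of term labels. The sketch below assumes the same var25..var60 naming as the example data; rhs, f and fit_formula are illustrative names, not part of the original answer.

```r
# Recreate the example data from this answer (Y plus var1..var100).
set.seed(24)
df1 <- as.data.frame(matrix(sample(1:80, 20*101, replace=TRUE),
  ncol=101, dimnames=list(NULL, c('Y', paste0('var', 1:100)))))

# Term labels for the right-hand side of the formula.
rhs <- paste0("var", 25:60)

# reformulate() joins the labels with "+" and attaches the response,
# returning a formula object rather than a character string.
f <- reformulate(rhs, response = "Y")

fit_formula <- lm(f, data = df1)
```

Because f is a real formula, it can be inspected or reused (for example with update()) before being passed to lm.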
Data (the example df1 used above):
set.seed(24)
df1 <- as.data.frame(matrix(sample(1:80, 20*101, replace=TRUE),
ncol=101, dimnames=list(NULL, c('Y', paste0('var', 1:100)))))
Source: https://stackoverflow.com/questions/28523404/r-multiple-linear-regression-with-a-specific-range-of-variables