I would like to create a function which can run a regression model (e.g. using lm) over different variables in a given dataset. In this function, I would specify as arguments the dataset I'm using, the dependent variable y and the independent variable x. I want this to be a function and not a loop as I would like to call the code in various places of my script. My naive function would look something like this:
lmfun <- function(data, y, x) {
  lm(y ~ x, data = data)
}
This function obviously does not work because the lm function does not recognize y and x as variables of the dataset.
I have done some research and stumbled upon the following helpful vignette: programming with dplyr. The vignette gives the following solution to a similar problem as the one I am facing:
library(dplyr)  # provides tibble(), enquo(), group_by(), summarise() and %>%

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)
my_sum <- function(df, group_var) {
  group_var <- enquo(group_var)
  df %>%
    group_by(!! group_var) %>%
    summarise(a = mean(a))
}
I am aware that lm is not part of the dplyr package, but I would like to come up with a solution similar to this one. I've tried the following:
lmfun <- function(data, y, x) {
  y <- enquo(y)
  x <- enquo(x)
  lm(!! y ~ !! x, data = data)
}
lmfun(mtcars, mpg, disp)
Running this code gives the following error message:
Error in is_quosure(e2) : argument "e2" is missing, with no default
Does anyone have an idea how to amend the code to make this work?
Thanks,
Joost.
You can fix this problem by using quo_name and formula:
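library(rlang)  # provides enquo() and quo_name()

lmfun <- function(data, y, x) {
  y <- enquo(y)
  x <- enquo(x)
  # Build the formula from the captured variable names
  model_formula <- formula(paste0(quo_name(y), "~", quo_name(x)))
  lm(model_formula, data = data)
}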
lmfun(mtcars, mpg, disp)
# Call:
# lm(formula = model_formula, data = data)
#
# Coefficients:
# (Intercept) disp
# 29.59985 -0.04122
Another solution:
lmf2 <- function(data, y, x) {
  fml <- substitute(y ~ x, list(y = substitute(y), x = substitute(x)))
  lm(eval(fml), data)
}
lmf2(mtcars, mpg, disp)
# Call:
# lm(formula = eval(fml), data = data)
#
# Coefficients:
# (Intercept) disp
# 29.59985 -0.04122
Or, equivalently:
lmf3 <- function(data, y, x) {
  lm(eval(call("~", substitute(y), substitute(x))), data)
}
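As a quick sanity check, the sketch below re-declares lmf2 and lmf3 from above and compares their coefficients with a direct lm(mpg ~ disp, data = mtcars) call. It assumes only base R and the built-in mtcars data; the names fit2, fit3, ref, same2 and same3 are illustrative and not part of the original answer.

```r
# Both helpers capture the unevaluated argument names with substitute()
# and build the formula mpg ~ disp before handing it to lm().
lmf2 <- function(data, y, x) {
  fml <- substitute(y ~ x, list(y = substitute(y), x = substitute(x)))
  lm(eval(fml), data)
}
lmf3 <- function(data, y, x) {
  lm(eval(call("~", substitute(y), substitute(x))), data)
}

fit2 <- lmf2(mtcars, mpg, disp)
fit3 <- lmf3(mtcars, mpg, disp)
ref  <- lm(mpg ~ disp, data = mtcars)  # reference fit written by hand

# Compare the estimated coefficients with the reference model
same2 <- isTRUE(all.equal(coef(fit2), coef(ref)))
same3 <- isTRUE(all.equal(coef(fit3), coef(ref)))
```

If both flags are TRUE, the two variants fit the same model as the hand-written call; only the printed Call differs.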
If the arguments are unquoted, convert each quosure to a string (quo_name), then to a symbol (sym), and evaluate the resulting expression in lm (similar to the OP's lm syntax):
library(rlang)
lmfun <- function(data, y, x) {
  y <- sym(quo_name(enquo(y)))
  x <- sym(quo_name(enquo(x)))
  expr1 <- expr(!! y ~ !! x)
  model <- lm(expr1, data = data)
  model$call$formula <- expr1 # change the call formula
  model
}
lmfun(mtcars, mpg, disp)
#Call:
#lm(formula = mpg ~ disp, data = data)
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
If we are passing strings, an option is to convert them to symbols with ensym and then unquote them (!!) into the formula passed to lm:
lmfun <- function(data, y, x) {
  y <- ensym(y)
  x <- ensym(x)
  expr1 <- expr(!! y ~ !! x)
  model <- lm(expr1, data = data)
  model$call$formula <- expr1 # change the call formula
  model
}
lmfun(mtcars, 'mpg', 'disp')
#Call:
#lm(formula = mpg ~ disp, data = data)
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
NOTE: Both options rely on tidyverse (rlang) functions.
Here is another option. EDIT: here is a refactored answer that takes the variable names as strings:
lmfun <- function(data, yname, xname) {
  formula1 <- as.formula(paste(yname, "~", xname))
  lm.fit <- do.call("lm", list(data = quote(data), formula1))
  lm.fit
}
lmfun(mtcars,"mpg","disp")
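A possible variation on the string-based approach, sketched here as an alternative rather than as part of the original answer, is base R's reformulate(), which builds the formula from character vectors without paste(). The function name lm_by_name and the extra predictor hp are illustrative assumptions.

```r
# Sketch: build the formula with stats::reformulate() instead of paste().
# `y` is a single column name, `x` a character vector of predictor names.
lm_by_name <- function(data, y, x) {
  f <- reformulate(termlabels = x, response = y)  # e.g. mpg ~ disp + hp
  lm(f, data = data)
}

fit_one <- lm_by_name(mtcars, "mpg", "disp")
fit_two <- lm_by_name(mtcars, "mpg", c("disp", "hp"))

# Reference fits written by hand for comparison
ref_one <- lm(mpg ~ disp, data = mtcars)
ref_two <- lm(mpg ~ disp + hp, data = mtcars)

ok_one <- isTRUE(all.equal(coef(fit_one), coef(ref_one)))
ok_two <- isTRUE(all.equal(coef(fit_two), coef(ref_two)))
```

One reason to consider this form is that it extends to several predictors by passing a longer character vector, with no string concatenation needed.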
And the Original Answer:
lmfun <- function(data, y, x) {
  formula1 <- as.formula(y ~ x)
  lm.fit <- do.call("lm", list(data = quote(data), formula1))
  lm.fit
}
lmfun(mtcars,mtcars$mpg,mtcars$disp)
Yields:
Call:
lm(formula = y ~ x, data = data)
Coefficients:
(Intercept) x
29.59985 -0.04122
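A practical difference between the two versions is how the coefficients are labelled. The sketch below uses the hypothetical names lmfun_names (the string-based version) and lmfun_vectors (the original version) so both can be compared side by side. It assumes only base R and the built-in mtcars data.

```r
# String-based version: the formula is built from the column names,
# so the slope is labelled with the actual predictor name.
lmfun_names <- function(data, yname, xname) {
  formula1 <- as.formula(paste(yname, "~", xname))
  do.call("lm", list(formula = formula1, data = quote(data)))
}

# Vector-based version: the formula is literally y ~ x, and y and x are
# found in the function's own environment, so the slope is labelled "x".
lmfun_vectors <- function(data, y, x) {
  formula1 <- as.formula(y ~ x)
  do.call("lm", list(formula = formula1, data = quote(data)))
}

names_a <- names(coef(lmfun_names(mtcars, "mpg", "disp")))
names_b <- names(coef(lmfun_vectors(mtcars, mtcars$mpg, mtcars$disp)))
```

Here names_a contains "disp" while names_b contains "x", which matters if the output is later tabulated or combined across several models.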
Source: https://stackoverflow.com/questions/54060985/function-which-runs-lm-over-different-variables