Question
I am using a Monte Carlo simulation to predict mpg in the mtcars data. I want to extract the coefficients of all the variables in the data frame and compute how many times one car has a lower predicted mpg than another, for example how many times Toyota Corona has a lower predicted mpg than Datsun 710. This is my initial code, which uses only two independent variables. I want to extend it to use all the variables in the data frame without having to list each one manually. Is there any way I can do this?
library(pacman)
pacman::p_load(data.table, fixest, stargazer, dplyr, magrittr)
df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
fit$coefficients[1]
beta_0 = fit$coefficients[1] # Intercept
beta_1 = fit$coefficients[2] # Coefficient on cyl
beta_2 = fit$coefficients[3] # Coefficient on hp
set.seed(1) # Seed
n = 1000 # Sample size
M = 500 # Number of experiments/iterations
estimates_DT <- do.call("rbind",lapply(1:M, function(i) {
# Generate data
U_i = rnorm(n, mean = 0, sd = 2) # Error
X_i_1 = rnorm(n, mean = 5, sd = 5) # First independent variable
X_i_2 = rnorm(n, mean = 5, sd = 5) # Second independent variable
Y_i = beta_0 + beta_1*X_i_1 + beta_2*X_i_2 + U_i # Dependent variable
# Formulate data.table
data_i = data.table(Y = Y_i, X1 = X_i_1, X2 = X_i_2)
# Run regressions
ols_i <- fixest::feols(data = data_i, Y ~ X1 + X2)
ols_i$coefficients
}))
estimates_DT <- setNames(data.table(estimates_DT),c("beta_0","beta_1","beta_2"))
compareCarEstimations <- function(carname1="Mazda RX4",carname2="Datsun 710") {
car1data <- mtcars[rownames(mtcars) == carname1,c("cyl","hp")]
car2data <- mtcars[rownames(mtcars) == carname2,c("cyl","hp")]
predsCar1 <- estimates_DT[["beta_0"]] + car1data$cyl*estimates_DT[["beta_1"]]+car1data$hp*estimates_DT[["beta_2"]]
predsCar2 <- estimates_DT[["beta_0"]] + car2data$cyl*estimates_DT[["beta_1"]]+car2data$hp*estimates_DT[["beta_2"]]
list(
car1LowerCar2 = sum(predsCar1 < predsCar2),
car2LowerCar1 = sum(predsCar1 >= predsCar2)
)
}
compareCarEstimations("Toyota Corona", "Datsun 710")
Answer 1:
I haven't gone all the way through your example, but here is the nugget of how to construct a set of randomized predictor variables and matrix-multiply them by the coefficient vector to get predicted values:
Setup:
df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
n <- 1000
beta <- coef(fit) ## parameter vector (includes intercept)
npar <- length(beta)
X <- matrix(rnorm(n*(npar-1)),ncol=npar-1) ## predictor columns only
## scale columns by the corresponding sd
## (all identical in this case)
X <- sweep(X, MARGIN=2, FUN="*", STATS=rep(5,npar-1))
## shift columns by the corresponding mean
## (all identical in this case)
X <- sweep(X, MARGIN=2, FUN="+", STATS=rep(5,npar-1))
## prepend a constant column of 1s so that beta[1] acts as the intercept
X <- cbind(1, X)
Y0 <- X %*% beta
Y <- rnorm(n, mean=Y0, sd=2)
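To use every variable without naming each one, the same matrix approach can be combined with the `mpg ~ .` formula shorthand, which regresses on all other columns. The following is a minimal sketch rather than a complete solution. It assumes the simulated predictors keep the N(5, 5) distribution from the question (which does not resemble the real mtcars columns), it swaps `fixest::feols` for base R's `lm.fit` to keep the example dependency-free, and the names `predictors` and `estimates` are introduced here for illustration only.

```r
df <- mtcars
fit <- lm(mpg ~ ., data = df)    # "." stands for every other column in df
beta <- coef(fit)                # named vector: (Intercept), cyl, disp, ...
predictors <- names(beta)[-1]    # predictor names without the intercept
npar <- length(beta)

set.seed(1)
n <- 1000   # sample size per experiment
M <- 500    # number of experiments

# One row of estimated coefficients per experiment
estimates <- t(sapply(seq_len(M), function(i) {
  # Simulated predictors; N(5, 5) is carried over from the question (assumption)
  X <- matrix(rnorm(n * (npar - 1), mean = 5, sd = 5), ncol = npar - 1,
              dimnames = list(NULL, predictors))
  X <- cbind("(Intercept)" = 1, X)            # constant column for the intercept
  Y <- drop(X %*% beta) + rnorm(n, sd = 2)    # dependent variable plus error
  lm.fit(x = X, y = Y)$coefficients           # OLS on the simulated data
}))

compareCarEstimations <- function(carname1 = "Mazda RX4", carname2 = "Datsun 710") {
  # Design rows for the two cars: a 1 for the intercept, then every predictor
  cars <- cbind(1, as.matrix(mtcars[c(carname1, carname2), predictors]))
  preds <- estimates %*% t(cars)   # M x 2 matrix of predicted mpg
  list(
    car1LowerCar2 = sum(preds[, 1] < preds[, 2]),
    car2LowerCar1 = sum(preds[, 1] >= preds[, 2])
  )
}

compareCarEstimations("Toyota Corona", "Datsun 710")
```

Because the predictions are a single matrix product, adding or removing columns in `df` requires no change to the comparison function.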
Source: https://stackoverflow.com/questions/66053962/how-to-extract-the-coefficients-from-a-linear-model-without-repeating-my-code-in