Question
MVE: Let this be the data set:
data <- data.frame(year = rep(seq(1966,2015,1), 8),
                   county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50),
                              rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)),
                   crime1 = runif(400), crime2 = runif(400), crime3 = runif(400),
                   uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400),
                   var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400))
Suppose crime1, crime2 and crime3 are the dependent variables, uvar1, uvar2 and uvar3 are the independent variables specific to each of them, and var1, var2, etc. are other covariates. What I'm trying to do is automate the regressions.
Namely, I want to get the result of this code:
plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
and so on, but without writing 20 lines of code for each estimated model.
By looking at similar questions, this is as far as I have got:
crime <- c('crime1', 'crime2', 'crime3')
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4,
                      model = 'within', effect = 'twoways', data = data))
This certainly helps with my dependent variables, but I cannot figure out how to include a specific independent variable in each of these estimations. To clarify once more, I want uvar1 to be in the first regression but not in the others, and so on.
Answer 1:
The formula function is helpful when creating multiple sets of models. You can build the variations with a combination of paste0 and formula, using lapply to traverse the indices 1 to 3.
library(plm)

# Remember to call set.seed when sampling from distributions
set.seed(123)

# Assumption: regDF is the data frame from the question (named `data` there)
regDF = data

# A helper function to create "log(var)" from "var"
fn_appendLog = function(x) {
  paste0("log(", x, ")")
}

modelList = lapply(1:3, function(x) {
  # Shared covariates: every column whose name starts with "v"
  indepVars2 = Reduce(function(x, y) paste(x, y, sep = "+"),
                      lapply(colnames(regDF)[grepl("^v", colnames(regDF))], fn_appendLog))
  #> indepVars2
  #[1] "log(var1)+log(var2)+log(var3)+log(var4)+log(var5)"
  # Model-specific regressor and response for index x
  indepVars1 = fn_appendLog(paste0("uvar", x))
  depVar = fn_appendLog(paste0("crime", x))
  formulaVar = formula(paste0(depVar, " ~ ", indepVars1, "+", indepVars2))
  #> formulaVar
  #log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4) + log(var5)
  # The fitted model is the value returned for this index
  plm(formulaVar, model = 'within', effect = 'twoways', data = regDF)
})
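As a variation on the loop above, the formulas can also be assembled with base R's reformulate(), which builds a formula from a character vector of term labels. This is only a minimal sketch, not the answer's own method: the names covars and formulas are introduced here for illustration, and the covariate list is written out explicitly (var1 to var4, as in the question) instead of being matched by pattern.

```r
# Sketch: build one formula per model with reformulate() (base R, stats package).
# `covars` and `formulas` are illustrative names, not part of the original answer.
covars <- paste0("var", 1:4)  # shared covariates, listed explicitly

formulas <- lapply(1:3, function(i) {
  # Right-hand side: the model-specific uvar<i> first, then the shared covariates
  rhs <- paste0("log(", c(paste0("uvar", i), covars), ")")
  # Response passed as a call, log(crime<i>), so it is not treated as a plain name
  lhs <- call("log", as.name(paste0("crime", i)))
  reformulate(rhs, response = lhs)
})

# Print each formula on a single line
for (f in formulas) cat(deparse(f, width.cutoff = 500), "\n")
```

Each element can then be passed to plm in the same way as formulaVar, for example plm(formulas[[1]], model = 'within', effect = 'twoways', data = regDF).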
Summary:
summary(modelList[[1]])
#> summary(modelList[[1]])
#Twoways effects Within Model
#
#Call:
#plm(formula = formulaVar, data = regDF, effect = "twoways", model = "within")
#
#Balanced Panel: n=50, T=8, N=400
#
#Residuals :
#   Min. 1st Qu.  Median 3rd Qu.    Max.
# -5.730  -0.396   0.116   0.599   1.520
#
#Coefficients :
#             Estimate Std. Error t-value Pr(>|t|)
#log(uvar1)  0.0393871  0.0490891  0.8024   0.4229
#log(var1)  -0.0369356  0.0541029 -0.6827   0.4953
#log(var2)  -0.0455269  0.0543664 -0.8374   0.4030
#log(var3)   0.0150516  0.0520347  0.2893   0.7726
#log(var4)  -0.0034534  0.0441506 -0.0782   0.9377
#log(var5)  -0.0109038  0.0527446 -0.2067   0.8363
#
#Total Sum of Squares: 302.23
#Residual Sum of Squares: 300.6
#R-Squared: 0.0053896
#Adj. R-Squared: 0.0045407
#F-statistic: 0.304357 on 6 and 337 DF, p-value: 0.93448
Explanation:
The independent variables are of two types: the model-specific one (uvar1 for the first model) and the shared covariates var1...varN.
1) colnames(regDF)[grepl("^v",colnames(regDF))] returns a character vector of all column names in regDF that begin with the letter 'v'. The caret symbol anchors the match at the start of the string (just as $ would anchor it at the end). The output at this stage is c("var1","var2",...,"var5").
2) We need the log variants of these names, so we pass them through lapply to the function fn_appendLog, which gives the list output list("log(var1)","log(var2)",...,"log(var5)").
3) Next, we need these terms joined as log(var1)+log(var2)...+log(var5).
4) To do so, we use Reduce with the function paste(x,y,sep="+"). It takes the result so far and joins it to the next element of the list, with "+" as the separator:
step1 = (log(var1)+log(var2))
step2 = (log(var1)+log(var2)) + log(var3)
step3 = (log(var1)+log(var2)+log(var3)) + log(var4) and so on
5) Reduce thus applies the function across the whole list and aggregates the output into a single string, giving the final result log(var1)+log(var2)+log(var3)+log(var4)+log(var5).
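The five steps above can be traced on a plain character vector, without fitting any model. The following is a minimal sketch, assuming a hypothetical cols vector that mirrors the question's column names. accumulate = TRUE is used here only to expose the intermediate results of Reduce, and paste with collapse is shown as an equivalent one-call alternative.

```r
# Hypothetical column names mirroring the question's data frame
cols <- c("year", "county", paste0("crime", 1:3),
          paste0("uvar", 1:3), paste0("var", 1:5))

# Step 1: keep the names that start with "v" (uvar* starts with "u", so it is excluded)
covars <- cols[grepl("^v", cols)]

# Step 2: wrap each name in log()
logged <- paste0("log(", covars, ")")

# Steps 3-5: Reduce joins the running result to the next element;
# accumulate = TRUE keeps every intermediate result instead of only the last one
steps <- Reduce(function(x, y) paste(x, y, sep = "+"), logged, accumulate = TRUE)
print(steps)

# The last intermediate result is the full right-hand side
rhs <- steps[[length(steps)]]

# paste() with collapse builds the same string in a single call
same <- paste(logged, collapse = "+")
```

For a simple join like this, the collapse form is probably the more common choice; Reduce becomes useful when the combining function is something other than string concatenation.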
This might seem intimidating at first, but as you use these functions often and explore examples, they will become part of your repertoire in no time. The best way to learn about a function, say lapply, is to read its documentation (?lapply) end to end, execute the listed examples, and tinker with the parameters to gain familiarity. Hope this sheds some light on your query.
Source: https://stackoverflow.com/questions/43209809/automate-regression-with-specific-dependent-and-independent-variables