I have a data frame and I need to run six two-variable linear models for each group 'site'. Then, I need to convert the results to a data frame. The second variable in the linear
I am not sure if this is exactly what you are trying to do, but the plyr package allows you to fit models split by multiple variables. Below is an example, with var1 and var2 simply representing two grouping variables; each combination of their values is modeled separately.
#load packages
library(data.table)
library(plyr)
#break up by variables, then fit the model to each piece
models <- dlply(data, c("var1", "var2"), function(data) {
  lm(DV ~ IV1 + IV2, data = data, weights = weights)
})
#apply coef to each model and return a data frame
models_coef <- ldply(models, coef)
#print a summary of each fitted model
l_ply(models, summary, .print = TRUE)
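If you would rather not depend on plyr, roughly the same split-by-two-variables fit can be sketched in base R. This is only an illustration under assumptions: the built-in mtcars data stands in for your data frame, cyl and am stand in for var1 and var2, and mpg ~ wt stands in for the model.

```r
# Illustration only: mtcars, cyl, am and mpg ~ wt are stand-ins, not the real data.

# split on every observed combination of the two grouping variables
pieces <- split(mtcars, list(mtcars$cyl, mtcars$am), drop = TRUE)

# fit the same model to each piece
models_base <- lapply(pieces, function(piece) lm(mpg ~ wt, data = piece))

# one row of coefficients per group, with the group label as row name
coef_base <- as.data.frame(t(sapply(models_base, coef)))
```

The row names of coef_base identify the group (for example "4.0" for cyl 4 and am 0), which plays the role of the var1 and var2 columns that ldply adds.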
This is how I would do it. Note that this is untested, as I haven't installed relaimpo. I'm really just repackaging your code.
The general method is:
1. develop a function that works on one group
2. use split to divide your data into groups
3. use lapply to apply the function to each group
4. (if needed) combine the results together
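The four steps can be sketched on a built-in dataset. This is a minimal illustration, not your actual model: mtcars stands in for the data, cyl for the grouping variable, and fit_one_group is a hypothetical helper name.

```r
# Illustration only: mtcars stands in for the real data, cyl for the grouping variable.

# 1. a function that works on one group (hypothetical helper)
fit_one_group <- function(one_group) {
  fit <- lm(mpg ~ wt, data = one_group)
  as.data.frame(t(coef(fit)))  # one row holding the intercept and the slope
}

# 2. divide the data into groups
pieces <- split(mtcars, mtcars$cyl)

# 3. apply the function to each group
fits <- lapply(pieces, fit_one_group)

# 4. combine the per-group results into one data frame
combined <- do.call(rbind, fits)
combined$cyl <- names(fits)
```

Writing and testing fit_one_group on a single piece first makes failures much easier to debug than running everything inside lapply from the start.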
The only changes I made are: (a) pulling out a subset of data for one site and naming it one_site; (b) using one_site in your modeling code; (c) pasting the formula together as a string, which I prefer to using substitute; and (d) whitespace and formatting for readability (mostly using RStudio's "reformat code").
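As a small illustration of change (c), here is the string approach next to a base R alternative, reformulate. The variable name AirTemp is hypothetical and only stands in for one entry of varlist.

```r
# "AirTemp" is a made-up column name, standing in for one element of varlist.
x <- "AirTemp"

# paste the formula together as a string, then convert it
form_string <- as.formula(sprintf("DMWT ~ DMAT + %s", x))

# base R alternative: reformulate() builds the formula from term labels
form_reform <- reformulate(c("DMAT", x), response = "DMWT")
```

Both objects describe the same model, so which one to use is mostly a matter of taste.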
## set up
varlist <- names(d)[4:9]
library(relaimpo)
sumfun <- function(x) {
  c(
    coef(x),
    summary(x)$adj.r.squared,
    sqrt(mean(resid(x) ^ 2, na.rm = TRUE)),
    calc.relimp(x, type = "betasq")$betasq[1],
    calc.relimp(x, type = "betasq")$betasq[2],
    calc.relimp(x, type = "pratt")$pratt[1],
    calc.relimp(x, type = "pratt")$pratt[2]
  )
}
## Testing: this works for one_site
one_site <- subset(d, SiteName == "bp10")
models <- lapply(varlist, function(x) {
  # apply the modeling function to our list of air variables
  form <- as.formula(sprintf("DMWT ~ DMAT + %s", x))
  lm(form, data = one_site) # linear model with air variable substituted
})
## desired result
mod.df <- as.data.frame(t(sapply(models, sumfun)))
Once you have code that works for a single site, turn it into a function. The only inputs seem to be the data for one site and the variables in varlist. Instead of assigning the result at the bottom, we return it:
fit_one_site <- function(one_site, varlist) {
  models <- lapply(varlist, function(x) {
    # apply the modeling function to our list of air variables
    form <- as.formula(sprintf("DMWT ~ DMAT + %s", x))
    lm(form, data = one_site) # linear model with air variable substituted
  })
  return(as.data.frame(t(sapply(models, sumfun))))
}
Now we can use split to split your data up by SiteName, and lapply to apply the fit_one_site function to each piece.
results = lapply(split(d, d$SiteName), FUN = fit_one_site, varlist = names(d)[4:9])
The result should be a list of data frames, one for each site. If you want to combine them into one data frame, see the relevant part of my answer at the list of data frames R-FAQ.
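One common base R way to do that combination is sketched below. It is only an illustration: results_demo is a toy stand-in for results, and its site names, column names and numbers are made up.

```r
# Toy stand-in for `results`: a named list of data frames, one per site.
# Site names, column names and values are invented for illustration.
results_demo <- list(
  siteA = data.frame(slope = c(0.5, 0.7), adj_r2 = c(0.80, 0.85)),
  siteB = data.frame(slope = c(0.4, 0.6), adj_r2 = c(0.75, 0.90))
)

# add a SiteName column to each piece, taken from the list names
labelled <- Map(function(df, site) cbind(SiteName = site, df),
                results_demo, names(results_demo))

# stack the pieces into a single data frame
combined <- do.call(rbind, labelled)
rownames(combined) <- NULL
```

Adding the site label before stacking keeps track of which rows came from which site, which the list names alone would not do reliably once the rows are bound together.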