Though you can get pretty far with (d)plyr, I get the most flexibility with a simple for loop (as I’m not too confident with the do() when it comes to wanting more than just the output of coefficients(), for example)
Let’s start by loading the data and loading some packages:
library(dplyr)
library(broom) # get lm output (coefficients) as a dataframe
library(reshape2) # Melt / DCAST for swapping between wide / long format
data(warpbreaks)
Next, create a list of groups we’re going to be iterating over.
GROUPS <- unique(warpbreaks$tension) # Specify groups, one unique model per group.
The code to answer your question, in this case, would be something of the sorts:
for (i in 1:length(GROUPS)){
CURRENT_GROUP <- GROUPS[i]
df <- filter(warpbreaks, tension==CURRENT_GROUP) # subset the dataframe
fit <- lm(breaks ~ wool, data = df) # Build a model
coeff <- tidy(fit) # Get a pretty data frame of the coefficients & p values
coeff <- coeff[,c(1,2,5)] # Extract P.Value & Estimate
# Rename (intercept) to INT for pretty column names
coeff[coeff$term=="(Intercept)", ]$term <- "INT"
# Make it into wide format with reshape2 package.
coeff <- coeff %>% melt(id.vars=c("term"))
# Defactor the resulting data.frame
coeff <- mutate(coeff,
variable=as.character(variable),
term=as.character(term))
# Rename for prettier column names later
coeff[coeff$variable=="estimate", ]$variable <- "Beta"
coeff[coeff$variable=="p.value", ]$variable <- "P"
coeff <- dcast(coeff, .~term+variable)[,-1]
rsquared <- summary(fit)$r.squared
# Create a df out of what we just did.
row <- cbind(
data.frame(
group=CURRENT_GROUP,
rsquared=rsquared),
coeff
)
# If first iteration, create data.frame -- otherwise: rowbind
if (i==1){
RESULT_ROW = row
}
else{
RESULT_ROW = rbind(RESULT_ROW, row)
} # End if.
} # End for loop
When writing the inner for loop code, just test something out by simulating the loop (first run i <- 1, run the inner loop code step by step, then run i <- 2 etc.).
The resulting dataframe:
RESULT_ROW
## group rsquared INT_Beta INT_P woolB_Beta woolB_P
## 1 L 0.26107601 44.55556 9.010046e-08 -16.333333 0.03023435
## 2 M 0.07263228 24.00000 5.991177e-07 4.777778 0.27947884
## 3 H 0.12666292 24.55556 9.234773e-08 -5.777778 0.14719571
I’m not saying this is better than the answers described above, I just like the fact that this gives me the flexibility to extract anything from any model (not just lm models that have a pretty coefficients() function).