Regression by subset in R [duplicate]

前端未结

关注

 3  1023

春和景丽

相关标签:

3条回答

你的背包

2021-01-07 03:25

You can also do it using only base functions:

# load data
data(warpbreaks)

# fit linear model to each subset
fits <- by(warpbreaks, warpbreaks[,"tension"],
           function(x) lm(breaks ~ wool, data = x))

# Combine coefficients from each model
do.call("rbind", lapply(fits, coef))

0 讨论(0)

佛祖请我去吃肉

2021-01-07 03:28
ddply passes data.frames (from splitting the input data.frame) to the function. You probably want this:
```
ddply(MyData, "Case", function(df) coefficients(lm(Y~X1+X2+X3, data=df)))
```
(Not tested since you don't provide a reproducible example.)

You passed the whole input data.frame to lm for each group..
0 讨论(0)
发布评论:

提交评论
- 加载中...

灰色年华

2021-01-07 03:35

Though you can get pretty far with (d)plyr, I get the most flexibility with a simple for loop (as I’m not too confident with the do() when it comes to wanting more than just the output of coefficients(), for example)

Let’s start by loading the data and loading some packages:

library(dplyr)
library(broom) # get lm output (coefficients) as a dataframe
library(reshape2) # Melt / DCAST for swapping between wide / long format
data(warpbreaks)

Next, create a list of groups we’re going to be iterating over.

GROUPS <- unique(warpbreaks$tension) # Specify groups, one unique model per group.

The code to answer your question, in this case, would be something of the sorts:

for (i in 1:length(GROUPS)){
  CURRENT_GROUP <- GROUPS[i]
  df <- filter(warpbreaks, tension==CURRENT_GROUP) # subset the dataframe
  fit <- lm(breaks ~ wool, data = df) # Build a model
  coeff <- tidy(fit) # Get a pretty data frame of the coefficients & p values
  coeff <- coeff[,c(1,2,5)] # Extract P.Value & Estimate

  # Rename (intercept) to INT for pretty column names
  coeff[coeff$term=="(Intercept)", ]$term <- "INT"

  # Make it into wide format with reshape2 package.
  coeff <- coeff %>% melt(id.vars=c("term"))

  # Defactor the resulting data.frame
  coeff <- mutate(coeff,
                  variable=as.character(variable),
                  term=as.character(term))

  # Rename for prettier column names later
  coeff[coeff$variable=="estimate", ]$variable <- "Beta"
  coeff[coeff$variable=="p.value", ]$variable <- "P"

  coeff <- dcast(coeff, .~term+variable)[,-1]

  rsquared <- summary(fit)$r.squared

  # Create a df out of what we just did.
  row <- cbind(
    data.frame(
      group=CURRENT_GROUP,
      rsquared=rsquared),
    coeff
  )

  # If first iteration, create data.frame -- otherwise: rowbind
  if (i==1){
    RESULT_ROW = row
  }
  else{
    RESULT_ROW = rbind(RESULT_ROW, row)
  } # End if.
} # End for loop

When writing the inner for loop code, just test something out by simulating the loop (first run i <- 1, run the inner loop code step by step, then run i <- 2 etc.). The resulting dataframe:

RESULT_ROW  
##   group   rsquared INT_Beta        INT_P woolB_Beta    woolB_P
## 1     L 0.26107601 44.55556 9.010046e-08 -16.333333 0.03023435
## 2     M 0.07263228 24.00000 5.991177e-07   4.777778 0.27947884
## 3     H 0.12666292 24.55556 9.234773e-08  -5.777778 0.14719571

I’m not saying this is better than the answers described above, I just like the fact that this gives me the flexibility to extract anything from any model (not just lm models that have a pretty coefficients() function).

0 讨论(0)

热议问题