Loop linear regression and saving ALL coefficients

Based on the question linked below, I wrote code to run a regression on each subset of my data, where the subsets are defined by a grouping variable.

Loop linear regression and saving coefficients

In this example I use a variable DUMMY (0 or 1) to define the subsets; in reality I have 3000 subsets.

res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY),function(x){
  fit <- lm(y~x1 + x2, data=x)
  res <- data.frame(DUMMY=unique(x$DUMMY), coeff=coef(fit))
  res
}))

This results in the following data frame:

                DUMMY   coeff
0.(Intercept)   0    22.8419956
0.x1            0   -11.5623064
0.x2            0     2.1006948
1.(Intercept)   1     4.2020874
1.x1            1    -0.4924303
1.x2            1     1.0917668

What I would like, however, is one row per regression, with each variable's results in columns. I also need the p-values and standard errors included.

DUMMY   interceptx1   coeffx1   p-valuex1   SEx1   coeffx2  p-valuex2   SEx2
0          22.84       -11.56      0.04     0.15    2.10     0.80       0.90
1          4.20        -0.49       0.10     0.60    1.09     0.60       1.20

Any idea how to do this?

While your desired output is (IMHO) not really tidy data, here is an approach using data.table and a custom extractor function. It has an option to return the results in either wide or long form.

The extractor function takes an lm object and returns the estimates, standard errors and p-values for all coefficients.

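extractor <- function(model, return_wide = FALSE){
  # coefficient table without the t value column: estimate, std. error, p-value
  coefs <- summary(model)$coefficients[, -3, drop = FALSE]
  # take the term names from the table's own row names, so the lengths still
  # match when lm() drops an aliased (NA) coefficient from the summary
  model_summary <- as.data.table(coefs, keep.rownames = "variable")
  # long form: one row per variable/measure pair
  step2 <- melt(model_summary, id.vars = "variable", variable.name = "measure")
  if(!return_wide){
    return(step2)
  }
  # wide form: a single row with one column per variable_measure combination
  step3 <- dcast(step2, 1~variable+measure, value.var = "value")
  return(step3)
}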
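To see what the extractor starts from, it can help to look at the coefficient table that `summary()` returns for a single fit. The sketch below uses R's built-in `mtcars` data purely as a stand-in (an assumption, not the data from the question); the point is only the shape of the matrix and why column 3 is dropped.

```r
# Illustrative fit on the built-in mtcars data (stand-in data, not the question's)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary(fit)$coefficients is a numeric matrix: one row per term,
# columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
cf <- summary(fit)$coefficients
colnames(cf)
rownames(cf)

# Dropping column 3 removes the t value and keeps estimate, SE and p-value;
# drop = FALSE keeps the result a matrix even if only one term is present
cf_small <- cf[, -3, drop = FALSE]
colnames(cf_small)
```

Because the row names of this matrix are the term names, they can serve directly as the `variable` column in the reshaped output.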

Demonstration:

res_wide <- dat[, extractor(lm(y ~ x1 + x2), return_wide = TRUE), by = dummy]
res_wide
# dummy . (Intercept)_Estimate (Intercept)_Std. Error (Intercept)_Pr(>|t|)  x1_Estimate x1_Std. Error x1_Pr(>|t|) x2_Estimate x2_Std. Error x2_Pr(>|t|)
# 1:     0 .           0.04314707             0.04495702            0.3376461 -0.054364406    0.04441204   0.2214895  0.01333804    0.04620999   0.7729757
# 2:     1 .          -0.04137086             0.04471550            0.3553164  0.009864255    0.04533808   0.8278539  0.05272257    0.04507189   0.2426726


res_long <- dat[, extractor(lm(y ~ x1 + x2)), by = dummy]
res_long
# dummy    variable    measure        value
# 1:     0 (Intercept)   Estimate  0.043147072
# 2:     0          x1   Estimate -0.054364406
# 3:     0          x2   Estimate  0.013338043
# 4:     0 (Intercept) Std. Error  0.044957023
# 5:     0          x1 Std. Error  0.044412037
# 6:     0          x2 Std. Error  0.046209987
# 7:     0 (Intercept)   Pr(>|t|)  0.337646052
# 8:     0          x1   Pr(>|t|)  0.221489530
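For readers who would rather stay with the `split()`/`lapply()` pattern from the question, the same one-row-per-regression layout can probably be reached in base R as well. The following is a minimal sketch, not part of the original answer: `one_row` is a hypothetical helper name, the data are simulated to mirror the example here, and it assumes every subset yields the same set of coefficients (otherwise `rbind` would fail on mismatched columns).

```r
# Simulated data mirroring the example (illustrative values only)
set.seed(123)
nobs <- 1000
mydata <- data.frame(
  DUMMY = sample(0:1, nobs, replace = TRUE),
  x1 = rnorm(nobs),
  x2 = rnorm(nobs),
  y  = rnorm(nobs)
)

# Hypothetical helper: flatten one subset's fit into a single-row data frame
one_row <- function(d) {
  # estimate, std. error and p-value per term (t value column dropped)
  cf <- summary(lm(y ~ x1 + x2, data = d))$coefficients[, -3, drop = FALSE]
  # t(cf) read column by column gives all measures of term 1, then term 2, ...
  vals <- as.vector(t(cf))
  names(vals) <- paste(rep(rownames(cf), each = ncol(cf)), colnames(cf), sep = "_")
  # t(vals) is a 1-row matrix; check.names = FALSE keeps names like "x1_Std. Error"
  data.frame(DUMMY = d$DUMMY[1], t(vals), check.names = FALSE)
}

# One row per subset, stacked into a single data frame
res_base <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), one_row))
names(res_base)
```

The column names follow the same `variable_measure` pattern as the wide data.table result, so the two outputs can be compared column by column.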

Data used:

library(data.table)
set.seed(123)
nobs <- 1000
dat <- data.table(
  dummy = sample(0:1, nobs, replace = TRUE),
  x1 = rnorm(nobs),
  x2 = rnorm(nobs),
  y = rnorm(nobs))

Source: https://stackoverflow.com/questions/35323389/loop-linear-regression-and-saving-all-coefficients
