Question
Based on the link below, I wrote code to run a regression on subsets of my data, split by a variable.
Loop linear regression and saving coefficients
In this example I created a DUMMY variable (0 or 1) to define the subsets (in reality I have 3000 subsets).
res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), function(x){
    fit <- lm(y ~ x1 + x2, data = x)
    res <- data.frame(DUMMY = unique(x$DUMMY), coeff = coef(fit))
    res
}))
This results in the following dataset:
              DUMMY       coeff
0.(Intercept)     0  22.8419956
0.x1              0 -11.5623064
0.x2              0   2.1006948
1.(Intercept)     1   4.2020874
1.x1              1  -0.4924303
1.x2              1   1.0917668
What I would like, however, is one row per regression, with the variables in columns. I also need the p-values and standard errors included.
DUMMY interceptx1 coeffx1 p-valuex1 SEx1 coeffx2 p-valuex2 SEx2
    0       22.84  -11.56      0.04 0.15    2.10      0.80 0.90
    1        4.20   -0.49      0.10 0.60    1.09      0.60 1.20
Any idea how to do this?
Answer 1:
While your desired output is (IMHO) not really tidy data, here is an approach using data.table and a custom-built extraction function. It can return either a wide or a long form of the results.
The extractor function takes an lm object and returns the estimates, standard errors and p-values for all variables.
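extractor <- function(model, return_wide = FALSE){
    # coefficient matrix without the t-value column (column 3);
    # this leaves Estimate, Std. Error and Pr(>|t|)
    coefs <- summary(model)$coefficients[, -3, drop = FALSE]
    model_summary <- as.data.table(coefs)
    # take the names from the matrix rows so they always match the summary
    model_summary[, variable := rownames(coefs)]
    # long form: one row per variable/measure pair
    step2 <- melt(model_summary, id.vars = "variable", variable.name = "measure")
    if(!return_wide){
        return(step2)
    }
    # wide form: a single row with one column per variable_measure
    step3 <- dcast(step2, 1 ~ variable + measure, value.var = "value")
    return(step3)
}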
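For reference, this is the matrix the extractor starts from. The sketch below uses only base R and a small simulated data frame (the names d and fit are made up for illustration); it shows that the third column of the summary's coefficient matrix is the t value, which is the one dropped with [, -3].

```r
# Illustrative data only; not the data used in the demonstration
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), y = rnorm(50))
fit <- lm(y ~ x1 + x2, data = d)

# summary() of an lm object carries a numeric matrix of coefficient statistics
coefs <- summary(fit)$coefficients
colnames(coefs)   # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
rownames(coefs)   # "(Intercept)" "x1" "x2"

# dropping column 3 keeps the estimate, standard error and p-value
kept <- coefs[, -3]
colnames(kept)    # "Estimate" "Std. Error" "Pr(>|t|)"
```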
Demonstration:
res_wide <- dat[, extractor(lm(y ~ x1 + x2), return_wide = TRUE), by = dummy]
res_wide
#    dummy . (Intercept)_Estimate (Intercept)_Std. Error (Intercept)_Pr(>|t|)  x1_Estimate x1_Std. Error x1_Pr(>|t|) x2_Estimate x2_Std. Error x2_Pr(>|t|)
# 1:     0 .           0.04314707             0.04495702            0.3376461 -0.054364406    0.04441204   0.2214895  0.01333804    0.04620999   0.7729757
# 2:     1 .          -0.04137086             0.04471550            0.3553164  0.009864255    0.04533808   0.8278539  0.05272257    0.04507189   0.2426726
res_long <- dat[,extractor(lm(y~x1 + x2)), by = dummy]
#    dummy    variable    measure        value
# 1:     0 (Intercept)   Estimate  0.043147072
# 2:     0          x1   Estimate -0.054364406
# 3:     0          x2   Estimate  0.013338043
# 4:     0 (Intercept) Std. Error  0.044957023
# 5:     0          x1 Std. Error  0.044412037
# 6:     0          x2 Std. Error  0.046209987
# 7:     0 (Intercept)   Pr(>|t|)  0.337646052
# 8:     0          x1   Pr(>|t|)  0.221489530
Data used:
library(data.table)
set.seed(123)
nobs <- 1000
dat <- data.table(
    dummy = sample(0:1, nobs, replace = TRUE),
    x1 = rnorm(nobs),
    x2 = rnorm(nobs),
    y = rnorm(nobs))
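If you would rather stay with the split/lapply pattern from the question, the same one-row-per-regression layout can be sketched in base R. This is only an illustration under assumptions: wide_row is a hypothetical helper, the short column suffixes (coeff, SE, p) are my own choice, and the data is simulated rather than yours.

```r
# Hypothetical helper: returns one named vector per model,
# ordered as coeff, SE, p for each term in turn
wide_row <- function(fit) {
  coefs <- summary(fit)$coefficients[, c("Estimate", "Std. Error", "Pr(>|t|)"), drop = FALSE]
  colnames(coefs) <- c("coeff", "SE", "p")
  # transpose so that as.vector() walks through the matrix term by term
  vals <- as.vector(t(coefs))
  names(vals) <- paste(rep(rownames(coefs), each = ncol(coefs)), colnames(coefs), sep = "_")
  vals
}

# Simulated stand-in for the question's data
set.seed(123)
mydata <- data.frame(DUMMY = sample(0:1, 200, replace = TRUE),
                     x1 = rnorm(200), x2 = rnorm(200), y = rnorm(200))

# One regression per subset, then bind the named vectors into rows
rows <- lapply(split(mydata, mydata$DUMMY),
               function(d) wide_row(lm(y ~ x1 + x2, data = d)))
res <- data.frame(DUMMY = names(rows), do.call(rbind, rows), check.names = FALSE)
res
```

This assumes every subset yields the same set of estimable coefficients. If a coefficient is dropped in some subset (for example because of collinearity), the vectors differ in length and rbind will no longer line the columns up, so the long form is the safer choice in that case.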
Source: https://stackoverflow.com/questions/35323389/loop-linear-regression-and-saving-all-coefficients