I have a large dataset with several variables, one of which is a state variable, coded 1-50 for each state. I'd like to run a regression of 28 variables on the remaining 2
This is another example of the classic Split-Apply-Combine problem, which can be addressed using the plyr package by @hadley. In your problem, you want to split the data by state, fit the regression to each subset, and combine the coefficients into a single data frame. I will illustrate it with the Cars93 dataset available in the MASS package. We are interested in the relationship between horsepower and engine size for each country of origin.
# LOAD LIBRARIES
require(MASS); require(plyr)
# SPLIT-APPLY-COMBINE
regressions <- dlply(Cars93, .(Origin), lm, formula = Horsepower ~ EngineSize)
coefs <- ldply(regressions, coef)
   Origin (Intercept) EngineSize
1     USA    33.13666   37.29919
2 non-USA    15.68747   55.39211
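If you need more than the coefficients, the combine step can return any named vector per model. The following is a minimal sketch on the same Cars93 example; the name fit_summary and the choice of R-squared and group size as extra columns are my own illustration, not something the question asked for.

```r
library(MASS)  # provides the Cars93 data
library(plyr)

# Split by Origin and fit one model per group
regressions <- dlply(Cars93, .(Origin), lm, formula = Horsepower ~ EngineSize)

# Combine: one row per group, holding the coefficients plus two fit statistics
# (fit_summary is an illustrative name)
fit_summary <- ldply(regressions, function(m) {
  c(coef(m), r.squared = summary(m)$r.squared, n = nobs(m))
})
print(fit_summary)
```

Because ldply binds whatever the function returns, adding another statistic only means adding another named element to the vector.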
EDIT. For your example, substitute PUF for Cars93, state for Origin, and fm for the formula.
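Spelled out, that substitution might look like the sketch below. I don't have your PUF data, so the data frame here is simulated with only five states and two predictors, and fm is shortened to match; the column names xtot and e00200 are borrowed from your formula, everything else is made up for illustration.

```r
library(plyr)

# Simulated stand-in for PUF (hypothetical data): 5 states, 40 rows each
set.seed(1)
PUF <- data.frame(
  state  = rep(1:5, each = 40),
  xtot   = rpois(200, 2),
  e00200 = rnorm(200)
)
PUF$z <- 1 + 0.5 * PUF$xtot + 2 * PUF$e00200 + rnorm(200)

# Shortened formula for illustration; put your full formula here
fm <- z ~ xtot + e00200

# Same two calls as before, with PUF, state and fm substituted in
regressions <- dlply(PUF, .(state), lm, formula = fm)
coefs <- ldply(regressions, coef)
print(coefs)
```

The result has one row per state and one column per coefficient, with the state code in the first column.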
I've cleaned up your code slightly:
fm <- z ~ class1+class2+class3+class4+class5+class6+class7+
xtot+e00200+e00300+e00600+e00900+e01000+p04470+e04800+
e09600+e07180+e07220+e07260+e06500+e10300+
e59720+e11900+e18425+e18450+e18500+e19700
PUFsplit <- split(PUF, PUF$state)
mod <- lapply(PUFsplit, function(z) lm(fm, data=z))
Beta <- sapply(mod, coef)
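One thing to be aware of is the shape of the result: sapply returns a matrix with one row per coefficient and one column per group. Here is a small sketch of the same pattern, using Cars93 from MASS as a stand-in because I can't run your data; the names pieces and Beta_by_group are just for illustration.

```r
library(MASS)  # Cars93 is used here only as a stand-in for PUF

fm <- Horsepower ~ EngineSize
pieces <- split(Cars93, Cars93$Origin)
mod <- lapply(pieces, function(d) lm(fm, data = d))

# Coefficients in rows, groups in columns
Beta <- sapply(mod, coef)
print(dim(Beta))

# Transpose to get one row per group, which is often easier to work with
Beta_by_group <- as.data.frame(t(Beta))
print(Beta_by_group)
```

With your data, the transposed version would have one row per state.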
If you wanted, you could even put this all in one line:
Beta <- sapply(lapply(split(PUF, PUF$state), function(z) lm(fm, data=z)), coef)