regression on subsets for unique factor combinations using lm

我怕爱的太早我们不能终老 posted on 2019-12-06 00:33:46
Christoph_J

You could use the plyr package:

library(plyr)
# Fit one model per unique combination of the five grouping columns
list_reg <- dlply(df1, .(Surface, Supplier, ParticleSize, T1, T2), function(df) {
  lm(Shear ~ Gap + Clearance + Void, data = df)
})
# One fitted model per combination; five in this case
length(list_reg)
# Inspect one particular regression, here the first one
summary(list_reg[[1]])
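If you want the coefficients of all fits side by side rather than inspecting them one at a time, something along these lines may help. This is only a sketch: the toy df1 below is made up (your real data will differ), it uses just two grouping columns to stay short, and coefs is a name I chose for illustration.

```r
library(plyr)

# Hypothetical toy data standing in for the real df1
set.seed(1)
df1 <- data.frame(
  Surface   = rep(c("A", "B"), each = 10),
  Supplier  = rep(c("S1", "S2"), each = 10),
  Gap       = rnorm(20),
  Clearance = rnorm(20),
  Void      = rnorm(20)
)
df1$Shear <- 1 + 2 * df1$Gap - df1$Clearance + 0.5 * df1$Void +
  rnorm(20, sd = 0.1)

# One lm fit per observed (Surface, Supplier) combination
list_reg <- dlply(df1, .(Surface, Supplier), function(df) {
  lm(Shear ~ Gap + Clearance + Void, data = df)
})

# Apply coef() to every fit and stack the results:
# one row per group, one column per coefficient
coefs <- ldply(list_reg, coef)
coefs
```

ldply is the list-to-data.frame counterpart of dlply, so the grouping labels are carried along next to the coefficients.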

The function dlply takes a data.frame (that's what the d in dlply stands for), in your case df1, and returns a list (that's what the l stands for), in your case consisting of five elements, each containing the results of one regression.

Internally, your df1 is split into five sub-data.frames according to the columns specified by .(Surface, Supplier, ParticleSize, T1, T2), and the function lm(Shear~Gap+Clearance+Void,data=df) is applied to each of these sub-data.frames.
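The same split-then-apply idea can be written in base R, which may make it clearer what happens at each step. Again, this is just a sketch under assumptions: the toy df1 is invented, only two of the five grouping columns are used for brevity, and sub_dfs and list_reg_base are names I made up. It is not how plyr is implemented internally.

```r
# Hypothetical toy data standing in for the real df1
set.seed(1)
df1 <- data.frame(
  Surface   = rep(c("A", "B"), each = 10),
  Supplier  = rep(c("S1", "S2"), each = 10),
  Gap       = rnorm(20),
  Clearance = rnorm(20),
  Void      = rnorm(20)
)
df1$Shear <- 1 + 2 * df1$Gap - df1$Clearance + 0.5 * df1$Void +
  rnorm(20, sd = 0.1)

# Split: one sub-data.frame per observed factor combination
# (drop = TRUE discards combinations that never occur in the data)
sub_dfs <- split(df1, list(df1$Surface, df1$Supplier), drop = TRUE)

# Apply: fit the same model to every piece; the result is a named list
list_reg_base <- lapply(sub_dfs, function(df) {
  lm(Shear ~ Gap + Clearance + Void, data = df)
})

# The list names identify the factor combination behind each fit
names(list_reg_base)
```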

To get a better feeling of what dlply really does, just call

list_sub_df <- dlply(df1, .(Surface, Supplier, ParticleSize, T1, T2))

and you can look at each sub-data.frame to which lm will be applied.

And just a general note at the end: the paper by the package author Hadley Wickham, "The Split-Apply-Combine Strategy for Data Analysis", is really great. Even if you don't end up using his package, it is still well worth reading to get a feeling for the split-apply-combine approach.

EDIT:

I just did a quick search and, as expected, this has already been explained better elsewhere, so make sure to read this SO post as well.

EDIT2:

If you want to use the column numbers directly, try this (taken from this SO post):

list_reg <- dlply(df1, names(df1)[1:5], function(df) {
  lm(Shear ~ Gap + Clearance + Void, data = df)
})