How to merge two linear regression prediction models (one per subset of a data frame) into one column of the data frame

Question

I would like to build two linear regression models, each fitted on a different subset of the dataset, and then have a single column that contains the predicted values for each subset. Here is an example data frame:

dat <- read.table(text = " cats birds    wolfs     snakes
 0        3        8         7
 1        3        8         7
 1        1        2         3
 0        1        2         3
 0        1        2         3
 1        6        1         1
 0        6        1         1
 1        6        1         1   ",header = TRUE)

First I have built two models:

# one is for wolfs ~ snakes where cats=0
f0<-lm(wolfs~snakes,data=dat,subset=dat$cats==0)

#the second model is for wolfs ~ snakes where cats=1
f1<-lm(wolfs~snakes,data=dat,subset=dat$cats==1)

I then computed the predictions for each model:

# predict.lm() takes `newdata`; it has no `data` or `subset` argument
f0_predict<-predict(f0,newdata=dat[dat$cats==0,],type='response')
f1_predict<-predict(f1,newdata=dat[dat$cats==1,],type='response')

This works fine, but I can't find a way to put the results back into the original data frame so that a single column, full_prediction, holds the cats==0 model's prediction for rows where cats==0 and the cats==1 model's prediction for rows where cats==1. For example, the output should look like this (with made-up prediction values):

  cats   birds    wolfs     snakes full_prediction
     0        3        8         7        0.6
     1        3        8         7        0.5
     1        1        2         3        0.4
     0        1        2         3        0.3
     0        1        2         3        0.3
     1        6        1         1        0.7
     0        6        1         1        0.1
     1        6        1         1        0.7

In rows 6-8 you can see that full_prediction is 0.7 where cats==1 and 0.1 where cats==0. Any idea how to do this?

Answer 1:

Use split and unsplit:

dat.l <- split(dat, dat$cats)

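dat.l <- lapply(dat.l, function(x){
  # fit one model per cats group
  mod <- lm(wolfs~snakes,data=x)
  # predict.lm() expects `newdata`; a `data` argument would be silently ignored
  x$full_prediction <- predict(mod,newdata=x,type='response')
  return(x)
})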

unsplit(dat.l, dat$cats)

Output:

  cats birds wolfs snakes full_prediction
1    0     3     8      7       7.5789474
2    1     3     8      7       7.6666667
3    1     1     2      3       3.0000000
4    0     1     2      3       2.6315789
5    0     1     2      3       2.6315789
6    1     6     1      1       0.6666667
7    0     6     1      1       0.1578947
8    1     6     1      1       0.6666667
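
If the two models f0 and f1 from the question are already fitted, a simpler route may be to predict every row with both models and then pick, row by row, the prediction that matches that row's group. The following is a minimal base R sketch under the assumption that cats only takes the values 0 and 1; it uses predict()'s newdata argument and ifelse().

```r
dat <- read.table(text = "cats birds wolfs snakes
0 3 8 7
1 3 8 7
1 1 2 3
0 1 2 3
0 1 2 3
1 6 1 1
0 6 1 1
1 6 1 1", header = TRUE)

# One model per group, as in the question
f0 <- lm(wolfs ~ snakes, data = dat, subset = cats == 0)
f1 <- lm(wolfs ~ snakes, data = dat, subset = cats == 1)

# Predict all rows with both models, then keep the prediction
# from the model that belongs to each row's group
dat$full_prediction <- ifelse(dat$cats == 0,
                              predict(f0, newdata = dat),
                              predict(f1, newdata = dat))
dat
```

This keeps the original row order without any split/unsplit step, at the cost of computing predictions that are then discarded.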

A dplyr solution would be:

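library(dplyr)
dat %>% 
  group_by(cats) %>%
  do({
    # `.` is the subset of rows for the current cats group
    mod <- lm(wolfs~snakes, data = .)
    # predict() without newdata returns the fitted values for that group
    data.frame(., full_prediction = predict(mod))
  })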
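
As a possible alternative to fitting two separate models, a single model with an interaction term gives each group its own intercept and slope, so its fitted values should match the per-group predictions. The sketch below is an assumption-laden variant rather than part of the original answer: the use of factor(cats) and the name fit are illustrative choices. Note that standard errors will generally differ from the separate fits, because the pooled model estimates one residual variance for both groups.

```r
dat <- read.table(text = "cats birds wolfs snakes
0 3 8 7
1 3 8 7
1 1 2 3
0 1 2 3
0 1 2 3
1 6 1 1
0 6 1 1
1 6 1 1", header = TRUE)

# snakes * factor(cats) expands to snakes + factor(cats) + snakes:factor(cats),
# i.e. a separate intercept and slope for each cats group
fit <- lm(wolfs ~ snakes * factor(cats), data = dat)

# fitted() returns one value per row, in the original row order
dat$full_prediction <- unname(fitted(fit))
dat
```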

Source: https://stackoverflow.com/questions/24881923/how-to-merge-two-linear-regression-prediction-models-each-per-data-frames-subs

Tags

dataframe

linear-regression

prediction