Merge two regression prediction models (with subsets of a data frame) back into the data frame (one column)

Question

I am building on a similar question that was asked and answered on SO a year ago: how to merge two linear regression prediction models (each per data frame's subset) into one column of the data frame

I will use the same data as that post, but with a new column. I create the data:

dat = read.table(text = " cats birds    wolfs     snakes     trees
0        3        8         7        2
1        3        8         7        3
1        1        2         3        2
0        1        2         3        1
0        1        2         3        2
1        6        1         1        3
0        6        1         1        1
1        6        1         1        1   " ,header = TRUE)

Model the number of wolves, using two subsets of the data to distinguish between conditions. The equations are different for each subset.

f0 = lm(wolfs~snakes,data = dat,subset=dat$cats==0)
f1 = lm(wolfs~snakes + trees,data = dat,subset=dat$cats==1)

Predict the number of wolves for each subset.

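# predict.lm() reads new observations from `newdata`; it has no `data` or `subset` argument
f0_predict = predict(f0, newdata = dat[dat$cats == 0, ], type = 'response')
f1_predict = predict(f1, newdata = dat[dat$cats == 1, ], type = 'response')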
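If the goal is only a single prediction column, splitting may not be necessary at all. A minimal sketch, assuming the `dat`, `f0` and `f1` above (recreated here so the block runs on its own) and reusing the column name `full_prediction` from the attempt below: predict each model on its own rows through `newdata`, then write the results into the matching rows with logical indexing. The index names `is0` and `is1` are hypothetical.

```r
# Example data from the question
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

# One model per subset; `subset` is evaluated inside `dat`
f0 <- lm(wolfs ~ snakes, data = dat, subset = cats == 0)
f1 <- lm(wolfs ~ snakes + trees, data = dat, subset = cats == 1)

# Hypothetical logical row indices for each condition
is0 <- dat$cats == 0
is1 <- dat$cats == 1

# Start with an empty numeric column, then fill each subset's rows
# with predictions from that subset's own model
dat$full_prediction <- NA_real_
dat$full_prediction[is0] <- predict(f0, newdata = dat[is0, ])
dat$full_prediction[is1] <- predict(f1, newdata = dat[is1, ])
dat
```

Because each assignment only touches the rows selected by its own condition, the original row order is kept, which is what `unsplit()` is used for in the split/lapply approach.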

Then (again, per the 2015 post) I split the data by the cats variable.

dat.l = split(dat, dat$cats)
dat.l
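For reference, `split()` returns a list with one data frame per level of `cats`, named after the level, and `unsplit()` with the same grouping vector puts the rows back in their original positions. A small check of that round trip, recreating `dat` so the block stands alone (`rebuilt` is a hypothetical name):

```r
# Example data from the question
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

dat.l <- split(dat, dat$cats)
print(names(dat.l))          # list elements are named by level: "0" and "1"
print(sapply(dat.l, nrow))   # number of rows in each piece

# unsplit() with the same grouping vector restores the original row order
rebuilt <- unsplit(dat.l, dat$cats)
print(rebuilt$cats)
```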

Here is where it gets a little tricky. The 2015 post suggested using lapply to attach the two sets of predictions to the data set. Here, however, the respondent's function would not work, because it assumed both subsets used the same regression formula. Here's my attempt (it's close to the original, just tweaked):

dat.l = lapply(dat.l, function(x){
    mod = ifelse(dat$cats==0, lm(wolfs~snakes, data=x), lm(wolfs~snakes+trees, data=x))
    x$full_prediction = predict(mod, data=x, type='response')
    return(x)
})
unsplit(dat.l, dat$cats)
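The attempt above breaks for two reasons. `ifelse()` builds its result element by element, with the same shape as its `test` argument, so it cannot hand back a whole fitted `lm` object; and `dat$cats` inside the function refers to the full data set rather than the current piece `x`. A minimal base-R sketch that keeps the split/lapply/unsplit structure, assuming a hypothetical lookup list `formulas` keyed by the level names that `split()` produces ("0" and "1"), uses `Map()` so each piece is paired with its own name and therefore its own formula:

```r
# Example data from the question
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

# Hypothetical lookup: one formula per level of cats, named like split()'s output
formulas <- list(
  "0" = wolfs ~ snakes,
  "1" = wolfs ~ snakes + trees
)

dat.l <- split(dat, dat$cats)

# Map() walks the pieces and their names in parallel, so each piece
# is fitted with its own formula (no ifelse() needed)
dat.l <- Map(function(x, level) {
  mod <- lm(formulas[[level]], data = x)
  # predict.lm() takes new observations via `newdata`
  x$full_prediction <- predict(mod, newdata = x)
  x
}, dat.l, names(dat.l))

# Put the rows back in their original order
dat <- unsplit(dat.l, dat$cats)
dat
```

Keying the formulas by level name keeps the model choice out of the fitting function, so adding a third condition would only mean adding one entry to `formulas`.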

Any ideas regarding the last couple of steps? I am still relatively new to S.O., and am an intermediate with R, so please go gently if I have not posted precisely as the community prefers.

Answer 1:

Here's a dplyr solution, building off of the previous post you cited:

library(dplyr)

# create a new column defining the lm formula for each level of cats
dat <- dat %>% mutate(formula = ifelse(cats==0, "wolfs ~ snakes", 
        "wolfs ~ snakes + trees"))

# build model and find predicted values for each value of cats
dat <- dat %>% group_by(cats) %>%
    do({
        mod <- lm(as.formula(.$formula[1]), data = .)
        pred <- predict(mod)
        data.frame(., pred)
    })

> dat
Source: local data frame [8 x 7]
Groups: cats [2]
   cats birds wolfs snakes trees                formula      pred
  (int) (int) (int)  (int) (int)                  (chr)     (dbl)
1     0     3     8      7     2         wolfs ~ snakes 7.5789474
2     0     1     2      3     1         wolfs ~ snakes 2.6315789
3     0     1     2      3     2         wolfs ~ snakes 2.6315789
4     0     6     1      1     1         wolfs ~ snakes 0.1578947
5     1     3     8      7     3 wolfs ~ snakes + trees 7.6800000
6     1     1     2      3     2 wolfs ~ snakes + trees 2.9600000
7     1     6     1      1     3 wolfs ~ snakes + trees 0.8400000
8     1     6     1      1     1 wolfs ~ snakes + trees 0.5200000
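Two details of the output above may matter: the rows come back grouped by `cats` rather than in their original order, and newer dplyr releases mark `do()` as superseded. A minimal sketch of the same idea with `group_modify()`, available in recent dplyr versions, assuming the original `dat` (recreated here without the `formula` column), a hypothetical `formulas` lookup list, and a temporary `row_id` column used only to restore the original row order:

```r
library(dplyr)

# Example data from the question
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

# Hypothetical lookup: one formula per level of cats
formulas <- list("0" = wolfs ~ snakes, "1" = wolfs ~ snakes + trees)

res <- dat %>%
  mutate(row_id = row_number()) %>%   # remember the original order
  group_by(cats) %>%
  group_modify(function(df, key) {
    # `df` is one group without the grouping column; `key` holds its cats value
    mod <- lm(formulas[[as.character(key$cats)]], data = df)
    df$pred <- unname(predict(mod, newdata = df))
    df
  }) %>%
  ungroup() %>%
  arrange(row_id) %>%                 # restore the original row order
  select(-row_id)
res
```

Dropping the `arrange(row_id)` step gives the grouped order shown in the answer's output.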

Source: https://stackoverflow.com/questions/38101558/merge-two-regression-prediction-models-with-subsets-of-a-data-frame-back-into

Tags

dataframe

subset

prediction