Merge two regression prediction models (with subsets of a data frame) back into the data frame (one column)

Submitted by 拈花ヽ惹草 on 2020-01-06 17:46:51

Question


I am building on a similar question that was asked and answered on SO one year ago: how to merge two linear regression prediction models (each per data frame's subset) into one column of the data frame

I will use the same data as in that post, but with a new column. I create the data:

dat = read.table(text = " cats birds    wolfs     snakes     trees
0        3        8         7        2
1        3        8         7        3
1        1        2         3        2
0        1        2         3        1
0        1        2         3        2
1        6        1         1        3
0        6        1         1        1
1        6        1         1        1   " ,header = TRUE) 

Model the number of wolves, using two subsets of the data to distinguish between conditions. The equations are different for each subset.

f0 = lm(wolfs~snakes,data = dat,subset=dat$cats==0)
f1 = lm(wolfs~snakes + trees,data = dat,subset=dat$cats==1)

Predict the number of wolves for each subset.

# predict.lm takes newdata; data= and subset= are not its arguments and are silently ignored
f0_predict = predict(f0,newdata = dat[dat$cats==0,],type='response')
f1_predict = predict(f1,newdata = dat[dat$cats==1,],type='response')
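
A note on the predict step: predict() for lm objects selects rows through its newdata argument. Arguments named data or subset are not part of predict.lm, so they fall into ... and are ignored, and the call simply returns the fitted values for the rows the model was trained on. The following is a minimal sketch of that behaviour using the data above; the names ignored_args and explicit_rows are illustrative only.

```r
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

f0 <- lm(wolfs ~ snakes, data = dat, subset = cats == 0)

# data= and subset= are swallowed by ... and have no effect:
# this returns the fitted values for the four cats == 0 training rows
ignored_args <- predict(f0, data = dat, subset = dat$cats == 1)

# newdata= is the argument that actually chooses the rows to predict on
explicit_rows <- predict(f0, newdata = dat[dat$cats == 0, ])

# Both vectors have one entry per cats == 0 row and hold the same values
length(ignored_args)
all.equal(unname(ignored_args), unname(explicit_rows))
```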

Then (again, per the 2015 post) I split the data by the cats variable.

dat.l = split(dat, dat$cats)
dat.l 

Here is where it gets a little tricky. The 2015 post suggested using lapply to attach the two sets of predictions to the data set. Here, however, the respondent's function will not work, because it assumed both regression equations were essentially the same. Here's my attempt (it's close to the original, just tweaked):

dat.l = lapply(dat.l, function(x){
    mod = ifelse(dat$cats==0,lm(wolfs~snakes,data=x),lm(wolfs~snakes+trees,data=x))
    x$full_prediction = predict(mod,data=x,type='response')
    return(x)
})
unsplit(dat.l, dat$cats)

Any ideas regarding the last couple of steps? I am still relatively new to S.O. and at an intermediate level with R, so please go gently if I have not posted precisely as the community prefers.


Answer 1:


Here's a dplyr solution, building off of the previous post you cited:

library(dplyr)

# create a new column defining the lm formula for each level of cats
dat <- dat %>% mutate(formula = ifelse(cats==0, "wolfs ~ snakes", 
        "wolfs ~ snakes + trees"))

# build model and find predicted values for each value of cats
dat <- dat %>% group_by(cats) %>%
    do({
        mod <- lm(as.formula(.$formula[1]), data = .)
        pred <- predict(mod)
        data.frame(., pred)
    })

> dat
Source: local data frame [8 x 7]
Groups: cats [2]
   cats birds wolfs snakes trees                formula      pred
  (int) (int) (int)  (int) (int)                  (chr)     (dbl)
1     0     3     8      7     2         wolfs ~ snakes 7.5789474
2     0     1     2      3     1         wolfs ~ snakes 2.6315789
3     0     1     2      3     2         wolfs ~ snakes 2.6315789
4     0     6     1      1     1         wolfs ~ snakes 0.1578947
5     1     3     8      7     3 wolfs ~ snakes + trees 7.6800000
6     1     1     2      3     2 wolfs ~ snakes + trees 2.9600000
7     1     6     1      1     3 wolfs ~ snakes + trees 0.8400000
8     1     6     1      1     1 wolfs ~ snakes + trees 0.5200000
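
The split/lapply/unsplit attempt from the question can also be made to work in base R. The sketch below is one possible version and is not part of the original answer. The question's attempt fails mainly because ifelse() is vectorised and returns a value shaped like its test, so it cannot return a single fitted model; a plain if/else on the subset's own cats value can. The name form is illustrative; full_prediction is the column name used in the question.

```r
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

# One data frame per level of cats
dat.l <- split(dat, dat$cats)

dat.l <- lapply(dat.l, function(x) {
  # Every row of x has the same cats value, so inspect the first one
  # and choose a single formula with if/else (not the vectorised ifelse)
  form <- if (x$cats[1] == 0) wolfs ~ snakes else wolfs ~ snakes + trees
  mod <- lm(form, data = x)
  # newdata= is the predict.lm argument that selects rows
  x$full_prediction <- predict(mod, newdata = x)
  x
})

# Reassemble the pieces in the original row order
result <- unsplit(dat.l, dat$cats)
result
```

The values in full_prediction should match the pred column above, except that unsplit() keeps the original row order rather than grouping the rows by cats.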


Source: https://stackoverflow.com/questions/38101558/merge-two-regression-prediction-models-with-subsets-of-a-data-frame-back-into
