Use Predict on data.table with Linear Regression

一笑奈何 提交于 2019-12-07 05:19:41

问题


Regrad to this Post, I have created an example to play with linear regression on data.table package as follows:

## rm(list=ls()) # anti-social
library(data.table)
set.seed(1011)
DT = data.table(group=c("b","b","b","a","a","a"),
                v1=rnorm(6),v2=rnorm(6), y=rnorm(6))
setkey(DT, group)
ans <- DT[,as.list(coef(lm(y~v1+v2))), by = group]

return,

   group (Intercept)        v1        v2
1:     a    1.374942 -2.151953 -1.355995
2:     b   -2.292529  3.029726 -9.894993

I am able to obtain the coefficients of the lm function.

My question is: How can we directly use predict to new observations ? If we have the new observations as follows:

new <- data.table(group=c("b","b","b","a","a","a"),v1=rnorm(6),v2=rnorm(6))

I have tried:

setkey(new, group)
DT[,predict(lm(y~v1+v2), new), by = group]

but it returns me strange answers:

    group         V1
 1:     a  -2.525502
 2:     a   3.319445
 3:     a   4.340253
 4:     a   3.512047
 5:     a   2.928245
 6:     a   1.368679
 7:     b  -1.835744
 8:     b  -3.465325
 9:     b  19.984160
10:     b -14.588933
11:     b  11.280766
12:     b  -1.132324

Thank you


回答1:


You are predicting onto the entire new data set each time. If you want to predict only on the new data for each group you need to subset the "newdata" by group.

This is an instance where .BY will be useful. Here are two possibilities

a <- DT[,predict(lm(y ~ v1 + v2), new[.BY]), by = group]

b <- new[,predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata=.SD),by = group]

both of which give identical results

identical(a,b)
# [1] TRUE
a
#   group         V1
#1:     a  -2.525502
#2:     a   3.319445
#3:     a   4.340253
#4:     b -14.588933
#5:     b  11.280766
#6:     b  -1.132324


来源:https://stackoverflow.com/questions/23947245/use-predict-on-data-table-with-linear-regression

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!