Question
With regard to this post, I have created an example to experiment with linear regression using the data.table package, as follows:
## rm(list=ls()) # anti-social
library(data.table)
set.seed(1011)
DT = data.table(group=c("b","b","b","a","a","a"),
v1=rnorm(6),v2=rnorm(6), y=rnorm(6))
setkey(DT, group)
ans <- DT[,as.list(coef(lm(y~v1+v2))), by = group]
which returns:
   group (Intercept)        v1        v2
1:     a    1.374942 -2.151953 -1.355995
2:     b   -2.292529  3.029726 -9.894993
So I am able to obtain the coefficients of the lm function.
My question is:
How can we directly apply predict to new observations? Suppose we have the new observations as follows:
new <- data.table(group=c("b","b","b","a","a","a"),v1=rnorm(6),v2=rnorm(6))
I have tried:
setkey(new, group)
DT[,predict(lm(y~v1+v2), new), by = group]
but it returns unexpected results (twelve rows, six per group, instead of three per group):
    group         V1
 1:     a  -2.525502
 2:     a   3.319445
 3:     a   4.340253
 4:     a   3.512047
 5:     a   2.928245
 6:     a   1.368679
 7:     b  -1.835744
 8:     b  -3.465325
 9:     b  19.984160
10:     b -14.588933
11:     b  11.280766
12:     b  -1.132324
Thank you
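Answer 1:
You are predicting onto the entire new data set for every group, which is why each group returns six predictions instead of three. If you want to predict only on the new data belonging to each group, you need to subset the "newdata" by group.
This is an instance where .BY will be useful. Here are two possibilities (both rely on DT and new being keyed by group, so that indexing with .BY joins on that key):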
a <- DT[,predict(lm(y ~ v1 + v2), new[.BY]), by = group]
b <- new[,predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata=.SD),by = group]
Both give identical results:
identical(a,b)
# [1] TRUE
a
# group V1
#1: a -2.525502
#2: a 3.319445
#3: a 4.340253
#4: b -14.588933
#5: b 11.280766
#6: b -1.132324
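As a variation on the same idea, here is a minimal sketch that fits each group's model once, stores the fits in a list, and then looks up the matching model while iterating over new. The names fits and out are hypothetical, and the sketch assumes a data.table version that provides split() with a by argument. It does not require either table to be keyed.

```r
library(data.table)

set.seed(1011)
DT  <- data.table(group = c("b","b","b","a","a","a"),
                  v1 = rnorm(6), v2 = rnorm(6), y = rnorm(6))
new <- data.table(group = c("b","b","b","a","a","a"),
                  v1 = rnorm(6), v2 = rnorm(6))

# Fit one model per group; `fits` (hypothetical name) is a list
# whose element names are the group values
fits <- lapply(split(DT, by = "group"),
               function(d) lm(y ~ v1 + v2, data = d))

# For each group of `new`, predict with that group's own model.
# .SD holds the non-grouping columns (v1, v2) of the current group;
# as.character() guards against `group` being a factor.
out <- new[, .(fit = predict(fits[[as.character(.BY$group)]], newdata = .SD)),
           by = group]
out
```

Keeping the fitted models around can be convenient if you want to reuse them for several new data sets, whereas the one-liners above refit the model on every call.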
Source: https://stackoverflow.com/questions/23947245/use-predict-on-data-table-with-linear-regression