data.table 1.8.x mean() function auto removing NA?

问题

Today I found out a bug in my program due to data.table auto remove NA for mean

for example:

> a<-data.table(a=c(NA,NA,FALSE,FALSE), b=c(1,1,2,2))
> a

> a[,list(mean(a), sum(a)),by=b]
   b V1 V2
1: 1  0 NA // Why V1 = 0 here? I had expected NA
2: 2  0  0


> mean(c(NA,NA,FALSE,FALSE))
[1] NA
> mean(c(NA,NA))
[1] NA
> mean(c(FALSE,FALSE))
[1] 0

Is this the intended behaviour?

回答1:

This isn't intended. Looks like a problem with optimization ...

> a[,list(mean(a), sum(a)),by=b]
   b V1 V2
1: 1  0 NA
2: 2  0  0
> options(datatable.optimize=FALSE)
> a[,list(mean(a), sum(a)),by=b]
   b V1 V2
1: 1 NA NA
2: 2  0  0
>

Investigated and fixed in v1.8.9, soon to be on CRAN. From NEWS :

mean() in j has been optimized since v1.8.2 but wasn't respecting na.rm=TRUE (the default). Many thanks to Colin Fang for reporting. Test added.

The new feature in v1.8.2 was :

mean() is now automatically optimized, #1231. This can speed up grouping by 20 times when there are a large number of groups. See wiki point 3, which is no longer needed to know. Turn off optimization by setting options(datatable.optimize=0).

来源：https://stackoverflow.com/questions/18571774/data-table-1-8-x-mean-function-auto-removing-na

标签

data.table

mean

na.rm

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!