Grouping very small numbers (e.g. 1e-28) and 0.0 in data.table v1.8.10 vs v1.9.2


I noticed that frequency tables created by data.table in R do not seem to distinguish between very small numbers and zero. Can I change this behavior, or is this a bug?


1 answer

自闭症患者

2021-01-13 15:41
It is worth reading R FAQ 7.31 and thinking about the accuracy of floating point representations.
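As a quick illustration of the FAQ 7.31 point, here is a minimal base R sketch (no data.table involved). It only shows that exact comparison and tolerance-based comparison of doubles can disagree, which is the background to the grouping question.

```r
# Exact comparison of doubles: 0.1 + 0.2 is not stored as exactly 0.3
exact_sum <- (0.1 + 0.2) == 0.3                   # FALSE

# Comparison within a tolerance, as base::all.equal does by default
approx_sum <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# The values from the question: 1e-28 is not exactly zero ...
exact_zero <- 1e-28 == 0                          # FALSE

# ... but it is "equal" to zero within all.equal's default tolerance
approx_zero <- isTRUE(all.equal(1e-28, 0))        # TRUE

print(c(exact_sum, approx_sum, exact_zero, approx_zero))
```

Whether 1e-28 and 0 end up in the same group therefore depends on which kind of comparison the grouping code uses.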

I can't reproduce this in the current CRAN version (1.9.2), using
```
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
```
My guess is that the change in behaviour is related to this NEWS item:

o Numeric data is still joined and grouped within tolerance as before but instead of tolerance being sqrt(.Machine$double.eps) == 1.490116e-08 (the same as base::all.equal's default) the significand is now rounded to the last 2 bytes, apx 11 s.f. This is more appropriate for large (1.23e20) and small (1.23e-20) numerics and is faster via a simple bit twiddle. A few functions provided a 'tolerance' argument but this wasn't being passed through so has been removed. We aim to add a global option (e.g. 2, 1 or 0 byte rounding) in a future release.
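The difference between the two schemes described in the NEWS item can be sketched in base R. This is only an analogy: `signif(x, 11)` stands in for data.table's binary rounding of the significand, which is not what the package does internally, and the helper names `same_group_old` and `same_group_new` are hypothetical and not part of data.table.

```r
# Hypothetical helper: old scheme, absolute tolerance of sqrt(.Machine$double.eps)
same_group_old <- function(a, b) abs(a - b) < sqrt(.Machine$double.eps)

# Hypothetical helper: new scheme, approximated here by rounding to 11 s.f.
# (assumption: decimal signif() as a stand-in for binary significand rounding)
same_group_new <- function(a, b) signif(a, 11) == signif(b, 11)

# Small numbers: the absolute tolerance swallows 1e-28, significand rounding does not
small_old <- same_group_old(1e-28, 0)
small_new <- same_group_new(1e-28, 0)

# Large numbers: values agreeing to about 13 s.f. differ by far more than 1.5e-8
big_a <- 1.23e20
big_b <- 1.23e20 * (1 + 1e-13)
big_old <- same_group_old(big_a, big_b)
big_new <- same_group_new(big_a, big_b)

print(c(small_old = small_old, small_new = small_new,
        big_old = big_old, big_new = big_new))
```

Under the absolute tolerance, 1e-28 is grouped with 0 and the two large values are kept apart; under rounding by significant figures the outcome is reversed in both cases, which is why the NEWS item calls the new method more appropriate for very large and very small numerics.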

Update from Matt

Yes this was a deliberate change in v1.9.2 and data.table now distinguishes 0.0000000000000000000000000001 from 0 (as user3340145 rightly thought it should) due to the improved rounding method highlighted above from NEWS.

I've also added the for loop test from Rick's answer to the test suite.

Btw, #5369 is now implemented in v1.9.3 (although neither of these are needed for this question) :

o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs.

o New function setNumericRounding() may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', #5369. See example in ?setNumericRounding and NEWS item from v1.9.2. getNumericRounding() returns the current setting.

Notice that rounding is now (as from v1.9.2) about the accuracy of the significand; i.e. the number of significant figures. 0.0000000000000000000000000001 == 1.0e-28 is accurate to just 1 s.f., so the new rounding method doesn't group this together with 0.0.
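A minimal sketch of how this could be checked, assuming data.table v1.9.3 or later is installed so that `setNumericRounding()` and `getNumericRounding()` are available. The table `DT` and its values are made up for illustration; they are not the data from the question.

```r
library(data.table)

# Remember the current setting so it can be restored afterwards
old_rounding <- getNumericRounding()

# Use the 2 byte rounding described in the v1.9.2 NEWS item
setNumericRounding(2L)

# Illustrative data: exact zero, a very small number (twice), and 1
DT <- data.table(x = c(0, 1e-28, 1e-28, 1))

# Frequency table: with significand rounding, 1e-28 is a separate group from 0
res <- DT[, .N, by = x]
print(res)

# Restore the previous setting
setNumericRounding(old_rounding)
```

On v1.8.10 the same grouping would be expected to merge 0 and 1e-28 into one group, since they differ by less than the old absolute tolerance.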

In short, the answer to the question is: upgrade from v1.8.10 to v1.9.2 or later.