I noticed that frequency tables created by data.table in R do not seem to distinguish between very small numbers and zero. Can I change this behavior, or is this a bug?
It is worth reading R FAQ 7.31 and thinking about the accuracy of floating point representations.
I can't reproduce this in the current CRAN version (1.9.2), using:
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
My guess is that the change in behaviour is related to this NEWS item:
o Numeric data is still joined and grouped within tolerance as before but instead of tolerance being sqrt(.Machine$double.eps) == 1.490116e-08 (the same as base::all.equal's default) the significand is now rounded to the last 2 bytes, apx 11 s.f. This is more appropriate for large (1.23e20) and small (1.23e-20) numerics and is faster via a simple bit twiddle. A few functions provided a 'tolerance' argument but this wasn't being passed through so has been removed. We aim to add a global option (e.g. 2, 1 or 0 byte rounding) in a future release.
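To make the difference between the two approaches concrete, here is a rough base R sketch. It is not data.table's actual implementation: signif(x, 11) is only used as a stand-in for the "apx 11 s.f." bit-level rounding described in the NEWS item, and the variable names are made up for illustration.

```r
# Two values that the question expected to be treated as different
zero <- 0
tiny <- 1e-28

# Old behaviour (sketch): absolute tolerance, same default as base::all.equal
tol <- sqrt(.Machine$double.eps)
old_same <- abs(tiny - zero) < tol

# New behaviour (sketch): compare after rounding to about 11 significant
# figures. signif() is an approximation of the significand rounding,
# not what data.table does internally.
new_same <- signif(tiny, 11) == signif(zero, 11)

print(old_same)  # TRUE:  1e-28 is within 1.49e-08 of 0
print(new_same)  # FALSE: 1e-28 keeps its own significand and exponent
```

With an absolute tolerance, anything smaller than about 1.49e-08 is indistinguishable from 0, whereas rounding the significand leaves the exponent alone, so small magnitudes stay distinct.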
Update from Matt
Yes, this was a deliberate change in v1.9.2, and data.table now distinguishes 0.0000000000000000000000000001 from 0 (as user3340145 rightly thought it should) due to the improved rounding method highlighted above from NEWS.
I've also added the for loop test from Rick's answer to the test suite.
Btw, #5369 is now implemented in v1.9.3 (although neither of these is needed for this question):
o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs.
o New function setNumericRounding() may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', #5369. See example in ?setNumericRounding and NEWS item from v1.9.2. getNumericRounding() returns the current setting.
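As a minimal sketch of how these functions can be used together with a frequency table like the one in the question (assuming data.table v1.9.3 or later is installed; DT, freq and old are just illustrative names):

```r
library(data.table)

# Remember the current setting so it can be restored afterwards
old <- getNumericRounding()

# Use the 2 byte rounding described in the v1.9.2 NEWS item
setNumericRounding(2L)

# A column mixing exact zeros with a very small number
DT <- data.table(x = c(0, 1e-28, 0))

# Frequency table: one row per distinct value of x
freq <- DT[, .N, by = x]
print(freq)

# Restore the previous setting
setNumericRounding(old)
```

Here 0 and 1e-28 should end up in separate groups. Passing 1L or 0L to setNumericRounding() reduces the rounding further, as the NEWS item says.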
Notice that rounding is now (as from v1.9.2) about the accuracy of the significand; i.e. the number of significant figures. 0.0000000000000000000000000001 == 1.0e-28 is accurate to just 1 s.f., so the new rounding method doesn't group this together with 0.0.
In short, the answer to the question is: upgrade from v1.8.10 to v1.9.2 or greater.