Grouping very small numbers (e.g. 1e-28) and 0.0 in data.table v1.8.10 vs v1.9.2

误落风尘 2021-01-13 15:08

I noticed that frequency tables created by data.table in R do not seem to distinguish between very small numbers and zero. Can I change this behavior, or is this a bug?

1 answer
  • 2021-01-13 15:41

    It is worth reading R FAQ 7.31 and thinking about the accuracy of floating point representations.
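As a quick illustration of the point FAQ 7.31 makes, here is a small sketch in Python rather than R, since the behaviour is a property of IEEE 754 doubles and not of either language. The variable names are made up for the example.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation,
# so the computed sum is slightly different from the literal 0.3.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)           # exact comparison fails
close_enough = math.isclose(total, 0.3)  # comparison within a relative tolerance

# A very small number is still a distinct double; it is not stored as zero.
tiny = 1e-28
tiny_is_zero = (tiny == 0.0)

print(exactly_equal, close_enough, tiny_is_zero)
```

So whether 1e-28 and 0.0 end up in the same group depends on how the grouping code compares doubles, not on how the values are stored.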

    I can't reproduce this in the current CRAN version (1.9.2), using:

    R version 3.0.3 (2014-03-06)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    

    My guess is that the change in behaviour is related to this NEWS item:

    o Numeric data is still joined and grouped within tolerance as before but instead of tolerance being sqrt(.Machine$double.eps) == 1.490116e-08 (the same as base::all.equal's default) the significand is now rounded to the last 2 bytes, apx 11 s.f. This is more appropriate for large (1.23e20) and small (1.23e-20) numerics and is faster via a simple bit twiddle. A few functions provided a 'tolerance' argument but this wasn't being passed through so has been removed. We aim to add a global option (e.g. 2, 1 or 0 byte rounding) in a future release.
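To make the difference between the two approaches in that NEWS item concrete, here is a rough sketch in Python (chosen over R so the bit manipulation is explicit). It is not data.table's actual code. `old_style_equal` assumes the old tolerance acts as a plain absolute-difference check, which is a simplification, and `rounded_key` is a hypothetical helper that rounds away the lowest 2 bytes of the IEEE 754 bit pattern. All function names are invented for this example.

```python
import math
import struct
import sys

# Same value as sqrt(.Machine$double.eps) in R, about 1.490116e-08.
TOLERANCE = math.sqrt(sys.float_info.epsilon)

def old_style_equal(a: float, b: float) -> bool:
    # Assumption: the old behaviour is modelled as an absolute-difference test.
    return abs(a - b) < TOLERANCE

def rounded_key(x: float) -> int:
    # Reinterpret the double as a 64-bit unsigned integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Round to nearest at bit 16, then drop the lowest 2 bytes of the significand.
    return (bits + 0x8000) >> 16

def new_style_equal(a: float, b: float) -> bool:
    # Two values group together when their rounded bit patterns match.
    return rounded_key(a) == rounded_key(b)

print(old_style_equal(1e-28, 0.0))  # tolerance check treats them as equal
print(new_style_equal(1e-28, 0.0))  # significand rounding keeps them apart
```

Under the absolute tolerance, anything smaller than about 1.5e-08 is indistinguishable from zero. Rounding the significand instead leaves the exponent intact, so 1e-28 keeps its own key.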


    Update from Matt

    Yes this was a deliberate change in v1.9.2 and data.table now distinguishes 0.0000000000000000000000000001 from 0 (as user3340145 rightly thought it should) due to the improved rounding method highlighted above from NEWS.

    I've also added the for loop test from Rick's answer to the test suite.

    Btw, #5369 is now implemented in v1.9.3 (although neither of these are needed for this question) :

    o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs.

    o New function setNumericRounding() may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', #5369. See example in ?setNumericRounding and NEWS item from v1.9.2. getNumericRounding() returns the current setting.

    Notice that rounding is now (as from v1.9.2) about the accuracy of the significand; i.e. the number of significant figures. 0.0000000000000000000000000001 == 1.0e-28 is accurate to just 1 s.f., so the new rounding method doesn't group this together with 0.0.
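The effect of 2, 1 or 0 byte rounding on significant figures can be sketched as follows. The snippet is in Python rather than R so the bits are visible, and it is an illustration only, not data.table's implementation. `grouping_key` and its `nbytes` parameter are hypothetical stand-ins for what setNumericRounding() controls.

```python
import struct

def grouping_key(x: float, nbytes: int = 2) -> int:
    # Reinterpret the double as a 64-bit unsigned integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if nbytes == 0:
        return bits  # no rounding: every distinct double gets its own key
    shift = 8 * nbytes
    # Round to nearest, then drop the lowest `nbytes` bytes of the significand.
    return (bits + (1 << (shift - 1))) >> shift

base = 1.0
near = 1.0 + 2**-50  # differs from base by about 8.9e-16
mid = 1.0 + 2**-40   # differs from base by about 9.1e-13
far = 1.0 + 2**-30   # differs from base by about 9.3e-10

for nbytes in (2, 1, 0):
    same = [grouping_key(v, nbytes) == grouping_key(base, nbytes)
            for v in (near, mid, far)]
    print(nbytes, same)

# The exponent is never rounded away, so 1e-28 and 0.0 differ at every setting.
print(grouping_key(1e-28, 2) == grouping_key(0.0, 2))
```

In this sketch, 2 byte rounding merges `near` and `mid` with `base` but not `far`, 1 byte rounding merges only `near`, and 0 byte rounding merges nothing.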

    In short, the answer to the question is: upgrade from v1.8.10 to v1.9.2 or later.
