How can I increase precision in R when calculating with probabilities close to 0 and 1?

鱼传尺愫 2021-01-12 23:34

I have a problem where a sum of variables representing probabilities in the [0,1] interval evaluates to 0 but should be > 0.

2 Answers
  • 2021-01-12 23:45

    You want to avoid catastrophic cancellation, which occurs when subtracting values of approximately the same size: essentially this amplifies any existing errors.

    One trick is to do the subtraction first, i.e.

    > p1 = 0.8
    > p2 = 0.9999999999999998
    > (1-p1)*(1-p2)
    [1] 4.440892e-17
    

    The remaining inaccuracy is due to the fact that 0.9999999999999998 isn't actually stored as 0.9999999999999998, but as 0.9999999999999997779553950749686919152736663818359375, the closest representable double-precision number.
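
    For contrast, here is a sketch of the direct computation (using the p11/p10/p01 decomposition from the other answer, which I assume mirrors your original code). The chain of subtractions between nearly equal quantities cancels all the way down to exactly 0, while the rearranged form stays close to the true value of 4e-17:

    > p1 = 0.8
    > p2 = 0.9999999999999998
    > p11 = p1 * p2
    > p10 = p1 - p11
    > p01 = p2 - p11
    > 1 - p11 - p10 - p01    # direct form: catastrophic cancellation
    [1] 0
    > (1 - p1) * (1 - p2)    # subtraction done first
    [1] 4.440892e-17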

    Update: to explain a bit further what is going on.

    Since most floating point operations cannot be done exactly, they incur some error: this is the difference between the result you get and the true mathematical result. It is inherently relative, rather than absolute (i.e. significant digits, not digits after the decimal place).
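
    A quick way to see this relative behaviour (a small sketch of my own, not from the question): the gap between adjacent doubles grows with the magnitude of the number, so the same absolute perturbation can vanish at one scale and matter at another:

    > 1 + 1e-17 == 1        # 1e-17 is below half a unit in the last place at 1
    [1] TRUE
    > 1e17 + 1 == 1e17      # 1 is below half a unit in the last place at 1e17
    [1] TRUE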

    When thinking about error, you need to keep track of two things:

    1. the error from the operation
    2. the already accumulated error.

    The first is not too difficult: most operations incur a small amount of relative error that is at most about 1e-16 (half a unit in the last place). There are also a couple of operations that incur no error at all: subtracting two numbers that are within a factor of 2 of each other (Sterbenz's lemma), and multiplying or dividing by powers of 2.
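
    Both guarantees are easy to check (an illustrative sketch; 2^-52 is the exact gap between your p2 and 1):

    > p2 = 0.9999999999999998
    > (1 - p2) == 2^-52     # 1 and p2 are within 2x, so the subtraction is exact
    [1] TRUE
    > (0.1 * 4) / 4 == 0.1  # powers of 2 only shift the exponent
    [1] TRUE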

    The second is trickier, but you basically want to avoid anything that would amplify existing error. The most insidious case is subtracting two numbers that are roughly equal: the absolute error stays the same, but the relative error can increase dramatically. Multiplication and division are inherently relative operations, so they have no amplifying effect here.
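
    To see the amplification (again a sketch with round numbers): the true value of (1 + 1e-15) - 1 is 1e-15, but the rounding error from the addition, tiny in absolute terms, turns into a roughly 10% relative error once the leading digits cancel:

    > (1 + 1e-15) - 1
    [1] 1.110223e-15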

    In your original code, you are doing a lot of such subtractions with quantities that have already accumulated some error. In my modification, I do the subtractions first, so this error is minimised (though in the case of p2, you can see that the result is still drastically influenced by the error in conversion from the decimal string "0.9999999999999998" to the binary floating point number).

  • 2021-01-13 00:03

    R itself only supports 64-bit floating point numbers, but the Rmpfr package provides arbitrary-precision floating point arithmetic:

    library(Rmpfr)
    
    > p1 = mpfr("0.8", 128)
    > p2 = mpfr("0.9999999999999998", 128)
    
    > p11 = p1 * p2
    > p10 = p1 - p11
    > p01 = p2 - p11
    > p00 = 1 - p11 - p10 - p01
    
    > p00
    1 'mpfr' number of precision  128   bits 
    [1] 4.00000000000000000000000461461738779728e-17
    

    Edit: use strings to define the mpfr numbers, so the decimal values are parsed at full precision instead of being rounded to doubles first.
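
    To illustrate why the strings matter (a sketch; the exact digits Rmpfr prints may vary by version):

    > mpfr(0.9999999999999998, 128)    # the literal is rounded to 53 bits before
    >                                  # Rmpfr sees it, so this faithfully stores
    >                                  # 0.9999999999999997779553950749686919...
    > mpfr("0.9999999999999998", 128)  # the string is parsed by Rmpfr itself at
    >                                  # the full 128-bit precision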
