Unexpected loss of precision when dividing doubles

慢半拍i 2021-02-09 14:33

I have a function getSlope which takes 4 doubles as parameters and returns another double, calculated from these parameters in the following way:

double QSw         


        
8条回答
  •  滥情空心
    2021-02-09 15:08

    The results you are getting are consistent with 32-bit arithmetic. Without knowing more about your environment, it's not possible to advise what to do.

    Assuming the code shown is what's running, i.e. you're not converting anything to strings or floats, there isn't a fix within C++. The problem lies outside the code you've shown and depends on your environment.

    As Patrick McDonald and Treb both brought up the accuracy of your inputs and the error in a-c, I thought I'd take a look at that. One technique for examining rounding errors is interval arithmetic, which makes explicit the lower and upper bounds of the value a number represents (in floating-point numbers these bounds are implicit, fixed by the precision of the representation). By treating each value as a pair of lower and upper bounds, and extending those bounds by the error in the representation (approximately x * 2^-53 for a double value x), you get a result that gives lower and upper bounds on the value, taking worst-case precision errors into account.

    For example, if you have a value in the range [1.0, 2.0] and subtract from it a value in the range [0.0, 1.0], then the result must lie in the range [below(0.0), above(2.0)], since the minimum result is 1.0-1.0 and the maximum is 2.0-0.0. Here below and above are analogous to floor and ceiling, but they go to the next representable floating-point value rather than to an integer.

    Using intervals which represent worst-case double rounding:

    getSlope(
     a = [2.7115599999999995262:2.7115600000000004144], 
     b = [-1.6416100000000002357:-1.6416099999999997916], 
     c = [2.7041299999999997006:2.7041300000000005888], 
     d = [-1.7221900000000003317:-1.7221899999999998876])
    (d-b) = [-0.080580000000000526206:-0.080579999999999665783]
    (c-a) = [-0.0074300000000007129439:-0.0074299999999989383218]
    
    to double precision [10.845222072677243474:10.845222072679954195]
    

    So although c-a is small compared to c or a, it is still large compared to double rounding error. Even with the worst imaginable double-precision rounding, you could trust the value to be precise to 12 figures: 10.8452220727. You've lost a few figures of double precision, but you're still working to more than your inputs' significance.

    But if the inputs were only accurate to the number of significant figures given, then rather than being the double value 2.71156 +/- eps, the input range would be [2.711555,2.711565], so you get the result:

    getSlope(
     a = [2.711555:2.711565], 
     b = [-1.641615:-1.641605], 
     c = [2.704125:2.704135], 
     d = [-1.722195:-1.722185])
    (d-b) = [-0.08059:-0.08057]
    (c-a) = [-0.00744:-0.00742]
    
    to specified accuracy [10.82930108:10.86118598]
    

    which is a much wider range.

    But you would have to go out of your way to track the accuracy through the calculations, and the rounding errors inherent in floating point are not significant in this example: the result is precise to 12 figures even with worst-case double-precision rounding.

    On the other hand, if your inputs are only known to 6 figures, it doesn't actually matter whether you get 10.8557 or 10.8452. Both are within [10.82930108:10.86118598].
