How to actually avoid floating point errors when you need to use float?

攒了一身酷 2021-01-13 18:11

I am trying to affect the translation of a 3D model using some UI buttons to shift the position by 0.1 or -0.1.

My model position is a three dimensional float so sim

4 answers
  • 2021-01-13 18:48

    The Kahan summation and pairwise summation algorithms help to reduce floating point errors. Here's some Java code for the Kahan algorithm.
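As a rough illustration of the Kahan approach mentioned above, here is a minimal sketch of a running accumulator. The class name `KahanAccumulator` and its methods are made up for this example and are not taken from the linked code. Compensation reduces the accumulated rounding error of many small additions, but it does not make 0.1 itself exactly representable.

```java
// Hypothetical helper: keeps a running float sum with Kahan compensation.
final class KahanAccumulator {
    private float sum = 0f;          // running total
    private float compensation = 0f; // low-order bits lost by earlier additions

    void add(float value) {
        float y = value - compensation; // re-apply the error from the last step
        float t = sum + y;              // low-order bits of y may be lost here
        compensation = (t - sum) - y;   // recover what was just lost
        sum = t;
    }

    float value() {
        return sum;
    }

    public static void main(String[] args) {
        KahanAccumulator acc = new KahanAccumulator();
        float naive = 0f;
        for (int i = 0; i < 1_000_000; i++) {
            acc.add(0.1f);
            naive += 0.1f;
        }
        System.out.println("naive: " + naive);
        System.out.println("kahan: " + acc.value());
    }
}
```

For a position that is nudged by +0.1 or -0.1 per button press, one such accumulator per axis would be needed.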

  • 2021-01-13 18:49

    I would use a Rational class. There are many out there - this one looks like it should work.

    There are two significant costs: one when the Rational is rendered into a float, and one when the numerator and denominator are reduced by their gcd. The one I posted keeps the numerator and denominator in fully reduced state at all times, which should be quite efficient if you are always adding or subtracting 1/10.

    An alternative implementation holds the values normalised (i.e. with consistent sign) but unreduced.

    You should choose your implementation to best fit your usage.
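To make the trade-off concrete, here is a minimal sketch of the "always reduced" variant. This `Rational` class is hypothetical and written for illustration only; it is not either of the linked implementations, and it ignores `long` overflow.

```java
// Hypothetical minimal rational number: always normalised and fully reduced.
final class Rational {
    final long num;
    final long den; // always > 0

    Rational(long num, long den) {
        if (den == 0) throw new ArithmeticException("zero denominator");
        if (den < 0) { num = -num; den = -den; } // keep the sign in the numerator
        long g = gcd(Math.abs(num), den);        // g > 0 because den > 0
        this.num = num / g;
        this.den = den / g;
    }

    private static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }

    Rational plus(Rational o) {
        return new Rational(num * o.den + o.num * den, den * o.den);
    }

    Rational minus(Rational o) {
        return new Rational(num * o.den - o.num * den, den * o.den);
    }

    // The only place where rounding happens.
    float toFloat() {
        return (float) ((double) num / den);
    }

    public static void main(String[] args) {
        Rational step = new Rational(1, 10);
        Rational x = new Rational(0, 1);
        for (int i = 0; i < 3; i++) x = x.plus(step);
        System.out.println(x.num + "/" + x.den + " = " + x.toFloat());
    }
}
```

The position stays exact as a fraction, and `toFloat()` is called only when the value is handed to the renderer.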

  • 2021-01-13 18:49

    A simple solution is to use fixed precision, i.e. an integer that is 10x or 100x the value you want.

    float f = 10;
    f += 0.1f;
    

    becomes

    int i = 100;
    i += 1;  // use as many times as you like
    // use i / 10.0 wherever the real value is required.
    

    I wouldn't use float in any case, as you get more rounding errors than with double for next to no benefit (unless you have millions of float values). double gives you about 8 more digits of precision, and with sensible rounding you won't see those errors.
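A small sketch of the fixed-precision idea applied to a single coordinate, assuming a step of one tenth. The class `FixedPointCoordinate` and its method names are invented for this example.

```java
// Hypothetical wrapper: stores a coordinate as a whole number of tenths.
final class FixedPointCoordinate {
    private int tenths = 0; // exact integer count of 0.1 steps

    void stepUp()   { tenths += 1; }
    void stepDown() { tenths -= 1; }

    // Convert only when the real value is needed; one rounding, no accumulation.
    double value() {
        return tenths / 10.0;
    }

    public static void main(String[] args) {
        FixedPointCoordinate x = new FixedPointCoordinate();
        float drifting = 0f;
        for (int i = 0; i < 1000; i++) { x.stepUp(); drifting += 0.1f; }
        for (int i = 0; i < 1000; i++) { x.stepDown(); drifting -= 0.1f; }
        System.out.println("fixed: " + x.value());
        System.out.println("float: " + drifting);
    }
}
```

Because the stored state is an integer, stepping up and then down the same number of times always returns to exactly the starting value.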

  • 2021-01-13 18:49

    If you stick with floats: the easiest way to avoid the error is to use a step that is exactly representable as a float but close to the desired value, namely

    round(2^n * value) * 1/2^n.

    n is the number of bits, value the number to use (in your case 0.1)

    In your case with increasing precision:

    n = 4 => 0.125
    n = 8 (byte) => 0.1015625
    n = 16 (short)=> 0.100006103516....

    The long digit strings are artefacts of the binary-to-decimal conversion; the actual number has far fewer bits.

    As these floats are exact, addition and subtraction will not introduce offset errors and will always be predictable, as long as the result needs no more bits than the float's significand holds (24 bits for a 32-bit float).
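The formula above can be sketched as follows. The helper name `dyadicStep` is an assumption made for this example; it computes round(2^n * value) / 2^n and the demo checks that repeated addition and subtraction of the resulting step return exactly to zero.

```java
// Hypothetical helper for picking an exactly representable step near a target.
final class DyadicStep {
    // Returns round(2^n * value) / 2^n as a float (exact for small n).
    static float dyadicStep(double value, int n) {
        double scale = Math.scalb(1.0, n); // 2^n
        return (float) (Math.round(value * scale) / scale);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 16}) {
            System.out.println("n = " + n + " => " + dyadicStep(0.1, n));
        }

        float step = dyadicStep(0.1, 16);
        float x = 0f;
        for (int i = 0; i < 1000; i++) x += step;
        for (int i = 0; i < 1000; i++) x -= step;
        System.out.println("back at zero: " + (x == 0f));
    }
}
```

The cost is that each press moves the model by slightly more or less than 0.1, so this fits cases where an exact round trip matters more than the exact step size.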

    If you fear that your display will be compromised by using this solution (because the step values look odd), use and store only integers (incrementing or decrementing by 1). The final value which is internally set is

    x = value * step.

    As the step increases or decreases by an amount of 1, precision will be retained.
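A minimal sketch of this integer-step variant, assuming a fixed step size of 0.1. The class `SteppedAxis` is hypothetical; the point is that the float is recomputed from the integer count each time instead of being accumulated.

```java
// Hypothetical axis whose position is derived from an integer step count.
final class SteppedAxis {
    private static final float VALUE = 0.1f; // size of one step
    private int step = 0;                    // exact number of steps taken

    void increase() { step += 1; }
    void decrease() { step -= 1; }

    // x = value * step: a single multiplication, so at most one rounding error.
    float position() {
        return VALUE * step;
    }

    public static void main(String[] args) {
        SteppedAxis axis = new SteppedAxis();
        for (int i = 0; i < 10; i++) axis.increase();
        System.out.println(axis.position());
        for (int i = 0; i < 10; i++) axis.decrease();
        System.out.println(axis.position());
    }
}
```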
