Is it possible to get 0 by subtracting two unequal floating point numbers?

余生分开走 2021-01-30 07:50

Is it possible to get division by 0 (or infinity) in the following example?

public double calculation(double a, double          


        
12 Answers
  • 2021-01-30 08:27

    I can think of a case where you might be able to cause this to happen. Here's an analogous sample in base 10 - really, this would happen in base 2, of course.

    Floating point numbers are stored more or less in scientific notation - that is, instead of seeing 35.2, the number being stored would be more like 3.52e1.

    Imagine for the sake of convenience that we have a floating point unit that operates in base 10 and has 3 digits of precision. What happens when you subtract 99.9 from 100?

    1.00e2-9.99e1

    Shift to give each value the same exponent

    1.00e2-0.999e2

    Round to 3 digits

    1.00e2-1.00e2

    Uh oh!

    Whether this can happen ultimately depends on the FPU design. Since the range of exponents for a double is very large, the hardware has to round internally at some point, but in the case above, just 1 extra digit internally will prevent any problem.
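
    The base-10 walk-through above can be imitated with a toy model. This is only a sketch: the class name `ToyDecimalFpu` and the `guardDigits` parameter are made up for illustration, the model assumes the larger operand's last kept digit is the units digit (as with 1.00e2), and it is not a description of how any real FPU is built.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Toy model of the 3-digit base-10 subtraction (illustrative only).
final class ToyDecimalFpu {
    // Aligns b to a's exponent, keeping `guardDigits` extra digits beyond
    // a's last digit, then subtracts. Assumes a's last kept digit is the
    // units digit, e.g. 100 = 1.00e2.
    static BigDecimal subtract(BigDecimal a, BigDecimal b, int guardDigits) {
        BigDecimal aligned = b.setScale(guardDigits, RoundingMode.HALF_UP);
        return a.subtract(aligned);
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("100");   // 1.00e2
        BigDecimal b = new BigDecimal("99.9");  // 9.99e1
        System.out.println(subtract(a, b, 0));  // 0   (b was rounded to 100)
        System.out.println(subtract(a, b, 1));  // 0.1 (one guard digit keeps 99.9)
    }
}
```

    With no guard digit the two unequal inputs subtract to zero; with a single guard digit the difference survives, which is the point made in the last paragraph.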

  • 2021-01-30 08:30

    As a workaround, what about the following?

    public double calculation(double a, double b) {
         double c = a - b;
         if (c == 0)
         {
             return 0;
         }
         else
         {
             return 2 / c;
         }
    }
    

    That way you don't depend on IEEE support in any language.

  • 2021-01-30 08:30

    Based on @malarres response and @Taemyr comment, here is my little contribution:

    public double calculation(double a, double b)
    {
         double c = 2 / (a - b);
    
         // Should not have a big cost.
         if (Double.isNaN(c) || Double.isInfinite(c))
         {
             return 0; // A 'whatever' value.
         }
         else
         {
             return c;
         }
    }
    

    My point is that the easiest way to know whether the result of the division is NaN or infinite is actually to perform the division.
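
    A small runnable sketch of the same idea, assuming Java: double division by zero does not throw, it produces Infinity (or NaN), so the quotient can be inspected afterwards. The class name `DivisionCheck` and the `fallback` parameter are hypothetical, chosen only for this example.

```java
// Sketch: perform the division first, then test the result.
final class DivisionCheck {
    // Returns 2 / (a - b), or `fallback` when the quotient is NaN or infinite.
    static double safeQuotient(double a, double b, double fallback) {
        double c = 2 / (a - b);
        return (Double.isNaN(c) || Double.isInfinite(c)) ? fallback : c;
    }

    public static void main(String[] args) {
        System.out.println(2 / (1.5 - 1.5));                        // Infinity
        System.out.println(safeQuotient(1.5, 1.5, 0));              // 0.0
        System.out.println(safeQuotient(3.0, 1.0, 0));              // 1.0
        // a != b here, yet 2 / (a - b) still overflows to Infinity:
        System.out.println(safeQuotient(Double.MIN_VALUE, 0.0, 0)); // 0.0
    }
}
```

    The last call shows why testing the quotient catches more than testing `a - b == 0`: a tiny non-zero difference can still overflow the division.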

  • 2021-01-30 08:31

    You shouldn't ever compare floats or doubles for equality, because you can't really guarantee that the number you assign to the float or double is exact.

    To compare floats for equality sanely, you need to check if the value is "close enough" to the same value:

    if ((first >= second - error) && (first <= second + error))
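
    As a self-contained sketch, the same interval check can be written with `Math.abs`. The name `nearlyEqual` is made up here, and the tolerance `1e-9` is an arbitrary value for the demo, not a recommendation.

```java
// Sketch of a "close enough" comparison with a caller-chosen tolerance.
final class ApproxEquals {
    // True when first lies within `error` of second.
    static boolean nearlyEqual(double first, double second, double error) {
        return Math.abs(first - second) <= error;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                   // false
        System.out.println(nearlyEqual(sum, 0.3, 1e-9));  // true
    }
}
```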
    
  • 2021-01-30 08:33

    In olden times before IEEE 754, it was quite possible that a != b didn't imply a-b != 0 and vice versa. That was one of the reasons to create IEEE 754 in the first place.

    With IEEE 754 it is almost guaranteed. C or C++ compilers are allowed to do an operation with higher precision than needed. So if a and b are not variables but expressions, then (a + b) != c doesn't imply (a + b) - c != 0, because a + b could be calculated once with higher precision, and once without higher precision.

    Many FPUs can be switched to a mode where they don't return denormalized numbers but replace them with 0. In that mode, if a and b are tiny normalised numbers where the difference is smaller than the smallest normalised number but greater than 0, a != b also doesn't guarantee a - b != 0.
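
    For contrast, here is a sketch of the tiny-numbers case in Java, whose language specification requires denormalized numbers and gradual underflow, so there is no flush-to-zero mode to switch on. The class and method names are invented for the example.

```java
// Sketch: with gradual underflow, the gap between two adjacent tiny
// normalized doubles is a denormalized number, not zero.
final class GradualUnderflow {
    // Difference between the smallest normalized double and its upper neighbour.
    static double smallestGap() {
        double a = Double.MIN_NORMAL;  // smallest positive normalized double
        double b = Math.nextUp(a);     // next representable double above a
        return b - a;
    }

    public static void main(String[] args) {
        double gap = smallestGap();
        System.out.println(gap == Double.MIN_VALUE);  // true
        System.out.println(gap != 0.0);               // true
    }
}
```

    In a flush-to-zero mode of the kind described above, that same difference would be replaced by 0 even though a != b.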

    "Never compare floating-point numbers" is cargo cult programming. Among the people who have the mantra "you need an epsilon", most have no idea how to choose that epsilon properly.

  • 2021-01-30 08:36

    The core problem is that the computer representation of a double (or float; a real number in mathematical terms) is inexact when the value has "too many" decimals, for instance when you deal with a value that can't be written out as a finite numeral (pi, or the result of 1/3).

    So a==b can't be relied on for arbitrary double values of a and b. How do you deal with a==b when a=0.333 and b=1/3? Depending on your OS, FPU, the numbers, the language, and how many 3s follow the 0, you will get true or false.

    Anyway, if you do floating-point calculations on a computer, you have to deal with accuracy, so instead of a==b you have to write absolute_value(a-b)<epsilon, where epsilon depends on what you are modeling at that point in your algorithm. You can't use a single epsilon value for all of your double comparisons.
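
    To make the dependence on epsilon concrete, here is a sketch using the 0.333 versus 1/3 example from this answer. The helper name `within` and both epsilon values are assumptions picked for the demonstration.

```java
// Sketch: the same pair of values is "equal" or not depending on epsilon.
final class ToleranceChoice {
    // True when a and b differ by less than epsilon.
    static boolean within(double a, double b, double epsilon) {
        return Math.abs(a - b) < epsilon;
    }

    public static void main(String[] args) {
        double a = 0.333;
        double b = 1.0 / 3.0;
        System.out.println(a == b);              // false
        System.out.println(within(a, b, 1e-3));  // true:  the gap is about 3.3e-4
        System.out.println(within(a, b, 1e-6));  // false: same values, tighter epsilon
    }
}
```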

    In brief, when you type a==b, you have a mathematical expression that can't be translated faithfully to a computer (for arbitrary floating point numbers).

    PS: Hmm, everything I've said here is more or less covered already in other answers and comments.
