I have read multiple articles regarding floating-point comparison, but I failed to understand them and get the required knowledge from those articles. So, here I am posting this question.
The output is `same`, although both `float` variables hold different values.
"float variables hold different values." is unfounded.
same
was printed because values a,b
are the same even if the initialization constants differ.
A typical `float` is 32 bits and can represent about 2^32 different values such as 1.0, 1024.0, 0.5, 0.125. These values are all of the form ±*some_integer* × 2^*some_integer*. 1.012345679 and 1.012345678 are not in that `float` set. @Rudy Velthuis.
1.01234567165374755859375   // `float` member
1.012345678
1.012345679
1.012345790863037109375     // `float` member
Similar applies for `double`, yet with more precision - commonly 64 bits. 1.012345679 and 1.012345678 are not in that `double` set either:
1.01234567799999997106397131574340164661407470703125    // `double` member
1.012345678
1.0123456780000001931085762407747097313404083251953125  // `double` member
...
1.0123456789999998317597373898024670779705047607421875  // `double` member
1.012345679
1.01234567900000005380434231483377516269683837890625    // `double` member
It can be thought of as 2 steps of rounding. Code `1.012345679` is rounded to the nearest `double` 1.01234567900000005380434231483377516269683837890625. Then the assignment rounds that `double` to the nearest `float` 1.01234567165374755859375.
float a = 1.012345679;
// 'a' has the value of 1.01234567165374755859375
Likewise for `b`. Code `1.012345678` is rounded to the nearest `double` 1.01234567799999997106397131574340164661407470703125. Then the assignment rounds that `double` to the nearest `float` 1.01234567165374755859375.
float b = 1.012345678;
// 'b' has the value of 1.01234567165374755859375
`a` and `b` have the same value.
There is no general solution for comparing floating-point numbers that contain errors from previous operations. The code that must be used is application-specific. So, to get a proper answer, you must describe your situation more specifically.
The underlying problem is that performing a correct computation using incorrect data is in general impossible. If you want to compute some function of two exact mathematical values x and y, but the only data you have is some inexactly computed values x′ and y′, it is generally impossible to compute the exactly correct result. For example, suppose you want to know what the sum, x+y, is, but you only know x′ is 3 and y′ is 4, and you do not know what the true, exact x and y are. Then you cannot compute x+y.
If you know that x′ and y′ are approximately x and y, then you can compute an approximation of x+y by adding x′ and y′. This works when the function being computed (`+` in this example) has a reasonable derivative: slightly changing the inputs of a function with a reasonable derivative slightly changes its outputs. This fails when the function you want to compute has a discontinuity or a large derivative. For example, if you want to compute the square root of x (in the real domain) using an approximation x′, but x′ might be negative due to previous rounding errors, then computing `sqrt(x′)` may produce an exception. Similarly, comparing for inequality or order is a discontinuous function: a slight change in inputs can change the answer completely (from false to true or vice versa).
The common bad advice is to compare with a “tolerance”. This method trades false negatives (incorrect rejections of numbers that would satisfy the comparison if the exact mathematical values were compared) for false positives (incorrect acceptance of numbers that would not satisfy the comparison).
Whether or not an application can tolerate false acceptance depends on the application. Therefore, there is no general solution.
The level of tolerance to set, and even the nature by which it is calculated, depend on the data, the errors, and the previous calculations. So, even when it is acceptable to compare with a tolerance, the amount of tolerance to use and how to calculate it depend on the application. There is no general solution.
It's because `float` has only about 7 significant decimal digits of precision. If you want better precision, you need to use `double` or `long double`.