Properly testing two floating-point numbers for equality is something that a lot of people, including me, don't fully understand. Today, however, I thought about how some s
When using floating-point numbers, the relational operators have meanings, but their meanings don't necessarily align with how actual numbers behave.
If floating-point values are used to represent actual numbers (their normal purpose), the operators tend to behave as follows:
`x > y` and `x >= y` both imply that the numeric quantity which `x` is supposed to represent is likely greater than `y`, and at worst probably not much less than `y`.
`x < y` and `x <= y` both imply that the numeric quantity which `x` is supposed to represent is likely less than `y`, and is at worst probably not much greater than `y`.
`x == y` implies that the numeric quantities which `x` and `y` represent are indistinguishable from each other.
Note that if `x` is of type `float` and `y` is of type `double`, the above meanings will be achieved if the `double` argument is cast to `float`. In the absence of a specific cast, however, C and C++ (and also many other languages) will convert a `float` operand to `double` before performing a comparison. Such conversion will greatly reduce the likelihood that the operands will be reported "indistinguishable", but will greatly increase the likelihood that the comparison will yield a result contrary to what the intended numbers actually indicate. Consider, for example:
float f = 16777217;
double d = 16777216.5;
If both operands are cast to `float`, the comparison will indicate that the values are indistinguishable (the nearest `float` to either value is 16777216). If they are cast to `double`, the comparison will indicate that `d` is larger even though the value `f` is supposed to represent is slightly bigger. As a more extreme example:
float f = 1E20f;
float f2 = f*f;
double d = 1E150;
double d2 = d*d;
Float `f2` contains the best `float` representation of 1E40, which is positive infinity because 1E40 exceeds the range of `float`. Double `d2` contains the best `double` representation of 1E300. The numerical quantity represented by `d2` is hundreds of orders of magnitude greater than that represented by `f2`, but `(double)f2 > d2`. By contrast, converting both operands to `float` would yield `f2 == (float)d2`, correctly reporting that the values are indistinguishable.
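The two examples above can be checked with a short sketch. The helper names `compare_as_float`, `compare_as_double`, and `print_comparisons` are hypothetical and chosen only for illustration. The sketch assumes an IEEE 754 platform with the default round-to-nearest mode, where narrowing an out-of-range `double` to `float` yields infinity; the C standard itself does not guarantee that.

```c
#include <stdio.h>

/* Hypothetical helper: compare after narrowing the double to float.
   Returns -1, 0, or 1 for less, equal (indistinguishable), or greater.
   Assumes IEEE 754 behaviour when y is outside the range of float. */
static int compare_as_float(float x, double y)
{
    float yf = (float)y;
    return (x > yf) - (x < yf);
}

/* Hypothetical helper: compare after widening the float to double,
   which is what C does implicitly for a mixed comparison such as x < y. */
static int compare_as_double(float x, double y)
{
    double xd = (double)x;
    return (xd > y) - (xd < y);
}

/* Prints the result of both strategies for the two examples in the text. */
void print_comparisons(void)
{
    float f = (float)16777217;   /* same value as `float f = 16777217;` */
    double d = 16777216.5;
    printf("first:  as float %d, as double %d\n",
           compare_as_float(f, d), compare_as_double(f, d));

    float big = 1E20f;
    float f2 = big * big;        /* 1E40 does not fit in a float */
    double e = 1E150;
    double d2 = e * e;           /* about 1E300, still a finite double */
    printf("second: as float %d, as double %d\n",
           compare_as_float(f2, d2), compare_as_double(f2, d2));
}
```

Under those assumptions, calling `print_comparisons()` should report 0 (indistinguishable) for both pairs when compared as `float`, -1 for the first pair as `double`, and 1 for the second pair as `double`.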
PS: I am well aware that IEEE standards require that calculations be performed as though floating-point values represent precise power-of-two fractions, but few people see the code `float f2 = f1 / 10.0;` as meaning "Set f2 to the representable power-of-two fraction which is closest to being one tenth of the one in f1". The purpose of the code is to make f2 be a tenth of f1. Because of imprecision, the code cannot fulfill that purpose perfectly, but in most cases it's more helpful to regard floating-point numbers as representing actual numerical quantities than to regard them as power-of-two fractions.
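As a rough illustration of that imprecision, the sketch below divides by ten and then compares the result against a tenth written as a `float` literal and as a `double` literal. The names `tenth` and `print_tenth` are hypothetical, and the results described assume IEEE 754 single and double precision.

```c
#include <stdio.h>

/* Hypothetical helper mirroring `float f2 = f1 / 10.0;`:
   the division is performed in double, then narrowed to float. */
static float tenth(float f1)
{
    return (float)(f1 / 10.0);
}

/* Shows that the stored value is only the nearest float to one tenth. */
void print_tenth(void)
{
    float f2 = tenth(1.0f);
    printf("stored value: %.17g\n", (double)f2);
    printf("f2 == 0.1f: %d\n", f2 == 0.1f);  /* both sides are float */
    printf("f2 == 0.1 : %d\n", f2 == 0.1);   /* f2 is promoted to double */
}
```

On such a platform the stored value is slightly above one tenth, so the `float` comparison reports equality while the promoted comparison does not, even though both sides are meant to represent the same quantity.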