For example, these variables:
result (double)
a (double)
b (float)
c (float)
d (double)
A simple calculation:
result = a * (b + c) * d;
You have parentheses delimiting the float addition, so it would evaluate b + c as float + float, convert that sum to double to keep the greatest precision, and then multiply the double values.
However, in a case like this, where you want control over the conversions rather than guessing, use static_cast<>().
The floats will be upconverted to doubles. Explicitly cast the values.
i.e., if you want a double as your result, you would write:
result = a * double( b + c ) * d;
It is ALWAYS worth being explicit. It gets round misunderstandings like this and it is INSTANTLY obvious to anyone trying to use your code exactly what you mean.
Following order of operations, each sub-expression is converted to the type of its (not sure of the term here, dominant perhaps?) operand. double is dominant over float, so:
(b + c) // this is evaluated as a float, since both b and c are floats
a * (b + c) // this is evaluated as a double, since a is a double
a * (b + c) * d // this is evaluated as a double, since both "a * (b + c)" and d are doubles
In your example, all the float types are type-promoted to double when the right-side formula is evaluated.
As for how they're converted: what I've read regarding floating-point operations is that most contemporary hardware performs FP operations using extended-precision (80-bit) long doubles in special hardware registers (at least that's what I remember about modern Intel x86/x87 processors). As I understand it, float and double are type-promoted IN HARDWARE via special FP instructions (someone correct me if I'm wrong).
If you have:
float f;
double d;
...then an arithmetic expression like f * d will promote both operands to the larger type, which in this case is double.
So, the expression a * (b + c) * d evaluates to a double, and is then stored in result, which is also a double. This type promotion is done in order to avoid accidental precision loss.
For further information, read this article about the usual arithmetic conversions.
You have to differentiate between type conversion and value conversion. The C++ standard (C as well) allows floating-point calculations to be done at extended precision.
"The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby."
As types, b + c is an addition of two floats, so the result is a float. That result is then converted to a double, and the two multiplications are done as doubles with a result of double.
However, an implementation is allowed to do all the calculations, including b + c, using doubles (or higher precision). Indeed, I tried it out using Visual C++ and it did all the calculations using the 80-bit floating-point stack available on x86.