I\'ve read about the difference between double precision and single precision. However, in most cases, float
and double
seem to be interchangeable,
There are three floating point types:
A simple Venn diagram will explain about: The set of values of the types
I just ran into a error that took me forever to figure out and potentially can give you a good example of float precision.
#include <iostream>
#include <iomanip>
int main(){
for(float t=0;t<1;t+=0.01){
std::cout << std::fixed << std::setprecision(6) << t << std::endl;
}
}
The output is
0.000000
0.010000
0.020000
0.030000
0.040000
0.050000
0.060000
0.070000
0.080000
0.090000
0.100000
0.110000
0.120000
0.130000
0.140000
0.150000
0.160000
0.170000
0.180000
0.190000
0.200000
0.210000
0.220000
0.230000
0.240000
0.250000
0.260000
0.270000
0.280000
0.290000
0.300000
0.310000
0.320000
0.330000
0.340000
0.350000
0.360000
0.370000
0.380000
0.390000
0.400000
0.410000
0.420000
0.430000
0.440000
0.450000
0.460000
0.470000
0.480000
0.490000
0.500000
0.510000
0.520000
0.530000
0.540000
0.550000
0.560000
0.570000
0.580000
0.590000
0.600000
0.610000
0.620000
0.630000
0.640000
0.650000
0.660000
0.670000
0.680000
0.690000
0.700000
0.710000
0.720000
0.730000
0.740000
0.750000
0.760000
0.770000
0.780000
0.790000
0.800000
0.810000
0.820000
0.830000
0.839999
0.849999
0.859999
0.869999
0.879999
0.889999
0.899999
0.909999
0.919999
0.929999
0.939999
0.949999
0.959999
0.969999
0.979999
0.989999
0.999999
As you can see after 0.83, the precision runs down significantly.
However, if I set up t
as double, such an issue won't happen.
It took me five hours to realize this minor error, which ruined my program.
Given a quadratic equation: x2 − 4.0000000 x + 3.9999999 = 0, the exact roots to 10 significant digits are, r1 = 2.000316228 and r2 = 1.999683772.
Using float
and double
, we can write a test program:
#include <stdio.h>
#include <math.h>
void dbl_solve(double a, double b, double c)
{
double d = b*b - 4.0*a*c;
double sd = sqrt(d);
double r1 = (-b + sd) / (2.0*a);
double r2 = (-b - sd) / (2.0*a);
printf("%.5f\t%.5f\n", r1, r2);
}
void flt_solve(float a, float b, float c)
{
float d = b*b - 4.0f*a*c;
float sd = sqrtf(d);
float r1 = (-b + sd) / (2.0f*a);
float r2 = (-b - sd) / (2.0f*a);
printf("%.5f\t%.5f\n", r1, r2);
}
int main(void)
{
float fa = 1.0f;
float fb = -4.0000000f;
float fc = 3.9999999f;
double da = 1.0;
double db = -4.0000000;
double dc = 3.9999999;
flt_solve(fa, fb, fc);
dbl_solve(da, db, dc);
return 0;
}
Running the program gives me:
2.00000 2.00000
2.00032 1.99968
Note that the numbers aren't large, but still you get cancellation effects using float
.
(In fact, the above is not the best way of solving quadratic equations using either single- or double-precision floating-point numbers, but the answer remains unchanged even if one uses a more stable method.)
The size of the numbers involved in the float-point calculations is not the most relevant thing. It's the calculation that is being performed that is relevant.
In essence, if you're performing a calculation and the result is an irrational number or recurring decimal, then there will be rounding errors when that number is squashed into the finite size data structure you're using. Since double is twice the size of float then the rounding error will be a lot smaller.
The tests may specifically use numbers which would cause this kind of error and therefore tested that you'd used the appropriate type in your code.
Here is what the standard C99 (ISO-IEC 9899 6.2.5 §10) or C++2003 (ISO-IEC 14882-2003 3.1.9 §8) standards say:
There are three floating point types:
float
,double
, andlong double
. The typedouble
provides at least as much precision asfloat
, and the typelong double
provides at least as much precision asdouble
. The set of values of the typefloat
is a subset of the set of values of the typedouble
; the set of values of the typedouble
is a subset of the set of values of the typelong double
.
The C++ standard adds:
The value representation of floating-point types is implementation-defined.
I would suggest having a look at the excellent What Every Computer Scientist Should Know About Floating-Point Arithmetic that covers the IEEE floating-point standard in depth. You'll learn about the representation details and you'll realize there is a tradeoff between magnitude and precision. The precision of the floating point representation increases as the magnitude decreases, hence floating point numbers between -1 and 1 are those with the most precision.
Floats have less precision than doubles. Although you already know, read What WE Should Know About Floating-Point Arithmetic for better understanding.