For clarity, if I'm using a language that implements IEEE 754 floats and I declare:
float f0 = 0.f;
float f1 = 1.f;
...and then print them
The largest value representable by an n-bit unsigned integer is 2^n - 1. As noted above, a float
has 24 bits of precision in the significand, which would seem to imply that 2^24 wouldn't fit.
However, powers of 2 within the range of the exponent are exactly representable as 1.0 × 2^n, so 2^24 does fit, and consequently the first unrepresentable integer for float
is 2^24 + 1. As noted above. Again.
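A quick way to check this claim is to round-trip integers through a 32-bit float. The sketch below assumes Python's `struct` module with the `"<f"` format (IEEE 754 single precision); the helper name `to_float32` is made up for illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest single-precision value and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Integers around 2^24: everything up to 2^24 survives, 2^24 + 1 does not.
for n in (2**24 - 1, 2**24, 2**24 + 1, 2**24 + 2):
    stored = int(to_float32(float(n)))
    print(n, stored, "exact" if stored == n else "rounded")
```

Only 16,777,217 comes back changed: it lies halfway between two representable neighbours and is rounded to the even one, 16,777,216.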
2^(mantissa bits + 1) + 1
The +1 in the exponent (mantissa bits + 1) is because, if the mantissa contains abcdef..., the number it represents is actually 1.abcdef... × 2^e, providing an extra implicit bit of precision.
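To make the implicit bit visible, here is a minimal sketch that pulls a 32-bit float apart into its fields. It assumes the standard single-precision layout (1 sign bit, 8 exponent bits with bias 127, 23 stored mantissa bits) and handles normal numbers only; `decode_float32` is a hypothetical helper name.

```python
import struct

def decode_float32(x: float):
    """Split a normal single-precision value into sign, exponent, stored bits, full significand."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127  # remove the exponent bias
    fraction = bits & 0x7FFFFF              # the 23 stored bits "abcdef..."
    significand = (1 << 23) | fraction      # prepend the implicit leading 1: 24 bits
    return sign, exponent, fraction, significand

sign, e, fraction, significand = decode_float32(16777216.0)  # 2^24
print(sign, e, fraction, significand.bit_length())
# The value is significand × 2^(e - 23), computed here with exact integers.
print(significand * 2 ** (e - 23))
```

For 2^24 the stored mantissa bits are all zero, yet the significand is still 24 bits wide once the leading 1 is put back.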
Therefore, the first integer that cannot be accurately represented and will be rounded is:
For float, 16,777,217 (2^24 + 1).
For double, 9,007,199,254,740,993 (2^53 + 1).
>>> 9007199254740993.0
9007199254740992.0
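The formula and both values can be checked with exact integer arithmetic. This is a sketch under two assumptions: that a Python `float` is an IEEE 754 double (true for CPython on common platforms), and that "mantissa bits" means the stored bits (23 for float, 52 for double). The function names are illustrative.

```python
def first_unrepresentable(stored_mantissa_bits: int) -> int:
    """Apply 2^(mantissa bits + 1) + 1; the extra 1 in the exponent is the implicit bit."""
    return 2 ** (stored_mantissa_bits + 1) + 1

def survives_double(n: int) -> bool:
    """True if converting n to a double and back to an integer loses nothing."""
    return int(float(n)) == n

print(first_unrepresentable(23))  # 16777217 (float)
print(first_unrepresentable(52))  # 9007199254740993 (double)

limit = first_unrepresentable(52)
# Spot-check the 1000 integers just below the limit, then the limit itself.
print(all(survives_double(n) for n in range(limit - 1000, limit)))  # True
print(survives_double(limit))                                       # False
```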