Why do compilers fix the number of digits of a floating-point number at 6?

Question

According to The C++ Programming Language (4th edition), section 6.2.5:

There are three floating-point types: float (single-precision), double (double-precision), and long double (extended-precision).

Refer to: http://en.wikipedia.org/wiki/Single-precision_floating-point_format

The true significand includes 23 fraction bits to the right of the binary point and an implicit leading bit (to the left of the binary point) with value 1 unless the exponent is stored with all zeros. Thus only 23 fraction bits of the significand appear in the memory format but the total precision is 24 bits (equivalent to log10(2^24) ≈ 7.225 decimal digits).

→ So a floating-point number in the binary32 interchange format (a number format that occupies 4 bytes, i.e. 32 bits, in memory) should hold at most 7 decimal digits.

When I test with different compilers (such as GCC and Visual C++)
→ The output is always 6.

Looking into float.h for each compiler
→ I found that the value 6 is hard-coded.

Questions:

Why is there a difference between the theoretical value (7) and the actual value (6)?
7 seems more reasonable: when I test with the code below, a 7-digit value is still printed correctly, while an 8-digit value is not.
Why don't compilers derive the number of representable decimal digits from the interchange format instead of using a fixed value?

Code:

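#include <iostream>
#include <limits>

using namespace std;

int main()
{
    // Decimal digits guaranteed to survive a decimal -> float -> decimal round trip.
    cout << numeric_limits<float>::digits10 << endl;

    // A 7-digit value; it is below 2^24, so a float stores it exactly.
    float f = -9999999;

    cout.precision(10);

    cout << f << endl;
}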

Answer 1:

You're not reading the documentation.

std::numeric_limits<float>::digits10 is 6:

The value of std::numeric_limits<T>::digits10 is the number of base-10 digits that can be represented by the type T without change, that is, any number with this many decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow. For base-radix types, it is the value of digits (digits-1 for floating-point types) multiplied by log₁₀(radix) and rounded down.

The standard 32-bit IEEE 754 floating-point type has a 24 bit fractional part (23 bits written, one implied), which may suggest that it can represent 7 digit decimals (24 * std::log10(2) is 7.22), but relative rounding errors are non-uniform and some floating-point values with 7 decimal digits do not survive conversion to 32-bit float and back: the smallest positive example is 8.589973e9, which becomes 8.589974e9 after the roundtrip. These rounding errors cannot exceed one bit in the representation, and digits10 is calculated as (24-1)*std::log10(2), which is 6.92. Rounding down results in the value 6.

std::numeric_limits<float>::max_digits10 is 9:

The value of std::numeric_limits<T>::max_digits10 is the number of base-10 digits that are necessary to uniquely represent all distinct values of the type T, such as necessary for serialization/deserialization to text. This constant is meaningful for all floating-point types.

Unlike most mathematical operations, the conversion of a floating-point value to text and back is exact as long as at least max_digits10 were used (9 for float, 17 for double): it is guaranteed to produce the same floating-point value, even though the intermediate text representation is not exact. It may take over a hundred decimal digits to represent the precise value of a float in decimal notation.

Answer 2:

std::numeric_limits<float>::digits10 equates to FLT_DIG, which is defined by the C standard:

number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits,

\[
q =
\begin{cases}
p \log_{10} b & \text{if } b \text{ is a power of 10} \\
\lfloor (p - 1) \log_{10} b \rfloor & \text{otherwise}
\end{cases}
\]

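FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10

(These are the minimum values the C standard requires; an implementation may define larger ones, for example DBL_DIG is 15 for an IEEE 754 double.)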

The value is 6 (and not 7) because of rounding errors: not every decimal value with 7 significant digits can be represented losslessly by a 32-bit float. Rounding errors are limited to 1 bit though, so the FLT_DIG value is calculated from 23 bits (instead of the full 24):

23 * log10(2) ≈ 6.92

which is rounded down to 6.

Source: https://stackoverflow.com/questions/29510356/why-do-compilers-fix-the-digits-of-floating-point-number-to-6

Tags: c++, c++11, iso