sizeof long double and precision not matching?

Posted by 大城市里の小女人 on 2019-11-29 10:23:29

The long double format in your C implementation uses an Intel format with a one-bit sign, a 15-bit exponent, and a 64-bit significand (ten bytes total). The compiler allocates 16 bytes for it, which wastes space but helps with things such as alignment. However, the 64 bits of significand provide only log10(2^64) digits of significance, which is about 19.3 digits.

C implementations may give long double different ranges and precisions. The sizeof result hints at the underlying floating-point representation but does not specify it. A long double is not required to have 33 to 36 decimal digits. It could even have exactly the same representation as a double.

To use all the available precision without hard-coding it, and without overdoing it, I recommend:

const long double ld = 0.12345678901234567890123456789012345L;
printf("%.*Le\n", LDBL_DIG + 3, ld);
printf("%.*Le\n", LDBL_DIG + 3, nextafterl(ld, ld*2));

This prints the following on my machine (Eclipse, 64-bit Intel); yours may differ.

1.234567890123456789013e-01
1.234567890123456789081e-01

[Edit]

On review, +2 is sufficient. It is better to use LDBL_DECIMAL_DIG; see "Printf width specifier to maintain precision of floating-point value".

printf("%.*Le\n", (LDBL_DIG + 3) - 1, ld);
printf("%.*Le\n", LDBL_DECIMAL_DIG - 1, ld);

The format on your computer is indeed the Intel double extended-precision format, 80 bits wide, with a 15-bit exponent and a 64-bit mantissa.

Only 10 consecutive bytes of the storage are actually used. The Intel manual (Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4) says the following:

When storing floating-point values in memory, half-precision values are stored in 2 consecutive bytes in memory; single-precision values are stored in 4 consecutive bytes in memory; double-precision values are stored in 8 consecutive bytes; and double extended-precision values are stored in 10 consecutive bytes.

However, the x86-64 Linux ABI specifies that the full 16 bytes are actually consumed. This is possibly because a 10-byte value could only have a fundamental alignment requirement of 2 in arrays, which can cause peculiar issues.

Also, array indexing is easier with multiples of 16.

Most of the time this is a non-issue, as long doubles are usually used to minimize error in intermediate calculations, with the result then rounded to a double.

The sizeof operator returns the size of the data type in bytes. A type's floating-point format cannot really be inferred from its byte size, other than that a bigger size usually means better precision.
