Does IEEE-754 float, double and quad guarantee exact representation of -2, -1, -0, 0, 1, 2?

China☆狼群 posted on 2020-01-16 20:42:51

Question


It's all in the title: do IEEE-754 float, double and quad guarantee exact representation of -2, -1, -0, 0, 1 and 2?


Answer 1:


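Yes. IEEE 754 represents every integer exactly as long as the number of significant binary digits it needs does not exceed the width of the significand (mantissa): 24 bits for binary32, 53 for binary64 and 113 for binary128. The values -2, -1, 1 and 2 each need a single significant bit (2 is 1 × 2^1), and both zeros have their own encodings, so all six values are exact in all three formats.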
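The rule above can be sketched in a few lines of Python. The helper name `fits_exactly` and the `FORMATS` table are made up for this illustration, and the check deliberately ignores the exponent range, which is a safe assumption only for values of modest magnitude.

```python
def fits_exactly(n: int, precision_bits: int) -> bool:
    """Return True if integer n needs at most precision_bits significant bits.

    Hypothetical helper: it ignores the exponent range of the format,
    so it is only meaningful for values far from overflow.
    """
    m = abs(n)
    if m == 0:
        return True  # zero has a dedicated encoding
    # Trailing zero bits are absorbed by the exponent, so strip them.
    while m % 2 == 0:
        m //= 2
    return m.bit_length() <= precision_bits


# Significand widths (including the implicit leading bit).
FORMATS = {"binary32": 24, "binary64": 53, "binary128": 113}

for name, p in FORMATS.items():
    results = [fits_exactly(n, p) for n in (-2, -1, 0, 1, 2)]
    print(name, all(results))       # True for every format

print(fits_exactly(2**24, 24))      # True: a power of two needs one bit
print(fits_exactly(2**24 + 1, 24))  # False: needs 25 significant bits
```

The sign is handled by taking the absolute value, since IEEE 754 stores it in a separate bit.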




Answer 2:


IEEE 754 floating-point numbers can store integers exactly within certain ranges. For example:

  • binary32, implemented in C/C++ as float, provides 24 bits of precision and can therefore represent every integer of magnitude up to 2^24 exactly, which covers all 16-bit integers, e.g. short int;
  • binary64, implemented in C/C++ as double, provides 53 bits of precision and can represent every integer of magnitude up to 2^53 exactly, which covers all 32-bit integers, e.g. int;
  • the non-standard Intel 80-bit extended precision, implemented as long double by some x86/x64 compilers, provides 64 significant bits and can represent all 64-bit integers, e.g. long int (on LP64 systems, e.g. Unix) or long long int (on LLP64 systems, e.g. Windows);
  • binary128, implemented as compiler-specific types such as __float128 (GCC) or _Quad (Intel C/C++), provides 113 bits of precision and can therefore also represent all 64-bit integers exactly.
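The first two bullets can be checked with a round trip through the format. This is a minimal sketch that assumes Python's `float` is binary64 (true for CPython on common platforms) and uses `struct` with the `f` format code to force a value through binary32; the function names are invented for the example. The 80-bit and binary128 formats are not covered, because the standard library has no type for them.

```python
import struct


def roundtrip_binary32(n: int) -> int:
    """Store n as a binary32 value and read it back as an integer."""
    (x,) = struct.unpack("<f", struct.pack("<f", float(n)))
    return int(x)


def roundtrip_binary64(n: int) -> int:
    """Store n as a binary64 value (Python float) and read it back."""
    return int(float(n))


# Every 16-bit integer survives binary32.
print(all(roundtrip_binary32(n) == n for n in range(-2**15, 2**15)))  # True

# The first integer that binary32 cannot hold is 2**24 + 1.
print(roundtrip_binary32(2**24) == 2**24)          # True
print(roundtrip_binary32(2**24 + 1) == 2**24 + 1)  # False

# The extremes of 32-bit int survive binary64; 2**53 + 1 does not.
print(roundtrip_binary64(2**31 - 1) == 2**31 - 1)  # True
print(roundtrip_binary64(-2**31) == -2**31)        # True
print(roundtrip_binary64(2**53 + 1) == 2**53 + 1)  # False
```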

The fact that double holds an extended range of integers, well beyond the range of 32-bit integers, is exploited by JavaScript, which has historically had no dedicated integer type and instead uses double-precision floating point to represent integers.

One quirk of floating-point numbers is that they have a separate sign bit, so both a positive and a negative zero exist. That is not possible in the two's complement signed integer representation. The two zeros are distinct bit patterns, and each is represented exactly.
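The two zeros can be observed directly. A small sketch, again assuming Python's `float` is binary64: the values compare equal, but the sign bit is visible through `math.copysign` and in the raw bytes.

```python
import math
import struct

pos_zero = 0.0
neg_zero = -0.0

# IEEE 754 comparison treats the two zeros as equal.
print(pos_zero == neg_zero)                # True

# copysign exposes the sign bit that == hides.
print(math.copysign(1.0, pos_zero))        # 1.0
print(math.copysign(1.0, neg_zero))        # -1.0

# Big-endian binary64 encoding: only the top (sign) bit differs.
print(struct.pack(">d", pos_zero).hex())   # 0000000000000000
print(struct.pack(">d", neg_zero).hex())   # 8000000000000000
```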




Answer 3:


A simple way to get the answer for any decimal number: convert its absolute value to binary, keeping 24 significant bits for float, 53 for double or 113 for quad, then convert back to decimal and see whether you get the same value.

For integers the answer is obvious: you lose nothing unless the value is too big to fit into the given number of bits.

Conversion of rational values with a fractional part is more interesting. A fraction that terminates in decimal can have a periodic binary expansion (0.1 is the usual example), so it has to be rounded to fit the fixed width and precision is lost. Converting the stored binary value back to decimal then gives a different decimal number, usually with many more digits, unless you round again.
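The round-trip test described in this answer can be sketched with exact rational arithmetic. The function name `is_exact_in_binary64` is made up for the illustration, and the check covers only binary64 because that is the format Python's `float` is assumed to use.

```python
from fractions import Fraction


def is_exact_in_binary64(decimal_text: str) -> bool:
    """True if the decimal literal survives conversion to binary64 unchanged."""
    stored = Fraction(float(decimal_text))  # exact value of the stored double
    intended = Fraction(decimal_text)       # exact value of the decimal text
    return stored == intended


print(is_exact_in_binary64("-2"))    # True: small integers are exact
print(is_exact_in_binary64("0.5"))   # True: 1/2 is a power of two
print(is_exact_in_binary64("0.1"))   # False: 1/10 is periodic in binary

# The value actually stored for 0.1, as an exact fraction.
print(Fraction(0.1))                 # 3602879701896397/36028797018963968
```

The denominator of the stored value is always a power of two, which is why only fractions of that shape can be exact.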


Since you're dabbling with IEEE floats, first read the Wikipedia page; then, when you feel ready for more, proceed with the first external link there, "What Every Computer Scientist Should Know About Floating-Point Arithmetic".



Source: https://stackoverflow.com/questions/20029443/does-ieee-754-float-double-and-quad-guarantee-exact-representation-of-2-1
