What uncommon floating-point sizes exist in C++ compilers?

谎友^ 2021-01-14 17:59

The C++14 draft standard seems rather quiet about the specific requirements for float, double and long double, although these sizes seem to be common:

3 Answers
  • 2021-01-14 18:29

    If you're asking only about size in bits, odd-sized types exist mainly on older platforms whose bytes are not 8 bits (or another power of 2) wide, such as the Unisys ClearPath Dorado servers with a 36-bit float and a 72-bit double. That line is still in active development; the last version came out in 2018. Mainframes and servers live very long lives, so you can still see some PDP-10 and other such architectures in use today, with modern compiler support.

    If you care about the formats, there are many standard-compliant 32-, 64- and 128-bit floating-point formats that aren't IEEE-754 binary formats, such as the hex and decimal floating-point types on IBM z, the Cray formats and the VAX formats. In fact, IBM z is one of the very rare modern platforms with decimal float hardware, although with GCC and some other compilers you can use their built-in software support for decimal float. IBM also uses a special double-double format, which is still the default for long double on PowerPC.

    There are also non-standard 24-bit float types in a few modern C/C++ compilers for microcontrollers.

    Here's the summary of most of the available floating-point formats. See also "Do any real-world CPUs not use IEEE 754?". For more information, continue to the next section.


    Types in C++ are generally mapped to hardware types for performance reasons. Floating-point types will therefore be whatever is available on the CPU, if it has an FPU at all. On modern computers IEEE-754 is the dominant format in hardware. The C++ standard itself only sets minimum range and precision requirements, but in practice float and double are mapped to IEEE-754 single and double precision respectively.
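    To see what a given compiler and target actually map these types to, std::numeric_limits can be queried. The following is a minimal sketch; the FloatInfo struct and the describe function are hypothetical names introduced here for illustration, not part of the original answer.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Hypothetical helper type (not from the original answer): a few properties
// of a floating-point type as the current compiler and target define it.
struct FloatInfo {
    std::size_t storage_bits;  // sizeof(T) * CHAR_BIT, padding included
    int radix;                 // base of the exponent (2, 10 or 16)
    int mantissa_digits;       // significand digits, counted in that radix
    int max_exponent;          // one more than the largest finite exponent
    bool is_iec559;            // true if the type claims IEC 559 / IEEE-754
};

// Collect the properties of T from std::numeric_limits.
template <typename T>
FloatInfo describe() {
    using limits = std::numeric_limits<T>;
    return FloatInfo{sizeof(T) * CHAR_BIT, limits::radix, limits::digits,
                     limits::max_exponent, limits::is_iec559};
}
```

    On a typical x86-64 Linux build, describe<long double>() would be expected to report 128 storage bits but only 64 mantissa digits, because the 80-bit extended value is padded to 16 bytes.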

    Hardware support for types with higher precision is uncommon, except on x86 and a few other rare platforms with 80-bit extended precision, so on platforms without it long double is usually mapped to the same type as double. Recently, however, long double has slowly been migrating to IEEE-754 quadruple precision in compilers like GCC and Clang. Since that format is implemented with a built-in software library, performance is a lot worse. Depending on whether you favor faster execution or higher precision, you are still free to choose which type long double maps to. For example, on x86 GCC has the -mlong-double-64, -mlong-double-80 and -mlong-double-128 options to set the format of long double, and -m96bit-long-double and -m128bit-long-double to set its padding. Similar options are also available on many other architectures, such as S/390 and zSeries.

    PowerPC, on the other hand, by default uses a completely different 128-bit long double format that is implemented with double-double arithmetic and has the same range as IEEE-754 double precision. Its precision is slightly lower than quadruple precision, but it is a lot faster because it can use the hardware double arithmetic. As above, you can choose between the two formats with the -mabi=ibmlongdouble and -mabi=ieeelongdouble options. The same trick is also used on some platforms where only a 32-bit float is supported, to get near-double precision.
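    The basic building block of double-double arithmetic is an error-free addition that returns both the rounded sum and the rounding error. Below is a minimal sketch of that step (Knuth's TwoSum), assuming strict IEEE-754 double arithmetic in round-to-nearest mode. The DoubleDouble struct and the two_sum function are hypothetical names for illustration, not the actual PowerPC runtime implementation.

```cpp
// Hypothetical type for illustration: a value stored as the unevaluated sum
// hi + lo of two doubles, with |lo| much smaller than |hi|.
struct DoubleDouble {
    double hi;
    double lo;
};

// Error-free addition (Knuth's TwoSum): hi is the rounded sum a + b and lo is
// the rounding error, so hi + lo equals a + b exactly (barring overflow).
// This relies on strict IEEE-754 double arithmetic with round-to-nearest;
// options such as -ffast-math may break it.
inline DoubleDouble two_sum(double a, double b) {
    const double sum = a + b;
    const double b_virtual = sum - a;          // the part of b that reached sum
    const double a_virtual = sum - b_virtual;  // the part of a that reached sum
    const double error = (a - a_virtual) + (b - b_virtual);
    return DoubleDouble{sum, error};
}
```

    For example, two_sum(1.0, 1e-20) keeps the small term in lo, whereas a plain double addition would round it away. A full double-double type builds its addition and multiplication on top of steps like this one.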

    IBM z mainframes traditionally use the IBM hex float formats, and they still use them today. But they also support IEEE-754 binary and decimal floating-point types in addition to that:

    The format of floating-point numbers can be either base 16 S/390® hexadecimal format, base 2 IEEE-754 binary format, or base 10 IEEE-754 decimal format. The formats are based on three operand lengths for hexadecimal and binary: short (32 bits), long (64 bits), and extended (128 bits). The formats are also based on three operand lengths for decimal: _Decimal32 (32 bits), _Decimal64 (64 bits), and _Decimal128 (128 bits).

    (Quoted from "Floating-point numbers".)

    Other architectures may have other floating-point formats, like VAX or Cray. However, since those machines are still being used, their newer hardware versions also include support for IEEE-754, just as IBM did with its mainframes.

    On modern platforms without an FPU, the floating-point types are usually IEEE-754 single and double precision for better interoperability and library support. However, on 8-bit microcontrollers even single precision is too costly, so some compilers support a non-standard mode where float is a 24-bit type. For example, the XC8 compiler uses a 24-bit floating-point format that is a truncated form of the 32-bit format, and NXP's MRK uses a different 24-bit float format.
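    To get a feel for what a truncated 32-bit format loses, the sketch below zeroes the low 8 bits of an IEEE-754 binary32 value on a desktop compiler. It is only an approximation under stated assumptions: float is binary32, the 24-bit format keeps the upper 24 bits, and the conversion truncates rather than rounds. The exact layout and rounding used by XC8 are not verified here, and truncate_to_24_bits is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes float is IEEE-754 binary32");

// Hypothetical helper: keep the sign, the 8 exponent bits and the top 15
// fraction bits of a binary32 value, and zero the low 8 fraction bits.
// This truncates (rounds toward zero); a real compiler may round differently.
inline float truncate_to_24_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);    // well-defined type punning
    bits &= 0xFFFFFF00u;                        // drop the low 8 bits
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

    With 15 stored fraction bits, such a format carries roughly 4 to 5 significant decimal digits instead of the 6 to 7 of binary32. Note that this naive mask would also turn a NaN whose payload lies only in the low 8 bits into an infinity.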

    Due to the rise of graphics and AI applications that need a narrower floating-point type, 16-bit float formats like IEEE-754 binary16 and Google's bfloat16 have also been introduced on many platforms, and compilers have some limited support for them, like __fp16 in GCC.

  • 2021-01-14 18:31

    "float" and "double" are de facto standardised on the IEEE single and double precision representations. I would put assuming these sizes in the same category as assuming CHAR_BIT==8. Some older ARM systems did have weird "mixed-endian" doubles, but unless you are working with retro stuff you are unlikely to encounter that nowadays.

    long double, on the other hand, is far more variable. Sometimes it's IEEE double precision, sometimes it's 80-bit x87 extended, sometimes it's IEEE quad precision, and sometimes it's a "double double" format made up of two IEEE double precision numbers added together.

    So in portable code you can't rely on "long double" being any better than "double".
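    Which of these variants a given build uses can usually be inferred from the significand width of long double. The following is a rough sketch: classify_long_double is a hypothetical helper, and the mapping assumes the usual LDBL_MANT_DIG values for each format (53, 64, 106 and 113 bits).

```cpp
#include <limits>
#include <string>

// Hypothetical helper: map a binary significand width to the long double
// format it most likely indicates. The widths are the usual values of
// LDBL_MANT_DIG for each format. By default the width of long double on the
// current compiler and target is used.
inline std::string classify_long_double(
        int digits = std::numeric_limits<long double>::digits) {
    switch (digits) {
        case 53:  return "IEEE-754 double precision";
        case 64:  return "x87 80-bit extended precision";
        case 106: return "double-double";
        case 113: return "IEEE-754 quadruple precision";
        default:  return "unknown";
    }
}
```

    Portable code can use a check like this to select an algorithm or tolerance, instead of assuming that long double adds anything over double.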

  • 2021-01-14 18:44

    First off, I am new to Stack Overflow, so please bear with me.

    However, to answer your question: looking at the float.h headers, which specify the floating-point parameters, for the following compilers:

    1. Intel Compiler

      //Float:
      #define FLT_MAX                 3.40282347e+38F
      
      //Double:
      #define DBL_MAX                 1.7976931348623157e+308
      
      //Long Double:
      #if (__IMFLONGDOUBLE == 64) || defined(__LONGDOUBLE_AS_DOUBLE)
      #define LDBL_MAX                1.7976931348623157e+308L
      #else
      #define LDBL_MAX                1.1897314953572317650213E+4932L
      #endif
      
    2. GCC (actually MinGW, GCC 4 or 5)

      //Float:
      #define FLT_MAX         3.40282347e+38F
      
      //Double:
      #define DBL_MAX     1.7976931348623157e+308
      
      //Long Double: (same as double for gcc):
      #define LDBL_MAX        1.7976931348623157e+308L
      
    3. Microsoft

      //Float:
      #define FLT_MAX         3.40282347e+38F
      
      //Double:
      #define DBL_MAX     1.7976931348623157e+308
      
      //Long Double: (same as double for Microsoft):
      #define LDBL_MAX            DBL_MAX
      

    So, as you can see, according to these headers only the Intel compiler provides an 80-bit representation for long double on a "standard" Windows machine.

    This data is copied from the respective float.h headers on a Windows machine.
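    Rather than reading the headers by hand, the same comparison can be made in code from the <cfloat> macros. This is a minimal sketch; long_double_is_wider_than_double is a hypothetical helper name.

```cpp
#include <cfloat>

// Hypothetical helper: true if long double has more precision or a larger
// exponent range than double on this compiler and target.
constexpr bool long_double_is_wider_than_double() {
    return LDBL_MANT_DIG > DBL_MANT_DIG || LDBL_MAX_EXP > DBL_MAX_EXP;
}
```

    With a header like the Microsoft one quoted above, where LDBL_MAX is simply DBL_MAX, this would be expected to return false. Because the function is constexpr, it can also be used in a static_assert.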
