According to IEEE 754-2008, there are three binary floating-point basic formats (which can be encoded using 32, 64 or 128 bits) and two decimal floating-point basic formats (which can be encoded using 64 or 128 bits).
In addition to the 32-bit float and 64-bit double, GCC offers __float80, __float128, _Decimal32, _Decimal64 and _Decimal128; for ARM targets, it also offers the half-precision __fp16.
Intel CPUs support 80-bit floats in hardware using the old scalar x87 FPU instructions (but not with the SSE vector instructions). I'm not aware of any mainstream CPUs with hardware support for the decimal FP types.
It looks like the current crop of Microsoft compilers provides 64 bits for both double and long double, but older ones gave you 80 bits for long double.
C++ does not provide decimal types; the only floating-point types are float, double and long double.
Nor does C++ specify that these use IEEE 754 representations, or that they have any particular size. The only requirement is that double provides at least as much precision as float, and that long double provides at least as much precision as double.
Intel has a decimal floating-point library which will work with either ICC or GCC on Mac, Linux, HP-UX or Solaris, or with the ICC or CL compilers on Windows. It's less convenient than built-in types, because arithmetic goes through function calls instead of operators. If you're using C++, maybe someone has already written helpful classes that overload all the necessary operators for that.
If you want the convenience of built-in operators, but don't want to write them yourself, I'd recommend checking out Bloomberg Finance's open-source C++ libraries on GitHub. In particular, the BDE package contains an IEEE 754 "Decimal 32/64/128" implementation (see bdldfp_decimal.h).
The nice thing about this library is that it supports multiple different IEEE 754 backend implementations, including a C99 reference implementation, the decNumber implementation that comes with GCC, and Intel's open-source IntelDFP library (see bdldfp_decimalplatform.h for details). It also supports configurable endianness.
C++ does not specify that floats must be 32-bit or that doubles must be 64-bit. It does not even require there to be exactly 8 bits in a byte (though there do have to be at least 8).
[C++11: 3.9.1/8]:
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.
Check the documentation for your toolchain and platform to find out what its type sizes are. It might support a long double that is wider than double, which in turn might be what you want.