Yeah, I meant to say 80-bit. That's not a typo...
My experience with floating point variables has always involved 4-byte multiples, like singles (32 bit),
I used 80-bit for some pure math research. I had to sum terms in an infinite series that grew quite large, outside the range of doubles. Convergence and accuracy weren't concerns, just the ability to handle large exponents like 1E1000. Perhaps some clever algebra could have simplified things, but it was way quicker and easier to just code an algorithm with extended precision, than to spend any time thinking about it.
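To put rough numbers on the range difference, here is a small sketch in Python. It assumes the usual field widths (an 11-bit exponent for doubles, a 15-bit exponent for the x87 80-bit format); the names `max_decimal_exponent`, `double_limit`, and `extended_limit` are made up for illustration. Python has no 80-bit type of its own, so the extended limit is computed rather than measured.

```python
import math

# Exponent field widths: IEEE 754 double vs. x87 80-bit extended (assumed here).
DOUBLE_EXP_BITS = 11
EXTENDED_EXP_BITS = 15

def max_decimal_exponent(exp_bits: int) -> int:
    """Largest n such that 10**n is still below the overflow threshold."""
    max_binary_exp = 2 ** (exp_bits - 1)  # finite values stay below 2**max_binary_exp
    return math.floor(max_binary_exp * math.log10(2))

double_limit = max_decimal_exponent(DOUBLE_EXP_BITS)
extended_limit = max_decimal_exponent(EXTENDED_EXP_BITS)

# A Python float is a double, so a term of size 1E1000 overflows to infinity.
overflowed = float("1e1000")

print(double_limit, extended_limit, overflowed)
```

A term like 1E1000 is far past the double limit but well inside the extended one, which matches the use case described above.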
For me the use of 80 bits is ESSENTIAL. This way I get high-order (30,000) eigenvalues and eigenvectors of symmetric matrices with four more figures when using the GOTO library for vector inner products, viz., 13 instead of 9 significant figures for the kind of matrices that I use in relativistic atomic calculations, which is necessary to avoid falling into the sea of negative-energy states. My other option is using quadruple-precision arithmetic that increases CPU time 60-70 times and also increases RAM requirements. Any calculation relying on inner products of large vectors will benefit. Of course, in order to keep partial inner product results within registers it is necessary to use assembler language, as in the GOTO libraries. This is how I came to love my old Opteron 850 processors, which I will be using as long as they last for that part of my calculations.
The reason 80 bits is fast, whereas greater precision is so much slower, is that the CPU's standard floating-point hardware has 80-bit registers. Therefore, if you want the extra 16 bits (11 extra bits of mantissa, four extra bits of exponent, and one bit that stores the leading mantissa bit explicitly instead of leaving it implied, so it is effectively unused), then it doesn't really cost you much to extend from 64 to 80 bits -- whereas extending beyond 80 bits is extremely costly in terms of run time. So, you might as well use 80-bit precision if you want it. It is not cost-free to use, but it comes pretty cheap.
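The bit accounting can be checked with a few lines of Python. This is only bookkeeping under the assumption of the standard layouts (double: 1 sign, 11 exponent, 52 stored mantissa bits plus an implied leading bit; x87 extended: 1 sign, 15 exponent, 64 mantissa bits with the leading bit stored explicitly). The dictionary and function names are hypothetical.

```python
import math

# Field widths in bits (assumed standard layouts).
DOUBLE = {"sign": 1, "exponent": 11, "stored_mantissa": 52}
EXTENDED = {"sign": 1, "exponent": 15, "stored_mantissa": 64}

def total_bits(fmt: dict) -> int:
    return sum(fmt.values())

def precision_bits(fmt: dict, implied_leading_bit: bool) -> int:
    """Effective precision: stored bits plus the implied leading bit, if any."""
    return fmt["stored_mantissa"] + (1 if implied_leading_bit else 0)

double_precision = precision_bits(DOUBLE, implied_leading_bit=True)
extended_precision = precision_bits(EXTENDED, implied_leading_bit=False)

extra_total = total_bits(EXTENDED) - total_bits(DOUBLE)
extra_precision = extended_precision - double_precision
extra_exponent = EXTENDED["exponent"] - DOUBLE["exponent"]
# Whatever is left over is the explicitly stored leading bit.
explicit_bit = extra_total - extra_precision - extra_exponent

# Decimal digits that are always preserved at each precision.
double_digits = math.floor(double_precision * math.log10(2))
extended_digits = math.floor(extended_precision * math.log10(2))

print(extra_total, extra_precision, extra_exponent, explicit_bit)
print(double_digits, extended_digits)
```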
Intel's FPUs use the 80-bit format internally to get more precision for intermediate results.
That is, you may have 32-bit or 64-bit variables, but when they are loaded into the FPU registers, they are converted to 80 bits; the FPU then (by default) performs all calculations in 80 bits; after the calculation, the result is stored back into a 32-bit or 64-bit variable.
BTW - A somewhat unfortunate consequence of this is that debug and release builds may produce slightly different results: in the release build, the optimizer may keep an intermediate variable in an 80-bit FPU register, while in the debug build, it will be stored in a 64-bit variable, causing loss of precision. You can avoid this by using 80-bit variables, or by using an FPU switch (or compiler option) to perform all calculations in 64 bit.
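The debug/release effect can be imitated in Python with exact fractions. This is a simulation, not real x87 behavior: `round_to_bits` is a hypothetical helper that rounds to a given number of mantissa bits (ties to even) and ignores exponent range, and `add_then_subtract` stands in for an expression whose temporary is either spilled to a double (53 bits) or kept in an extended register (64 bits).

```python
from fractions import Fraction

def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round x to `bits` significant binary digits, ties to even (no range limits)."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e with 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - bits + 1)
    return sign * round(mag / ulp) * ulp  # round() on a Fraction is half-to-even

def add_then_subtract(intermediate_bits: int) -> Fraction:
    """Evaluate (1 + 2**-60) - 1 with temporaries held at the given precision."""
    one = Fraction(1)
    tiny = Fraction(1, 2 ** 60)
    temp = round_to_bits(one + tiny, intermediate_bits)
    result = round_to_bits(temp - one, intermediate_bits)
    return round_to_bits(result, 53)  # final store into a 64-bit double

spilled = add_then_subtract(53)      # temporary stored in a double ("debug")
in_register = add_then_subtract(64)  # temporary kept at 64-bit mantissa ("release")

print(spilled, in_register)
```

With the 53-bit temporary the small term is lost and the result is 0; with the 64-bit temporary it survives as 2**-60. Both results are stored in the same 64-bit variable type, yet they differ.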
Wikipedia explains that an 80-bit format can represent an entire 64-bit integer without losing information. Thus the floating-point unit of the CPU can be used to implement multiplication and division for integers.
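A quick check of the integer claim, sketched in Python (whose `float` is a 64-bit double). The helper `fits_exactly` is hypothetical; it only looks at how many mantissa bits an integer needs and ignores exponent range.

```python
INT64_MAX = 2 ** 63 - 1

# Round trip through a double: the 53-bit mantissa cannot hold all 63 bits.
through_double = int(float(INT64_MAX))

def fits_exactly(n: int, mantissa_bits: int) -> bool:
    """True if integer n needs no more than `mantissa_bits` significant bits."""
    n = abs(n)
    if n == 0:
        return True
    trailing_zeros = (n & -n).bit_length() - 1
    return (n >> trailing_zeros).bit_length() <= mantissa_bits

print(through_double == INT64_MAX)      # the double round trip changes the value
print(fits_exactly(INT64_MAX, 53))      # double mantissa
print(fits_exactly(INT64_MAX, 64))      # 80-bit extended mantissa
```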
Another advantage not yet mentioned for 80-bit types is that on 16-bit or 32-bit processors which don't have floating-point units but do have a "multiply" instruction which produces a result twice as long as the operands (16x16->32 or 32x32->64), arithmetic on a 64-bit mantissa subdivided into four or two 16-bit or 32-bit registers will be faster than arithmetic on a 53-bit mantissa which spans the same number of registers but has to share 12 register bits with the sign and exponent. For applications which don't need anything more precise than float, computations on a 48-bit "extended float" type could likewise be faster than computations on a 32-bit float.
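Here is a rough Python sketch of that software-float situation, assuming a 32-bit machine with a 32x32->64 multiply. All function names are hypothetical. `mul_mantissa64` multiplies two 64-bit mantissas that already sit in whole registers, while `unpack_double` shows the masking and hidden-bit work a packed double needs before its mantissa can be multiplied at all.

```python
import struct

MASK32 = 0xFFFFFFFF

def mul32x32(a: int, b: int) -> int:
    """Stand-in for a hardware 32x32->64 multiply instruction."""
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    return a * b

def mul_mantissa64(x: int, y: int) -> int:
    """128-bit product of two 64-bit mantissas using four 32x32->64 multiplies."""
    x_hi, x_lo = x >> 32, x & MASK32
    y_hi, y_lo = y >> 32, y & MASK32
    return ((mul32x32(x_hi, y_hi) << 64)
            + ((mul32x32(x_hi, y_lo) + mul32x32(x_lo, y_hi)) << 32)
            + mul32x32(x_lo, y_lo))

def unpack_double(value: float):
    """Split a double into sign, biased exponent, and 53-bit mantissa.

    The mantissa shares its upper word with 12 sign/exponent bits, so it
    has to be masked out and the hidden bit restored (normal numbers only).
    """
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = (bits & ((1 << 52) - 1)) | (1 << 52)
    return sign, exponent, mantissa

print(hex(mul_mantissa64(2 ** 64 - 1, 2 ** 64 - 1)))
print(unpack_double(1.5))
```

Either way the multiply itself takes the same four partial products on a 32-bit machine; the packed format just adds the unpacking and repacking steps around it.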
While some people might bemoan the double-rounding behavior of extended-precision types, that is realistically speaking only an issue in specialized applications requiring full bit-exact cross-platform reproducibility. From an accuracy standpoint, the difference between a rounding error of 64/128 ulp vs 65/128 ulp, or 1024/2048 ulp vs 1025/2048 ulp, is a non-issue; in languages with extended-precision variable types and consistent extended-precision semantics, use of extended types on many platforms without floating-point hardware (e.g. embedded systems) will offer both higher accuracy and better speed than single- or double-precision floating-point types.
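For anyone who wants to see double rounding in isolation, here is a simulated example in Python using exact fractions. It is a sketch, not hardware behavior: `round_to_bits` is a hypothetical helper (round to a given number of mantissa bits, ties to even, no exponent limits), and the input value is hand-picked to sit just above a halfway point between two doubles.

```python
from fractions import Fraction

def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round x to `bits` significant binary digits, ties to even (no range limits)."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e with 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - bits + 1)
    return sign * round(mag / ulp) * ulp  # round() on a Fraction is half-to-even

# Just above the midpoint between the doubles 1 and 1 + 2**-52.
x = 1 + Fraction(1, 2 ** 53) + Fraction(1, 2 ** 65)

direct = round_to_bits(x, 53)                    # one rounding, straight to double
twice = round_to_bits(round_to_bits(x, 64), 53)  # via 64-bit mantissa, then double

double_ulp = Fraction(1, 2 ** 52)
direct_error = abs(x - direct) / double_ulp
twice_error = abs(x - twice) / double_ulp

print(direct, twice)
print(direct_error, twice_error)
```

The two paths land on different doubles, which is the reproducibility problem. The errors, though, are 4095/8192 ulp for the single rounding and 4097/8192 ulp for the double rounding, so the accuracy cost is tiny, in line with the point above.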