x86-64 long double precision

Submitted by 亡梦爱人 on 2019-12-04 05:36:53

x86-64 offers the same floating-point precision as 32-bit x86. The extended-precision type is 80 bits wide and is implemented with the x87 instruction set, with 6 padding bytes added in memory. There is no hardware support for 128-bit floating point.
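A quick way to see what your own compiler does is to query the type's properties. The following is a minimal C++ sketch (the helper names are hypothetical, chosen for illustration). It reports the significand width and storage size of long double, and it checks whether 1 + 2^-k survives rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Significand width of long double in bits: 64 for the x87 80-bit format,
// 53 when the compiler maps long double to double (as MSVC does).
constexpr int long_double_digits() {
    return std::numeric_limits<long double>::digits;
}

// Storage size in bytes: typically 16 on x86-64 Linux with GCC
// (10 bytes of data + 6 bytes of padding).
constexpr std::size_t long_double_bytes() {
    return sizeof(long double);
}

// True if 1 + 2^-k is still distinguishable from 1 in long double arithmetic.
inline bool long_double_resolves(int k) {
    return 1.0L + std::ldexp(1.0L, -k) != 1.0L;
}
```

With GCC on x86-64 Linux this typically reports 64 significand bits and 16 bytes. With Microsoft Visual C++, where long double is the same as double, it reports 53 bits and 8 bytes.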

A software implementation of quad or extended quad precision might benefit from the x86-64 64x64 => 128 integer multiply instruction, though.
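As an illustration, here is a minimal C++ sketch of that 64x64 => 128 multiply. It assumes GCC or Clang, whose `unsigned __int128` extension typically compiles to a single MUL instruction on x86-64. MSVC does not support this type. `mul_64x64_128` and `U128` are hypothetical names. A 113-bit quad significand spans two 64-bit words, so a full software multiply would combine several of these partial products.

```cpp
#include <cassert>
#include <cstdint>

// Full 128-bit product split into high and low 64-bit words.
struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// Assumes GCC or Clang: unsigned __int128 is a compiler extension, not
// standard C++. On x86-64 the multiply below typically becomes one MUL.
inline U128 mul_64x64_128(std::uint64_t a, std::uint64_t b) {
    unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    return U128{static_cast<std::uint64_t>(p >> 64),
                static_cast<std::uint64_t>(p)};
}
```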

I would recommend using MPFR. It is a multiple-precision floating-point library built on top of GMP. Unlike GMP's own mpf functions, it provides correctly rounded results.

There is a good chance that long double and double are both 64 bits wide (depending on the compiler and OS). The reason is that the compiler emits scalar SSE2 instructions instead of x87 ones, and SSE2 has no 80-bit format.
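To check which case applies, you can inspect the standard `<cfloat>` macros. Below is a minimal C++ sketch with hypothetical helper names. If `LDBL_MANT_DIG` equals `DBL_MANT_DIG`, long double is just the 64-bit double format. `FLT_EVAL_METHOD` hints at whether intermediate results are computed SSE2-style or x87-style.

```cpp
#include <cassert>
#include <cfloat>

// True when long double has the same significand width as double, i.e. it is
// effectively the 64-bit double format (as with MSVC).
constexpr bool long_double_is_double() {
    return LDBL_MANT_DIG == DBL_MANT_DIG;
}

// FLT_EVAL_METHOD describes how intermediate results are evaluated:
//   0  each operation in its own type (typical of scalar SSE2 code)
//   1  float and double operations in double
//   2  all operations in long double (typical of x87 code)
//  -1  indeterminate
constexpr int eval_method() {
    return FLT_EVAL_METHOD;
}
```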

x86 doesn't support higher precision than 80 bits. If you really need more than 64 bits of precision for a floating-point algorithm, though, you should most likely re-examine the numerics of your algorithm instead of solving the problem with brute force.
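One common example of fixing the numerics rather than adding bits is compensated (Kahan) summation. It recovers low-order bits that plain double summation throws away. Below is a minimal C++ sketch with illustrative function names. It assumes strict IEEE double arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right summation in double.
inline double naive_sum(const std::vector<double>& xs) {
    double s = 0.0;
    for (double x : xs) s += x;
    return s;
}

// Kahan (compensated) summation: c carries the low-order bits lost in each
// addition and feeds them back into the next one. Do not compile with
// -ffast-math, which allows the compiler to simplify the compensation away.
inline double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0, c = 0.0;
    for (double x : xs) {
        double y = x - c;
        double t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

Consider summing 1.0 followed by ten copies of 1e-16. The naive loop stays at exactly 1.0, because each 1e-16 is below half an ulp of 1.0. The compensated sum lands within about one ulp of 1 + 1e-15.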

There are a few options.

  1. Use double-double to represent quad precision. For example, see http://www.codeproject.com/Articles/884606/The-double-double-type. However, the type does not conform to the IEEE standard. You can tell by inspecting its epsilon. That value is larger (less precise) than the 1.926E-34 epsilon of the IEEE standard 128-bit float.
  2. Use true IEEE standard 128-bit floats. The Microsoft VC++ compiler does not provide such a type. The Intel C++ compiler does provide a type, _Quad, although its implementation is not complete at this time (it has no I/O operations).
  3. Use a third-party library. I have recently created a library called double128 that is based on Intel C++ _Quad but adds I/O operations. It works with Microsoft VC++. You can visit http://www.cg-inc.com/Product/Double128 for more information.
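Option 1 rests on a simple building block, sketched here in minimal C++. Knuth's TwoSum returns a rounded sum together with its exact rounding error. On top of it sits a simple, non-IEEE ("sloppy") double-double addition. The names `dd`, `two_sum`, `quick_two_sum` and `dd_add` are illustrative and are not the API of the CodeProject article. The sketch assumes strict IEEE double arithmetic (e.g. SSE2 code, no -ffast-math).

```cpp
#include <cassert>

// A double-double value: hi + lo, with |lo| much smaller than ulp(hi).
struct dd {
    double hi;
    double lo;
};

// Knuth's TwoSum: s + e equals a + b exactly (round-to-nearest, no overflow).
inline dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return dd{s, e};
}

// Dekker's FastTwoSum: same guarantee, but requires |a| >= |b|.
inline dd quick_two_sum(double a, double b) {
    double s = a + b;
    double e = b - (s - a);
    return dd{s, e};
}

// Simple double-double addition: exact sum of the high parts, then the
// low parts are folded in and the result is renormalized.
inline dd dd_add(dd x, dd y) {
    dd s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return quick_two_sum(s.hi, s.lo);
}
```

For example, adding 1e-20 to 1.0 leaves hi at 1.0, but the 1e-20 is kept in lo instead of being lost.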

I recommend the Boost wrappers over MPFR or GMP:

Boost 1.70: cpp_bin_float.

In addition to types of any user-chosen precision, it provides the following typedefs:

cpp_bin_float_single           (24 bits + exponent = 32 bits)
cpp_bin_float_double           (53 bits + exponent = 64 bits)
cpp_bin_float_double_extended  (64 bits + exponent)
cpp_bin_float_quad             (113 bits + exponent = 128 bits)
cpp_bin_float_oct              (237 bits + exponent = 256 bits)

Boost works almost out of the box. Once Boost is compiled, all you need to do is add its include and library directories to the Visual Studio project settings. cpp_bin_float itself is header-only, so it needs only the include directory.

Tested with Visual Studio 2017 + Boost v1.70.

See the instructions for compiling Boost.
