double-double precision floating point as sum of two doubles

折月煮酒 提交于 2019-12-01 09:56:43

问题


Following papers and source code for double-double arithmetic for some time, I still can't find out how exactly a dd_real ( defined as struct dd_real { double x[2];...}) number is split into two doubles. Say if I initialize it with a string, dd_real pi = "3.14159265358979323846264338327950"; what will be pi.x[0] and pi.xi[1]? I need to understand it and then write a hopefully small Python function that does it.

The reason I don't just want to call into the QD library is that I'd prefer to reimplement the correct split in Python so that I send my 35-digit precision constants (given as strings) as double2 to CUDA code where it will be treated as double-double reals by the GQD library -- the only library, it seems, to deal with extended precision caclulations in CUDA. That unfortunately rules out mpmath too, on Python side.


回答1:


Say that you initialize your double double with the binary number:

1.011010101111111010101010101010000000101010110110000111011111101010010101010
  < ---                 52 binary digits         --- >< --- more digits --- >

Then one double will be 1.0110101011111110101010101010100000001010101101100001 and the other will be 1.1011111101010010101010 * 2^-53

When you add these two numbers (as reals), the sum is the initial value. The first one packs as many bits as possible in its 52-bit mantissa. The second one contains the remaining bits, with the appropriate exponent.



来源:https://stackoverflow.com/questions/9857418/double-double-precision-floating-point-as-sum-of-two-doubles

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!