float128 and double-double arithmetic

白昼怎懂夜的黑 提交于 2019-11-29 18:14:58

“In this case, we use two double to store the value. So we need to make two operations at each time.”

This is not how double-double arithmetic works. You should expect one double-double operation to be implemented in anywhere from 6 to 20 double operations depending on the actual operation being implemented, the availability of a fused-multiply-add operation, the assumption that one operand is larger than the other, …

For instance, here is one implementation of a double-double multiplication for when an FMA instruction is not available, taken from CRlibm:

#define Mul22(zh,zl,xh,xl,yh,yl)                      \
{                                                     \
double mh, ml;                                        \
                              \
  const double c = 134217729.;                \
  double up, u1, u2, vp, v1, v2;              \
                              \
  up = (xh)*c;        vp = (yh)*c;            \
  u1 = ((xh)-up)+up;  v1 = ((yh)-vp)+vp;          \
  u2 = (xh)-u1;       v2 = (yh)-v1;                   \
                              \
  mh = (xh)*(yh);                     \
  ml = (((u1*v1-mh)+(u1*v2))+(u2*v1))+(u2*v2);        \
                              \
  ml += (xh)*(yl) + (xl)*(yh);                \
  *zh = mh+ml;                        \
  *zl = mh - (*zh) + ml;                              \
}

The first 8 operations alone are for dividing exactly each double from the operands into two halves so that one half from each side can be multiplied with one half from the other side and the result obtained exactly as a double. The computations u1*v1, u1*v2, … do exactly that.

The values obtained in mh and ml can overlap, so the last 3 operations are there to renormalize the result into the sum of two floating-point numbers.

In this case we can have round-off errors on each double or their is a mechanism that avoid this?

As the comment says:

/*
 * computes double-double multiplication: zh+zl = (xh+xl) *  (yh+yl)
 * relative error is smaller than 2^-102
 */

You can find about all the mechanisms used to achieve these results in the Handbook of Floating-Point Arithmetic.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!