Converting from floating-point to decimal with floating-point computations

Posted by 不羁岁月 on 2019-12-22 12:07:28

Question


I am trying to convert a floating-point double-precision value x to decimal with 12 (correctly rounded) significant digits. I am assuming that x is between 10^110 and 10^111 such that its decimal representation will be of the form x.xxxxxxxxxxxE110. And, just for fun, I am trying to use floating-point arithmetic only.

I arrived at the pseudo-code below, where all operations are double-precision operations. The notation 1e98 denotes the double nearest to the mathematical 10^98, and 1e98_2 denotes the double nearest to the result of the mathematical subtraction 10^98-1e98. The notation fmadd(X * Y + Z) denotes the fused multiply-add operation with operands X, Y, Z.
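As a rough illustration of the two constants just defined, here is a minimal Python sketch that builds them with exact rational arithmetic. The names D, D2, exact_gap and residual are illustrative and not part of the question; the sketch assumes that Python parses the literal 1e98 with correct rounding and that float() of a Fraction is correctly rounded.

```python
from fractions import Fraction

# D and D2 are illustrative stand-ins for the question's 1e98 and 1e98_2.
D = 1e98                                        # double nearest to 10^98
exact_gap = Fraction(10) ** 98 - Fraction(D)    # exact value of 10^98 - 1e98
D2 = float(exact_gap)                           # double nearest to that gap
residual = exact_gap - Fraction(D2)             # what D + D2 still misses

print("1e98_2   =", D2)
print("residual =", float(residual))
```

The printed residual is the quantity 10^98 - 1e98 - 1e98_2, which shows how far the pair (1e98, 1e98_2) is from representing 10^98 exactly.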

  y = x * 2^-1074;    // exact
  q = y / 1e98;       // q is denormal and the significand of q interpreted
                      // as an integer is our candidate for the 12 decimal
                      // digits of x

  r = fmadd(q * 1e98 - y);  // close to 1e98 * (error made during the division)

  // If 1e98_2 >= 0, we divided by a number that was smaller than we wished
  // The correct answer may be q or q+1.

  if (r and 1e98_2 have opposite signs)
  {
    return the significand of q;
  }

  s = copysign(2^-1074, r);
  r1 = abs(r);
  r2 = abs(1e98_2);

  h = 1e98 * 0.5 * 2^-1074;

  Set rounding mode to downwards

  r3 = fmadd(r2 * q + r1);

  if (r3 < h)
  {
    return the significand of q;
  }
  else
  {
    return significand of (q + s)
  }

I apologize for the confusion that pervades the above pseudo-code, but it is not very clear to me yet, hence the following questions:

  1. Does the first fmadd work as intended (to compute 1e98 * (error made during the division))?

  2. The signs. I cannot convince myself that they are right. But I cannot convince myself that they are wrong either.

  3. Any idea, perhaps arguments, about the frequency with which this algorithm might produce the wrong result?

  4. If it works at all, is there any chance that the algorithm will continue to work if “q = y / 1e98” is changed to “q = y * 1e-98” (leaving all other instructions the same)?

I have not tested this algorithm. I do not have any computer with an fmadd instruction, although I hope to find one so that I can execute the above.
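Without fmadd hardware, one possible way to experiment is to emulate the fused operation in software with exact rationals. The sketch below is an assumption-laden transcription of the pseudo-code into Python (3.9 or later, for math.nextafter); the helper names fma_nearest, fma_down and candidate_digits are hypothetical, and a zero r is treated as "not opposite in sign", which the pseudo-code does not specify.

```python
import math
from fractions import Fraction

TINY = 5e-324                                   # 2^-1074, smallest subnormal
D = 1e98                                        # the question's 1e98
D2 = float(Fraction(10) ** 98 - Fraction(D))    # the question's 1e98_2


def fma_nearest(a, b, c):
    """a*b + c with one rounding to nearest (software stand-in for fmadd)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma_down(a, b, c):
    """a*b + c with one rounding toward minus infinity."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    f = float(exact)
    return math.nextafter(f, -math.inf) if Fraction(f) > exact else f


def candidate_digits(x):
    """Follow the pseudo-code step by step and return its integer result."""
    y = x * TINY                          # exact scaling by a power of two
    q = y / D                             # subnormal; q / TINY is the candidate
    r = fma_nearest(q, D, -y)
    if r != 0 and (r < 0) != (D2 < 0):    # opposite signs: keep q
        return int(q / TINY)
    s = math.copysign(TINY, r)
    h = D * 0.5 * TINY
    r3 = fma_down(abs(D2), q, abs(r))     # computed with downward rounding
    return int(q / TINY) if r3 < h else int((q + s) / TINY)


print(candidate_digits(1e110))
```

This emulation is slow, but it only needs ordinary double arithmetic plus big integers, so it can be used to scan inputs for disagreements with an exact reference.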


Answer 1:


Let y/d be the exact quotient, and let q=rnd(y/d) be that quotient rounded to the nearest float.
Then the true error multiplied by d is rt=(rnd(y/d)-y/d)*d=q*d-y, and the operation we performed with fmadd is r=rnd(q*d-y).
Why q*d-y is exactly representable (so that the single rounding at the end of the fmadd changes nothing) is harder to explain. Roughly: q*d has a limited number of bits (<nbits(q)+nbits(d)), the exponent of y is that of q*d (+/- 1), and since the error satisfies |rt|<=0.5*ulp(q)*d, the leading nbits(q) bits cancel in the subtraction. That answers question 1.
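The exactness claim can be spot-checked numerically. The following Python sketch (the function name remainder_is_exact is made up for this illustration) computes q*d - y as an exact rational and tests whether converting it to a double and back loses anything. It assumes x lies in the question's range [10^110, 10^111) and that subnormal division behaves as IEEE 754 specifies.

```python
from fractions import Fraction

TINY = 5e-324          # 2^-1074
D = 1e98               # the divisor d of the answer


def remainder_is_exact(x):
    """True if q*d - y fits in a double and obeys |rt| <= 0.5*ulp(q)*d."""
    y = x * TINY                                     # exact
    q = y / D                                        # q = rnd(y / d), subnormal
    rt = Fraction(q) * Fraction(D) - Fraction(y)     # true value of q*d - y
    bounded = abs(rt) <= Fraction(TINY) * Fraction(D) / 2
    return Fraction(float(rt)) == rt and bounded


for x in (1e110, 1.0000000001835e110, 3.7e110, 9.99e110):
    print(x, remainder_is_exact(x))
```

A check like this is not a proof, but it makes the argument concrete: the remainder is a multiple of a fixed power of two and is small enough to fit in 53 bits.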

So q*1e98 - y = r, where |r|*2^1074 <= 0.5e98 < 5*10^97 (the 2nd inequality is lucky: it holds because 1e98 < 10^98, i.e. 1e98_2 > 0)

q*(10^98) - y = r + (10^98-1e98)*q where |10^98-1e98|*q*2^1074 <= 0.5e95 (assuming at least 15 digits precision, log(2^53)/log(10) > 15)

So you are asking whether |q*(10^98)-y|*2^1074>5*10^97

You have an approximation of |q*(10^98)-y|, which is |r+1e98_2*q|

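Since |r|*2^1074 < 5*10^97, and |r+(10^98-1e98)*q| <= max(|r|, |(10^98-1e98)*q|) if the signs are opposite, the true error stays below the tie in that case and q can be kept. I think that answers question 2 positively. But I wouldn't be so sure if 1e98_2 were < 0.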

If r and 1e98_2 have the same sign, the error (scaled by 2^1074) might exceed 5*10^97, hence your further handling, which compares r3 = 1e98_2*q + r against h=0.5e98*2^-1074
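The decomposition used in this argument can be written out with exact rationals. This is only a sketch: GAP and split_error are illustrative names, and GAP is the exact difference 10^98 - 1e98 rather than its rounded double 1e98_2.

```python
from fractions import Fraction

TINY = 5e-324                                 # 2^-1074
D = 1e98
GAP = Fraction(10) ** 98 - Fraction(D)        # exact 10^98 - 1e98


def split_error(x):
    """Return (r, correction, total) with total = q*10^98 - y = r + correction."""
    y = x * TINY
    q = y / D
    r = Fraction(q) * Fraction(D) - Fraction(y)       # the fmadd result
    correction = GAP * Fraction(q)                    # (10^98 - 1e98) * q
    total = Fraction(q) * 10 ** 98 - Fraction(y)      # error against 10^98
    return r, correction, total


r, correction, total = split_error(1.0000000001835e110)
scale = 2 ** 1074
print("r          * 2^1074 =", float(r * scale))
print("correction * 2^1074 =", float(correction * scale))
print("total      * 2^1074 =", float(total * scale))
```

Printing the three terms for a given x shows directly whether r and the correction term reinforce or cancel each other, which is the case distinction the pseudo-code makes with its sign test.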

For question 3, at first sight, I'd say that two things might make the algorithm fail:

  • 1e98_2 is not exact (10^98-1e98-1e98_2 = -3.6e63 approx.)

  • and h is not ht=0.5*10^98*2^-1074 but a bit smaller as we saw above.

The true error r3t is approximately (1e98_2-3.6e63)*q + r < r3 (and only the case where it is >0 interests us, because 1e98_2>0).

So an approximate error r3 falling above the approximate tie h, while the true error r3t is below the true tie ht, could lead to an incorrect rounding. Whether that is possible, and if so how frequently, is your question 3.

To mitigate the risk from the above inequality, you tried to truncate the magnitude of r3 by rounding downwards, thus r3 <= 1e98_2*q + r. I was a bit too tired to perform a true analysis of the error bounds...

So I scanned for a failure, and the first failing example I found was 1.0000000001835e110 (I mean that literal correctly rounded to the nearest double, whose value is in fact 1000000000183.49999984153799821120915424942630528225695526491963291846957919215885146546696544423465444842668032e98).

In this case, r and 1e98_2 have the same sign, and

  • (x/1e98) > 1000000000183.50000215

  • the significand of q is thus rounded to 1000000000184

  • r3>h (r3*2^1074 is approx. 5.000001584620017e97) and we incorrectly incremented to q+s, when it should have been q-s: definitely a bug.
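To check an example like this one independently of the algorithm, an exact reference is easy to write. The sketch below uses Python rationals; exact_digits is a hypothetical helper name, and it simply returns the integer nearest to x / 10^98.

```python
import math
from fractions import Fraction


def exact_digits(x):
    """Integer nearest to x / 10^98, computed with no rounding error."""
    scaled = Fraction(x) / 10 ** 98
    return math.floor(scaled + Fraction(1, 2))


x = 1.0000000001835e110
scaled = Fraction(x) / 10 ** 98
print("reference digits:", exact_digits(x))
# Signed distance of x / 10^98 from the tie point 1000000000183.5
print("offset from tie :", float(scaled - Fraction(2000000000367, 2)))
```

Comparing this reference with the output of the pseudo-code over many inputs is one way to search for further failing cases.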

My answers are:

  1. Yes, r=fmadd(q * 1e98 - y) is exactly 1e98*(error made during the division). But we don't really care about the division, which just provides a guess; what counts is that the subtraction is exact.

  2. Yes, the sign is correct, because |r|*2^1074 < 5*10^97, and |r+(10^98-1e98)*q| <= max(|r|, |(10^98-1e98)*q|) if the signs are opposite. But I wouldn't be so sure if 1e98_2 were < 0.

  3. Taking the first failing example, its distance from 1.0e110 measured in ulps, (1.0000000001835e110 - 1.0e110)/ulp(1.0e110), is about 1.099632e6, so a very, very naive conjecture would be that r3 falls above h in about 1 case out of a million... So once q+s is corrected to q-s, the occurrence of r3>h while r3t<ht is much, much smaller than 1/1,000,000 in any case... There are more than 10^15 doubles in the range of interest, so do not consider this a serious answer...

  4. Yes, the discussion above is solely about the guess q, independently of the way it was produced, and the subtraction in 1. will still be exact...
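The figure of "more than 10^15 doubles" in the range of interest can be confirmed by subtracting bit patterns, since consecutive positive doubles have consecutive integer encodings. A small Python sketch (the helper name bits is illustrative):

```python
import struct


def bits(x):
    """IEEE 754 bit pattern of a double, read as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


count = bits(1e111) - bits(1e110)    # doubles in [1e110, 1e111)
print(count)
```

The range spans a little over three binades of 2^52 doubles each, which is why exhaustive testing is impractical and a scan can only ever sample it.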



Source: https://stackoverflow.com/questions/17710243/converting-from-floating-point-to-decimal-with-floating-point-computations
