Taking logs and adding versus multiplying

Submitted by 有些话、适合烂在心里 on 2019-12-05 18:21:47
tmyklebu

Absent any overflow or underflow shenanigans, if a and b are floating-point numbers, then the product a*b will be computed to within a relative error of 1/2 ulp.

A crude bound on the relative error after multiplying a chain of N doubles is therefore a factor of at most (1 - epsilon/2)^(-N), which is about exp(epsilon N/2); to first order, that's a relative error of about N epsilon / 2. I'd imagine you can expect a deviation of around epsilon sqrt(N) in the average case.

Exponent overflow and underflow are more likely to happen with this strategy, though; you're more likely to get infinities, zeroes, and NaNs as well as imprecise values due to rounding of subnormals.

The other approach is more robust in that sense, but it is much slower and errs worse in the case where the straightforward approach doesn't result in an overflow or underflow. Here's a very, very crude analysis for standard doubles in the case where N is at least a couple orders of magnitude smaller than 2^53:

You can always take the log of a finite floating-point number and get a finite floating-point number, so we're cool there. You can add up N floating-point numbers either straightforwardly to get N epsilon worst-case "relative" error and sqrt(N) epsilon expected "relative" error, or using Kahan summation to get about 3 epsilon worst-case "relative" error. Scare quotes are around "relative" because the error is relative to the sum of the absolute values of the things you're summing.

Notice that no finite double has a logarithm whose absolute value is bigger than 710 or so. That means our sum-of-logarithms computed using Kahan summation has an absolute error of at most about 2130 N epsilon: 3 epsilon relative to the sum of absolute values, which is itself at most 710 N. When we exponentiate our sum-of-logarithms, we get something off by a factor of at most exp(2130 N epsilon) from the right answer.

A pathological example for the log-sum-exp approach:

#include <stdio.h>
#include <math.h>

int main(void) {
  double foo[] = {0x1.000000000018cp1023, 0x1.0000000000072p-1023};
  double prod = 1;
  double sumlogs = 0;
  for (size_t i = 0; i < sizeof(foo) / sizeof(*foo); i++) {
    prod *= foo[i];
    sumlogs += log(foo[i]);
  }
  printf("%a %a\n", foo[0], foo[1]);
  printf("%a %a %a\n", prod, exp(sumlogs), prod - exp(sumlogs));
  return 0;
}

On my platform, I get a difference of 0x1.fep-44. I'm sure there are worse examples.
