How to weigh up calculation error

試著忘記壹切 提交于 2019-12-25 03:12:31

问题


Consider the following example. There is an image where user can select rectangular area (part of it). The image is displayed with some scale. Then we change the scale and we need to recalculate the new coordinates of selection. Let's take width,

newSelectionWidth = round(oldSelectionWidth / oldScale * newScale)

where oldScale = oldDisplayImageWidth / realImageWidth, newScale = newDisplayImageWidth / realImageWidth, all the values except for scales are integers.

The question is how to prove that newSelectionWidth = newDisplayImageWidth given oldSelectionWidth = oldDisplayImageWidth for any value of oldDisplayImageWidth, newDisplayImageWidth, realImageWidth? Or under what conditions this doesn't hold?

I was thinking about the answer too and this is what I've come up with, may be inaccurate and/or incomplete.

All numbers in JavaScript are double-precision numbers. Generally, this gives us maximum error of about 10-16 (machine epsilon). This means in order to have error of 0.5 or more, (1) we need to perform 0.5 / 10-16 = 5·1015 operations. The other source of error is calculation with too big (|value| > 1.7976931348623157·10308) or too low numbers (|value| < 2.2250738585072014·10-308) (link). This means (2) if somewhere in the course of calculation we get too big or too low number, e.g. because oldDisplayImageWidth / realImageWidth > 1.7976931348623157·10308 or the like, then the error might exceed 0.5. Granted we're talking about displaying images on today's monitors, all these conditions are extremely unlikely.


回答1:


If newDisplayWidth is less than 1125899906842624 and the other integers are positive and do not exceed 53 bits, then newSelectionWidth equals newDisplayWidth. A proof follows.

Notation:

  • I will use the term double to name the floating-point type being used, IEEE-754 64-bit binary.
  • Text in code style represents computed values, while plain text represents mathematical values. Thus 1/3 is exactly one-third, while 1./3. is the result of dividing 1 by 3 in floating-point arithmetic.

I assume:

  • The widths are positive integers not wider than the double significand (53 bits).
  • The divisions oldDisplayImageWidth / realImageWidth and newDisplayImageWidth / realImageWidth are performed in double arithmetic with the operands converted to double.

The limits on the integers assures that conversion to double is exact and that overflow and underflow are not encountered during the operations used in this problem.

Consider oldScale, which is a double set to oldDisplayImageWidth / realImageWidth. The maximum error in a single floating-point operation in round-to-nearest mode is half an ULP (because every mathematical number is no farther than half an ULP from a representable number). Thus, oldScale equals oldDisplayImageWidth / realImageWidth • (1+e0), where e0 represents the relative error and is at most half a double epsilon. (The double epsilon is 2-52, so |e0| ≤ 2-53.)

Similarly, newScale is newDisplayImageWidth / realImageWidth • (1+e1), where e1 is some error that is at most 2-53.

Then oldSelectionWidth / oldScale is oldSelectionWidth / oldScale • (1+e2), again for some e2 ≤ 2-53, and oldSelectionWidth / oldScale * newScale is oldSelectionWidth / oldScale • (1+e2) • newScale • (1+oldSelectionWidth / oldScale • (1+e3). Note that this is the argument passed to round.

Now substitute the expressions we have for oldScale and newScale. This yields oldSelectionWidth / (oldDisplayImageWidth / realImageWidth • (1+e0)) • (1+e2) • (newDisplayImageWidth / realImageWidth • (1+e1)) • (1+e3). The realImageWidth terms cancel, and we can rearrange the others to produce oldSelectionWidth • newDisplayImageWidth / oldDisplayImageWidth • (1+e1) • (1+e2) • (1+e3) / (1+e0).

We are given that oldSelectionWidth equals oldDisplayImageWidth, so those cancel, and the argument to round is exactly: newDisplayImageWidth • (1+e1) • (1+e2) • (1+e3) / (1+e0).

Consider the combined error terms minus one (this is the relative error in the final value): (1+e1) • (1+e2) • (1+e3) / (1+e0) – 1. This expression has greatest magnitude when e0 is –2-53 and the others are +2-53. Then it is slightly greater than 2 ULP (at most 324518553658426753804753784799233 / 730750818665451377972204001751459814038961127424). If newDisplayImageWidth is less than 1125899906842624, then newDisplayImageWidth times this relative error is less than ½. Therefore, newDisplayImageWidth • (1+e1) • (1+e2) • (1+e3) / (1+e0) would be within ½ of newDisplayImageWidth.

Since newDisplayImageWidth is an integer, if the argument to round is within ½ of newDisplayWidth, then the result is newDisplayWidth.

Therefore, if newDisplayWidth is less than 1125899906842624, then newSelectionWidth equals newDisplayWidth.

(The above proves that 1125899906842624 is a sufficient limit, but it may not be necessary. A more involved analysis may be able to prove that certain combinations of errors are impossible, so the maximum combined error is less than used above. This would relax the limit, allowing larger values of newDisplayWidth.)




回答2:


You're mixing up absolute and relative error. Assuming a relative error of 10^-16 you end up with a maximum relative error of 4 * 10^-16 after the 4 operations in your example. You want an absolute error < 0.5, so you're fine, as long as newSelectionWidth * 4 * 10^-16 < 0.5.



来源:https://stackoverflow.com/questions/16862538/how-to-weigh-up-calculation-error

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!