Does a floating-point reciprocal always round-trip?

Submitted by 五迷三道 on 2019-12-22 09:23:35

Question


For IEEE-754 arithmetic, is there a guarantee of 0 or 1 units in the last place accuracy for reciprocals? From that, is there a guaranteed error-bound on the reciprocal of a reciprocal?


Answer 1:


[Everything below assumes a fixed IEEE 754 binary format, with some form of round-to-nearest as the rounding-mode.]

Since the reciprocal (computed as 1/x) is a basic arithmetic operation, 1 is exactly representable, and the basic arithmetic operations are guaranteed correctly rounded by the standard, the reciprocal result is guaranteed to be within 0.5 units in the last place (ulp) of the true value. (The same applies to any of the basic arithmetic operations specified in the standard.)
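That 0.5 ulp bound can be checked empirically with exact rational arithmetic. Here's a quick sketch using the standard fractions module and math.ulp (Python 3.9+); the sample range and seed are arbitrary choices for the demonstration:

```python
import math
import random
from fractions import Fraction

# Check that the computed reciprocal 1.0 / x lies within 0.5 ulp of the
# true reciprocal, measuring the error exactly via rational arithmetic.
random.seed(42)
for _ in range(10_000):
    x = random.uniform(0.5, 2.0)
    y = 1.0 / x                        # computed (rounded) reciprocal
    exact = Fraction(1) / Fraction(x)  # true reciprocal, exactly
    error_in_ulps = abs(Fraction(y) - exact) / Fraction(math.ulp(y))
    assert error_in_ulps <= Fraction(1, 2)
```

Both the float x and math.ulp(y) convert to Fraction without any rounding, so the measured error is exact, not merely approximate.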

The reciprocal of the reciprocal of a value x is not guaranteed to be equal to x, in general. A quick example with the IEEE 754 binary64 format:

>>> x = 1.8
>>> 1.0 / (1.0 / x)
1.7999999999999998
>>> x == 1.0 / (1.0 / x)
False

However, assuming that overflow and underflow are avoided, and that x is finite, nonzero, and exactly representable, the following results are true:

  1. The value of 1.0 / (1.0 / x) will differ from x by no more than 1 unit in the last place.

  2. If the significand of x (normalised to lie in the range [1.0, 2.0) as usual) is smaller than sqrt(2), then the reciprocal does roundtrip: 1.0 / (1.0 / x) == x.

Sketch of proof: without loss of generality we can assume that x is positive, and scale x by a power of two so that it lies in the range [1.0, 2.0). The results above are clearly true in the case that x is an exact power of two, so let's assume it's not (this will be useful in the second step below). The proof below is given for precision specific to the IEEE 754 binary64 format, but it adapts directly to any IEEE 754 binary format.

Write 1 / x for the true value of the reciprocal, before rounding, and let y be the (unique, as it turns out) nearest representable binary64 float to 1 / x. Then:

  • since y is the closest float to 1 / x, and both y and 1/x are in the binade [0.5, 1.0], where the spacing between successive floats is exactly 2^-53, we have |y - 1/x| <= 2^-54. In fact, we can do a bit better:

  • we actually have a strict inequality above: |y - 1/x| < 2^-54. If |y - 1/x| were exactly equal to 2^-54, then 1/x would be exactly representable in arbitrary-precision binary floating-point (since both y and 2^-54 are). But the only binary floats x for which 1/x is exactly representable at some precision are powers of two, and we already excluded this case.

  • if x < sqrt(2) then 1 / x > x / 2, hence (rounding both to the nearest representable float), we have y >= x / 2, so x / y <= 2.

  • now x - 1/y = (y - 1/x) x/y, and from the bounds on |y - 1/x| and x/y (still assuming that x < sqrt(2)) we get |x - 1/y| < 2^-53. It follows that x is the closest representable float to 1/y, 1/y rounds to x and the roundtrip succeeds. This completes the proof of part 2.

  • in the general case x < 2, we have x / y < 4, so |x - 1/y| < 2^-52. That makes 1/y at most 1 ulp away from x, which completes the proof of part 1.
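The key inequalities in the bullets above can be spot-checked with exact rational arithmetic. A sketch (the comparison Fraction(x)**2 < 2 implements the x < sqrt(2) test exactly, avoiding any rounding in the threshold itself):

```python
import random
from fractions import Fraction

random.seed(0)
for _ in range(10_000):
    x = random.uniform(1.0, 2.0)
    y = 1.0 / x
    # |x - 1/y|, computed exactly in rational arithmetic
    err = abs(Fraction(x) - Fraction(1) / Fraction(y))
    # general case: |x - 1/y| < 2^-52, i.e. within 1 ulp in [1.0, 2.0)
    assert err < Fraction(1, 2**52)
    if Fraction(x) ** 2 < 2:  # exact test for x < sqrt(2)
        # below the threshold: |x - 1/y| < 2^-53, forcing the roundtrip
        assert err < Fraction(1, 2**53)
        assert 1.0 / y == x
```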

Here's a demonstration of the sqrt(2) threshold: using Python, we take a million random floats in the range [1.0, 2.0), and identify those that don't roundtrip through the reciprocal. All of the samples smaller than sqrt(2) pass the roundtrip.

>>> import random
>>> samples = [random.uniform(1.0, 2.0) for _ in range(10**6)]
>>> bad = [x for x in samples if 1.0 / (1.0 / x) != x]
>>> len(bad)
171279
>>> min(bad)
1.4150519879892107
>>> import math
>>> math.sqrt(2)
1.4142135623730951

And a demonstration that the maximum error is no more than 1 ulp, in general (for the binary64 format, in the binade [1.0, 2.0), 1 unit in the last place is 2^-52):

>>> samples = [random.uniform(1.0, 2.0) for _ in range(10**6)]
>>> max(abs(x - 1.0 / (1.0 / x)) for x in samples)
2.220446049250313e-16
>>> 2**-52
2.220446049250313e-16

Here's an example with the IEEE 754 binary64 format showing that the restriction about avoiding underflow is necessary:

>>> x = 1.3e308
>>> x_roundtrip = 1.0 / (1.0 / x)
>>> x.hex()
'0x1.72409614c1e6ap+1023'
>>> x_roundtrip.hex()
'0x1.72409614c1e6cp+1023'

Here the x_roundtrip result differs from the original by two units in the last place, because 1 / x was smaller than the smallest normal representable float, and so was not represented to the same precision as x.
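Measured in units in the last place of x, that discrepancy shows up directly; a small sketch using math.ulp (Python 3.9+):

```python
import math

x = 1.3e308
x_roundtrip = 1.0 / (1.0 / x)
# 1.0 / x lands in the subnormal range and loses precision,
# so the round trip ends up 2 ulps away from the original x.
error_in_ulps = abs(x_roundtrip - x) / math.ulp(x)
print(error_in_ulps)  # 2.0
```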

Final note: since IEEE 754-2008 covers decimal floating-point types as well, I should mention that the above proof carries over almost verbatim to the decimal case, establishing that for floats with significand smaller than sqrt(10), roundtrip occurs, while for general decimal floats (again avoiding overflow and underflow) we're never off by more than one unit in the last place. However, there's some number-theoretic finesse required to show that the key inequality |x - 1/y| < 1/2 10^(1-p) is always strict: one ends up having to show that the quantity 1 + 16 10^(2p-1) is never a square number (which is true, but it's probably outside the ambit of this site to include the proof here).

>>> from decimal import Decimal, getcontext
>>> import random
>>> getcontext().prec = 6
>>> samples = [+Decimal(random.uniform(1.0, 10.0)) for _ in range(10**6)]
>>> bad = [x for x in samples if 1 / (1 / x) != x]
>>> min(bad)
Decimal('3.16782')
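For comparison, sqrt(10) rounded in the same 6-digit context sits just below that smallest failing sample:

```python
from decimal import Decimal, getcontext

getcontext().prec = 6
# all roundtrip failures have significand above sqrt(10)
print(Decimal(10).sqrt())  # 3.16228
```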


Source: https://stackoverflow.com/questions/44623331/does-a-floating-point-reciprocal-always-round-trip
