IEEE-754 floating-point precision: How much error is allowed?

说谎 2020-12-03 23:23

I'm working on porting the sqrt function (for 64-bit doubles) from fdlibm to a model-checking tool I'm using at the moment (cbmc).
As part of my doings, I

2 answers
  • 2020-12-04 00:01

    The IEEE-754 standard requires that the so-called "basic operations" (which include addition, multiplication, division and square root) be correctly rounded. This means that there is a unique allowed answer: the closest representable floating-point number to the so-called "infinitely precise" result of the operation.

    In double-precision, numbers have 53 binary digits of precision, so the correct answer is the exact answer rounded to 53 significant digits. As Rick Regan showed in his answer, this is exactly the result that you got.
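For a concrete input, the "correctly rounded" claim can be checked against exact rational arithmetic. The sketch below is not taken from fdlibm or cbmc. It assumes Python 3.9+ (for `math.nextafter`), a positive finite `x`, and a made-up helper name `sqrt_is_correctly_rounded`. It tests whether the exact square root lies between the midpoints that separate the returned double from its two neighbours.

```python
import math
from fractions import Fraction


def sqrt_is_correctly_rounded(x: float) -> bool:
    """Return True if math.sqrt(x) is the double closest to the exact root.

    Hypothetical helper for illustration; assumes x is positive and finite.
    """
    r = math.sqrt(x)
    below = math.nextafter(r, 0.0)        # neighbouring double just below r
    above = math.nextafter(r, math.inf)   # neighbouring double just above r

    # Exact midpoints between r and each neighbour (Fraction is exact).
    lo_mid = (Fraction(below) + Fraction(r)) / 2
    hi_mid = (Fraction(r) + Fraction(above)) / 2

    # sqrt(x) lies in [lo_mid, hi_mid] exactly when x lies between the squares.
    return lo_mid ** 2 <= Fraction(x) <= hi_mid ** 2


for value in (2.0, 0.1, 1e300):
    print(value, sqrt_is_correctly_rounded(value))
```

Comparing squares avoids ever needing the irrational root itself, so the check stays exact.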

    The answers to your questions are:

    Question 1: Is this huge amount of error allowed?

    Yes, but it is quite misleading to call this error "huge". The fact is that there is no double-precision value that could be returned that would have a smaller error.
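To get a feel for how large an unavoidable error can be in absolute terms, here is a small sketch (assuming Python 3.9+ for `math.ulp`; the helper name `max_rounding_error` is made up for illustration). Under round-to-nearest, the worst-case absolute error of a correctly rounded result is half the spacing between adjacent doubles at that magnitude.

```python
import math


def max_rounding_error(x: float) -> float:
    """Half the gap between adjacent doubles near x (hypothetical helper)."""
    return math.ulp(x) / 2


# The bound grows with the magnitude of the value being represented.
for value in (1.0, 1e10, 1e19, 1e300):
    print(f"{value:g}: up to {max_rounding_error(value):g}")
```

Near 1e19 adjacent doubles are 2048 apart, so an absolute error of up to 1024 is the best any double-precision result can do there.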

    Question 2: Does that mean, that every basic operation should have an error < 2.220446e-16 with 64-bit doubles (machine-epsilon)?

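    No. It means that every basic operation should be rounded to the (unique) closest representable floating-point number, or more generally to the result dictated by the current rounding mode. This is not quite the same as saying that the relative error is bounded by machine epsilon: under round-to-nearest, a normal result has a relative error of at most half of machine epsilon, while a subnormal result can have a much larger relative error and still be correctly rounded.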
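The difference between "correctly rounded" and "relative error below epsilon" shows up with subnormal numbers. A minimal sketch, assuming the default round-to-nearest, ties-to-even mode; the helper `relative_error` and the chosen inputs are illustrative only:

```python
import sys
from fractions import Fraction


def relative_error(computed: float, exact: Fraction) -> Fraction:
    """Exact relative error of a double against a rational reference."""
    return abs(Fraction(computed) - exact) / abs(exact)


eps = Fraction(sys.float_info.epsilon)  # 2**-52, about 2.220446e-16

# Normal range: a correctly rounded division stays within eps / 2.
err_normal = relative_error(1.0 / 3.0, Fraction(1, 3))

# Subnormal range: 3 * (smallest subnormal) / 2 is not representable, so it
# rounds to a neighbouring subnormal. The result is still correctly rounded.
tiny = 3 * 5e-324
err_subnormal = relative_error(tiny / 2.0, Fraction(tiny) / 2)

print(err_normal <= eps / 2)
print(float(err_subnormal))
```

In the subnormal case the rounding is still the best possible, yet the relative error is nowhere near epsilon.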

    Question 3: Which result do you obtain with your x86 hardware and gcc + libc?

    The same answer you got, because sqrt is correctly rounded on any reasonable platform.

  • 2020-12-04 00:16

    In binary, the first 58 bits of the arbitrary-precision answer are 1011111111111111111111110101010101111111111111111011010001...

    The 53-bit significand of the double value is

    10111111111111111111111101010101011111111111111110111

    This means that the double value is correctly rounded to 53 significant bits and is within 1/2 ULP of the exact result. (The error looks "large" only because the number itself is large.)
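The rounding step can be replayed on the two bit strings quoted above. This is only a sketch: `round_to_bits` is a made-up helper that rounds a binary digit string to a given number of significant bits using round-half-to-even. The 58 bits are a truncated prefix of the exact value, but the five discarded bits (10001) are already above the halfway point, so the omitted digits cannot change the direction of rounding.

```python
def round_to_bits(bits: str, keep: int) -> str:
    """Round a binary digit string to `keep` significant bits (ties to even).

    Hypothetical helper; a carry out of the top bit would yield keep + 1 digits.
    """
    drop = len(bits) - keep
    if drop <= 0:
        return bits
    kept, rem = divmod(int(bits, 2), 1 << drop)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and kept % 2 == 1):
        kept += 1
    return format(kept, "b")


# Bit strings copied from the answer above.
prefix_58 = "1011111111111111111111110101010101111111111111111011010001"
significand_53 = "10111111111111111111111101010101011111111111111110111"

print(round_to_bits(prefix_58, 53) == significand_53)
```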
