Floating point equality

渐次进展 2021-02-02 06:27

It is common knowledge that one has to be careful when comparing floating point values. Usually, instead of using ==, we use some epsilon- or ULP-based equality test.
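For reference, a minimal sketch of what such tests might look like. The function names and the default tolerances are hypothetical, and the ULP version assumes that float is an IEEE 754 binary32 value:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes IEEE 754 binary32 float");

// Relative-epsilon test: true when a and b differ by at most rel_eps times
// the larger of the two magnitudes. The default tolerance is an arbitrary choice.
bool nearly_equal_eps(float a, float b,
                      float rel_eps = 4 * std::numeric_limits<float>::epsilon()) {
    if (a == b) return true;  // identical values, +0 == -0, equal infinities
    if (!std::isfinite(a) || !std::isfinite(b)) return false;  // NaN or a lone infinity
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_eps * scale;
}

// Maps the bit pattern of a float to an integer that grows with the float's
// value, so that adjacent floats map to adjacent integers.
std::int64_t ordered_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    const std::int64_t magnitude = u & 0x7FFFFFFFu;
    return (u & 0x80000000u) ? -magnitude : magnitude;
}

// ULP test: true when at most max_ulps representable floats separate a and b.
bool nearly_equal_ulps(float a, float b, std::int64_t max_ulps = 4) {
    if (a == b) return true;
    if (!std::isfinite(a) || !std::isfinite(b)) return false;
    const std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return (d < 0 ? -d : d) <= max_ulps;
}
```

With these, `nearly_equal_ulps(1.0f, std::nextafter(1.0f, 2.0f), 1)` holds, while `1.0f` and `1.001f` are rejected by both tests.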

6 answers
  •  感情败类
    2021-02-02 06:40

    Only a) and b) are guaranteed to succeed in any sane implementation (see the legalese below for details), as they compare two values that have been derived in the same way and rounded to float precision. Consequently, both compared values are guaranteed to be identical to the last bit.

    Cases c) and d) may fail because the computation and the subsequent comparison may be carried out at a higher precision than float. The difference between a value rounded to float and one kept at double (or higher) precision should be enough to make the test fail.

    Note that cases a) and b) may still fail if infinities or NaNs are involved, though.
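
    A minimal sketch of that caveat. The helper name is hypothetical, and it does not reproduce the exact cases from the question; it only compares a float against an identically derived copy. A NaN never compares equal to anything, itself included. An infinity does compare equal to itself, but arithmetic on infinities (for example inf - inf) yields NaN:

```cpp
#include <limits>

// Returns true when x compares equal to a copy that was derived the same way.
// For every non-NaN value this holds; for NaN the comparison is always false.
bool equals_own_copy(float x) {
    const float a = x;
    const float b = x;
    return a == b;
}
```

    So equals_own_copy(1.5f) and equals_own_copy(inf) hold, whereas passing a quiet NaN or the result of inf - inf does not. Note that this assumes the compiler is not run in a mode that disregards NaN semantics, such as a fast-math option.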


    Legalese

    Using the N3242 C++11 working draft of the standard, I find the following:

    In the text describing the assignment expression, it is explicitly stated that type conversion takes place, [expr.ass] 3:

    If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.

    Clause 4 refers to the standard conversions [conv], which contain the following on floating point conversions, [conv.double] 1:

    A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.

    (Emphasis mine.)

    So we have the guarantee that the result of the conversion is actually defined, unless we are dealing with values outside the representable range (like float a = 1e300, which is UB).
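
    To make the quoted rule concrete, here is a small sketch with hypothetical helper names. It assumes a finite input inside the range of float, and it relies on the float-to-double conversion being exact, which holds because every float value is representable as a double:

```cpp
#include <cmath>
#include <limits>

// True when d is exactly representable as a float, i.e. it survives the
// round trip double -> float -> double unchanged.
bool is_exact_in_float(double d) {
    const float f = static_cast<float>(d);
    return static_cast<double>(f) == d;
}

// True when the converted value is either d itself or one of the two
// adjacent floats that bracket d. Assumes d is finite and within float range.
bool converts_to_adjacent_float(double d) {
    const float f = static_cast<float>(d);
    const double back = f;  // exact: every float is representable as a double
    if (back == d) return true;
    const float inf = std::numeric_limits<float>::infinity();
    // The neighbouring float on the other side of d.
    const float other = std::nextafter(f, back < d ? inf : -inf);
    return back < d ? d < static_cast<double>(other)
                    : static_cast<double>(other) < d;
}
```

    For example, 0.5 is exactly representable in float, so the conversion must return it unchanged, while 0.1 is not and ends up as one of its two float neighbours. Which neighbour is chosen is the implementation-defined part.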

    When people think about "internal floating point representation may be more precise than visible in code", they think about the following sentence in the standard, [expr] 11:

    The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

    Note that this applies to operands and results, not to variables. This is emphasized by the attached footnote 60:

    The cast and assignment operators must still perform their specific conversions as described in 5.4, 5.2.9 and 5.17.

    (I guess this is the footnote that Maciej Piechotka meant in the comments; the numbering seems to have changed in the version of the standard he's been using.)

    So, when I say float a = some_double_expression;, I have the guarantee that the result of the expression is actually rounded to be representable by a float (invoking UB only if the value is out-of-bounds), and a will refer to that rounded value afterwards.
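
    As a sketch of the difference this makes (the function names are hypothetical and are not the cases from the question): two floats initialized from the same double expression hold the same rounded value on a conforming implementation with deterministic rounding, whereas comparing a stored float against the unrounded expression can fail.

```cpp
// Both variables receive the quotient rounded to float, so they are expected
// to compare equal for non-NaN results when rounding is deterministic.
bool stored_floats_match(double x, double y) {
    float a = x / y;  // the conversion to float happens at initialization
    float b = x / y;
    return a == b;
}

// Here a is promoted back to double and compared with the quotient at double
// (or higher) precision, so the test fails whenever rounding to float
// actually changed the value.
bool float_matches_double_expr(double x, double y) {
    float a = x / y;
    return a == x / y;
}
```

    With IEEE 754 types, float_matches_double_expr(1.0, 4.0) holds because 0.25 is exactly representable in float, while float_matches_double_expr(1.0, 3.0) does not, although stored_floats_match(1.0, 3.0) does.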

    An implementation could indeed specify that the result of the rounding is random, and thus break the cases a) and b). Sane implementations won't do that, though.
