How to test for lossless double / integer conversion?

夕颜 2021-02-18 13:20

I have one double and one int64_t. I want to know whether they hold exactly the same value, and whether converting one type into the other loses any information.

My curre

  •  误落风尘
    2021-02-18 13:55

    OP's code depends on the precision of double, and that dependency can be avoided.

    For a successful compare, d must be a whole number, and round(d) == d takes care of that. Even a NaN d fails that test, because a NaN compares unequal to everything, itself included.

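    d must also be mathematically in the range [INT64_MIN ... INT64_MAX]. If the preceding conditions properly ensure that, then the final i == (int64_t)d completes the test. The order matters: converting a double outside the range of int64_t to int64_t is undefined behavior, so the cast must only be reached after the range checks pass (&& short-circuits).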

    So the question comes down to comparing INT64 limits with the double d.

    Let us assume FLT_RADIX == 2, but not necessarily IEEE 754 binary64.

    d >= INT64_MIN is not a problem: INT64_MIN is -2^63, the negative of a power of 2, so it converts exactly to a double of the same value, and the >= is exact.
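    A small sketch that checks this claim, assuming FLT_RADIX == 2. The function names are hypothetical; ldexp(1.0, 63) builds 2^63 independently so the converted limit can be compared against it.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes FLT_RADIX == 2. INT64_MIN is -2^63, so (double)INT64_MIN
   is exactly -2^63; ldexp(1.0, 63) computes 2^63 exactly. */
static bool int64_min_is_exact(void) {
    return (double)INT64_MIN == -ldexp(1.0, 63);
}

/* Lower-bound half of the range test; exact because the limit is exact.
   A NaN d also fails here, since any comparison with NaN is false. */
static bool lower_bound_ok(double d) {
    return d >= INT64_MIN;
}
```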

    Code would like to do the mathematical d <= INT64_MAX, but that may not work. INT64_MAX is 2^63 - 1, a "power of 2 minus 1", and it converts exactly to double only if double has at least 63 bits of precision, so the result of the compare depends on the platform. With the common IEEE 754 binary64 (53-bit significand), INT64_MAX converts to 2^63, so d <= INT64_MAX would accept d == 2^63, which is out of range. A solution is to halve the comparison: d/2 suffers no precision loss, and INT64_MAX/2 + 1 (that is, 2^62) converts exactly to a double power of 2:

    d/2 < (INT64_MAX/2 + 1)
    

    [Edit]

    // or simply
    d < ((double)(INT64_MAX/2 + 1))*2
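    To make the problem concrete, here is a sketch that contrasts the naive upper-bound check with the halved and doubled-limit forms above. It assumes double is IEEE 754 binary64; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes double is IEEE 754 binary64 (53-bit significand, FLT_RADIX == 2). */

/* Naive check: INT64_MAX is converted to double first and, with a 53-bit
   significand, rounds up to 2^63, so d == 2^63 (out of range) is accepted. */
static bool upper_ok_naive(double d) {
    return d <= (double)INT64_MAX;
}

/* Halved check: d/2 is exact and INT64_MAX/2 + 1 == 2^62 converts exactly. */
static bool upper_ok_halved(double d) {
    return d / 2 < (INT64_MAX / 2 + 1);
}

/* Doubled-limit check: ((double)2^62) * 2 == 2^63 exactly,
   and no int64_t arithmetic overflows along the way. */
static bool upper_ok_doubled(double d) {
    return d < ((double)(INT64_MAX / 2 + 1)) * 2;
}
```

    With binary64, 9223372036854775808.0 (2^63) passes the naive check but fails both exact forms, while 9223372036854774784.0 (the largest double below 2^63) passes both exact forms.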
    

    Thus, if code does not want to rely on double having fewer than 63 bits of precision (a concern that likely applies to long double), a more portable solution would be:

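    #include <math.h>    // round()
    #include <stdint.h>  // int64_t, INT64_MIN, INT64_MAX

    int int64EqualsDouble(int64_t i, double d) {
        // Range checks first: casting an out-of-range double is undefined behavior.
        return (d >= INT64_MIN)
            && (d < ((double)(INT64_MAX/2 + 1))*2)  // (d/2 < (INT64_MAX/2 + 1))
            && (round(d) == d)
            && (i == (int64_t)d);
    }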
    

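    Note: There are no rounding-mode issues: both limits convert to double exactly, and round() rounds halfway cases away from zero regardless of the current rounding mode.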
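    The question also asks whether converting one type into the other loses information. A sketch of the int64_t to double direction, reusing the same range-then-cast idea under the assumption FLT_RADIX == 2; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does i survive a round trip through double?
   (double)i always lies in [-2^63, 2^63], but rounding can push values
   near INT64_MAX up to exactly 2^63, which is out of range for the cast
   back. That case is rejected before casting, so the cast stays defined
   whichever way (double)i happened to round. */
static bool int64_roundtrips_via_double(int64_t i) {
    double d = (double)i;
    if (d >= ((double)(INT64_MAX / 2 + 1)) * 2) {  /* d >= 2^63 */
        return false;
    }
    return (int64_t)d == i;
}
```

    With a 53-bit double, every integer of magnitude up to 2^53 round-trips, while 2^53 + 1 and INT64_MAX do not.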

    [Edit] Deeper limit explanation

    Ensuring, mathematically, that INT64_MIN <= d <= INT64_MAX can be restated as INT64_MIN <= d < (INT64_MAX + 1), since we are dealing with whole numbers. The direct expression (double)(INT64_MAX + 1) cannot be used, because INT64_MAX + 1 overflows int64_t, which is undefined behavior; an alternative is ((double)(INT64_MAX/2 + 1))*2. This can be extended to rare machines whose FLT_RADIX is a higher power of 2 (such as 16) as ((double)(INT64_MAX/FLT_RADIX + 1))*FLT_RADIX. Because both comparison limits are exact powers of 2, converting them to double suffers no precision loss, and (d >= lo_limit) && (d < hi_limit) is exact regardless of the precision of the floating-point type. Note that a rare floating-point type with FLT_RADIX == 10 is still a problem, since 2^63 is then not guaranteed to be exactly representable.
