How to test for lossless double / integer conversion?

夕颜 2021-02-18 13:20

I have one double, and one int64_t. I want to know if they hold exactly the same value, and if converting one type into the other does not lose any information.

My curre

3 answers

误落风尘 (OP)

2021-02-18 13:55
OP's code has a dependency that can be avoided.

For a successful compare, d must be a whole number, and round(d) == d takes care of that. Even a NaN d fails that test, because NaN never compares equal to anything, including itself.

d must be mathematically in the range [INT64_MIN ... INT64_MAX]. If the range conditions properly ensure that, then the final i == (int64_t)d completes the test.

So the question comes down to comparing INT64 limits with the double d.

Let us assume FLT_RADIX == 2, but not necessarily IEEE 754 binary64.

d >= INT64_MIN is not a problem: -INT64_MIN is a power of 2 (2^63), so INT64_MIN converts exactly to a double of the same value and the >= is exact.

Code would like to perform the mathematical test d <= INT64_MAX, but that may not work. INT64_MAX is a power of 2 minus 1 and may not convert exactly to double: that depends on whether the double has at least 63 bits of precision. If it does not, the converted limit is rounded and the comparison no longer tests what was intended. A solution is to halve the comparison: d/2 suffers no precision loss, and INT64_MAX/2 + 1 converts exactly to a double power of 2.
```
d/2 < (INT64_MAX/2 + 1)
```
[Edit]
```
// or simply
d < ((double)(INT64_MAX/2 + 1))*2
```
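To illustrate the difference, here is a small sketch comparing the naive upper-bound test with the doubled one. It assumes the common case of IEEE 754 binary64 `double` with round-to-nearest conversion, where `(double)INT64_MAX` rounds up to 2^63; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive upper bound: INT64_MAX is converted to double before comparing.
   With a 53-bit significand (assumed here) the converted limit is 2^63,
   so 2^63 itself is wrongly accepted. */
static bool naive_upper_ok(double d) {
    return d <= (double)INT64_MAX;
}

/* Upper bound from the answer: INT64_MAX/2 + 1 is 2^62, which converts
   exactly, and doubling it gives exactly 2^63. */
static bool exact_upper_ok(double d) {
    return d < ((double)(INT64_MAX / 2 + 1)) * 2;
}
```

Under those assumptions, `naive_upper_ok(9223372036854775808.0)` is true even though 2^63 does not fit in an `int64_t`, while `exact_upper_ok` rejects it and still accepts the largest `double` below 2^63.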
Thus, if code does not want to rely on double having less precision than uint64_t (a concern that likely applies with long double), a more portable solution would be:
```
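#include <math.h>    // round
#include <stdint.h>  // int64_t, INT64_MIN, INT64_MAX

// Returns nonzero only when d is a whole number in int64_t range equal to i.
int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)                     // -2^63 converts exactly
        && (d < ((double)(INT64_MAX/2 + 1))*2)  // (d/2 < (INT64_MAX/2 + 1))
        && (round(d) == d)                      // d is a whole number
        && (i == (int64_t)d);                   // cast is in range by now
}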
```
Note: No rounding mode issues.

[Edit] Deeper limit explanation

Ensuring that, mathematically, INT64_MIN <= d <= INT64_MAX can be restated as INT64_MIN <= d < (INT64_MAX + 1), because we are dealing with whole numbers. Writing (double)(INT64_MAX + 1) directly in code does not work, since INT64_MAX + 1 overflows in integer arithmetic before the conversion happens; an alternative is ((double)(INT64_MAX/2 + 1))*2. This can be extended to rare machines whose double uses a radix that is a higher power of 2, with ((double)(INT64_MAX/FLT_RADIX + 1))*FLT_RADIX. Because the comparison limits have magnitudes that are exact powers of 2, converting them to double loses no precision, and (lo_limit <= d) && (d < hi_limit) is exact regardless of the floating-point precision. Note that a rare floating-point type with FLT_RADIX == 10 is still a problem.