How to test for lossless double / integer conversion?

前端未结

关注

 3  1373

夕颜 2021-02-18 13:20

I have one double, and one int64_t. I want to know if they hold exactly the same value, and if converting one type into the other does not lose any information.

My curre

3条回答

醉梦人生 (楼主)

2021-02-18 13:41
Yes, your solution works correctly because it was designed to do so, because int64_t is represented in two's complement by definition (C99 7.18.1.1:1), on platforms that use something resembling binary IEEE 754 double-precision for the double type. It is basically the same as this one.

Under these conditions:
- d < INT64_MAX is correct because it is equivalent to d < (double) INT64_MAX and in the conversion to double, the number INT64_MAX, equal to 0x7fffffffffffffff, rounds up. Thus you want d to be strictly less than the resulting double to avoid triggering UB when executing (int64_t)d.
- On the other hand, INT64_MIN, being -0x8000000000000000, is exactly representable, meaning that a double that is equal to (double)INT64_MIN can be equal to some int64_t and should not be excluded (and such a double can be converted to int64_t without triggering undefined behavior)
It goes without saying that since we have specifically used the assumptions about 2's complement for integers and binary floating-point, the correctness of the code is not guaranteed by this reasoning on platforms that differ. Take a platform with binary 64-bit floating-point and a 64-bit 1's complement integer type T. On that platform T_MIN is -0x7fffffffffffffff. The conversion to double of that number rounds down, resulting in -0x1.0p63. On that platform, using your program as it is written, using -0x1.0p63 for d makes the first three conditions true, resulting in undefined behavior in (T)d, because overflow in the conversion from integer to floating-point is undefined behavior.

If you have access to full IEEE 754 features, there is a shorter solution:
```
#include 
…
#pragma STDC FENV_ACCESS ON
feclearexcept(FE_INEXACT), f == i && !fetestexcept(FE_INEXACT)
```
This solution takes advantage of the conversion from integer to floating-point setting the INEXACT flag iff the conversion is inexact (that is, if i is not representable exactly as a double).

The INEXACT flag remains unset and f is equal to (double)i if and only if f and i represent the same mathematical value in their respective types.

This approach requires the compiler to have been warned that the code accesses the FPU's state, normally with #pragma STDC FENV_ACCESS on but that's typically not supported and you have to use a compilation flag instead.
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...