I have one double, and one int64_t. I want to know if they hold exactly the same value, and if converting one type into the other does not lose any information.
My curre…
In addition to Pascal Cuoq's elaborate answer, and given the extra context you give in comments, I would add a test for negative zeros. You should preserve negative zeros unless you have good reasons not to. You need a specific test to avoid converting them to (int64_t)0. With your current proposal, negative zeros will pass your test, get stored as int64_t, and read back as positive zeros.
I am not sure what the most efficient way to test for them is; maybe this:
#include <math.h>    /* round, signbit */
#include <stdint.h>

int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < INT64_MAX)
        && (round(d) == d)
        && (i == (int64_t)d)
        && (!signbit(d) || d != 0.0);
}
Yes, your solution works correctly because it was designed to do so: int64_t is represented in two's complement by definition (C99 7.18.1.1:1), and the reasoning holds on platforms that use something resembling binary IEEE 754 double-precision for the double type. It is basically the same as this one.
Under these conditions:
d < INT64_MAX is correct because it is equivalent to d < (double)INT64_MAX, and in the conversion to double, the number INT64_MAX, equal to 0x7fffffffffffffff, rounds up. Thus you want d to be strictly less than the resulting double to avoid triggering UB when executing (int64_t)d.
On the other hand, INT64_MIN, being -0x8000000000000000, is exactly representable, meaning that a double that is equal to (double)INT64_MIN can be equal to some int64_t and should not be excluded (and such a double can be converted to int64_t without triggering undefined behavior).
It goes without saying that, since we have specifically used the assumptions about two's complement for integers and binary floating-point, the correctness of the code is not guaranteed by this reasoning on platforms that differ. Take a platform with binary 64-bit floating-point and a 64-bit one's complement integer type T. On that platform, T_MIN is -0x7fffffffffffffff. The conversion to double of that number rounds down, resulting in -0x1.0p63. On that platform, using your program as it is written, using -0x1.0p63 for d makes the first three conditions true, resulting in undefined behavior in (T)d, because overflow in the conversion from floating-point to integer is undefined behavior.
If you have access to full IEEE 754 features, there is a shorter solution:
#include <fenv.h>
…
#pragma STDC FENV_ACCESS ON
feclearexcept(FE_INEXACT), f == i && !fetestexcept(FE_INEXACT)
This solution takes advantage of the conversion from integer to floating-point setting the INEXACT flag iff the conversion is inexact (that is, if i is not representable exactly as a double). The INEXACT flag remains unset and f is equal to (double)i if and only if f and i represent the same mathematical value in their respective types.
This approach requires the compiler to have been warned that the code accesses the FPU's state, normally with #pragma STDC FENV_ACCESS ON, but that's typically not supported and you have to use a compilation flag instead.
OP's code has a dependency that can be avoided.
For a successful compare, d must be a whole number, and round(d) == d takes care of that. Even a NaN d would fail that test.
d must be mathematically in the range [INT64_MIN ... INT64_MAX], and if the if conditions properly insure that, then the final i == (int64_t)d completes the test. So the question comes down to comparing the int64_t limits with the double d.
Let us assume FLT_RADIX == 2, but not necessarily IEEE 754 binary64. d >= INT64_MIN is not a problem, as -INT64_MIN is a power of 2 and exactly converts to a double of the same value, so the >= is exact.
Code would like to do the mathematical d <= INT64_MAX, but that may not work, and so there is a problem. INT64_MAX is a "power of 2 minus 1" and may not convert exactly; it depends on whether the precision of double exceeds 63 bits, rendering the compare unclear. A solution is to halve the comparison: d/2 suffers no precision loss, and INT64_MAX/2 + 1 converts exactly to a double power-of-2:

d/2 < (INT64_MAX/2 + 1)
[Edit]
// or simply
d < ((double)(INT64_MAX/2 + 1))*2
Thus, if code does not want to rely on double having less precision than a 64-bit integer (an assumption that would likely fail with long double), a more portable solution would be:
#include <math.h>    /* round */
#include <stdint.h>

int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < ((double)(INT64_MAX/2 + 1))*2)  // i.e. d/2 < (INT64_MAX/2 + 1)
        && (round(d) == d)
        && (i == (int64_t)d);
}
Note: No rounding mode issues.
[Edit] Deeper limit explanation

Insuring mathematically that INT64_MIN <= d <= INT64_MAX can be re-stated as INT64_MIN <= d < (INT64_MAX + 1), as we are dealing with whole numbers. But INT64_MAX + 1 cannot be computed directly in code (signed integer overflow is undefined behavior), so an alternative is ((double)(INT64_MAX/2 + 1))*2. This can be extended for rare machines with double of higher powers-of-2 radix to ((double)(INT64_MAX/FLT_RADIX + 1))*FLT_RADIX. The comparison limits being exact powers of 2, conversion to double suffers no precision loss, and (lo_limit <= d) && (d < hi_limit) is exact, regardless of the precision of the floating point. Note: a rare floating point with FLT_RADIX == 10 is still a problem.