Question
When I run the following code in VC++ 2013 (32-bit, no optimizations):
    #include <cmath>
    #include <iostream>
    #include <limits>

    double mulpow10(double const value, int const pow10)
    {
        static double const table[] =
        {
            1E+000, 1E+001, 1E+002, 1E+003, 1E+004, 1E+005, 1E+006, 1E+007,
            1E+008, 1E+009, 1E+010, 1E+011, 1E+012, 1E+013, 1E+014, 1E+015,
            1E+016, 1E+017, 1E+018, 1E+019,
        };
        return pow10 < 0 ? value / table[-pow10] : value * table[+pow10];
    }

    int main(void)
    {
        double d = 9710908999.008999;
        int j_max = std::numeric_limits<double>::max_digits10;
        while (j_max > 0 && (
            static_cast<double>(
                static_cast<unsigned long long>(
                    mulpow10(d, j_max))) != mulpow10(d, j_max)))
        {
            --j_max;
        }
        double x = std::floor(d * 1.0E9);
        unsigned long long y1 = x;
        unsigned long long y2 = std::floor(d * 1.0E9);
        std::cout
            << "x == " << x << std::endl
            << "y1 == " << y1 << std::endl
            << "y2 == " << y2 << std::endl;
    }
I get
x == 9.7109089990089994e+018
y1 == 9710908999008999424
y2 == 9223372036854775808
in the debugger.
I'm mind-blown. Can someone please explain to me how the heck y1 and y2 have different values?
Update:
This only seems to happen under /arch:SSE2 or /arch:AVX, not /arch:IA32 or /arch:SSE.
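Answer 1:
You are converting out-of-range double values to unsigned long long. This happens in the while loop: j_max starts at max_digits10 (17), so mulpow10(d, j_max) is roughly 9.7E26, far above the largest unsigned long long. The behaviour of such a conversion is undefined in standard C++, and Visual C++ appears to treat it really badly in SSE2 mode: it leaves a number on the FPU stack, eventually overflowing it and making later code that uses the FPU fail in really interesting ways.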
A reduced sample is
double d = 1E20;
unsigned long long ull[] = { d, d, d, d, d, d, d, d };
if (floor(d) != floor(d)) abort();
This aborts if ull has eight or more elements, but passes if it has up to seven.
The solution is not to convert floating point values to an integer type unless you know that the value is in range.
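As a rough illustration of "know that the value is in range", here is a minimal sketch of a range check done before the cast. The names fits_in_ull and check_fits_in_ull are hypothetical, not from the original post, and the sketch assumes a 64-bit unsigned long long and an IEEE-754 double.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true if truncating `value` toward zero yields a
// number that unsigned long long can represent (assumes 64-bit width).
bool fits_in_ull(double value)
{
    // 2^64 is exactly representable as a double and is the first value
    // that is too large.
    double const limit = std::ldexp(1.0, 64);

    // Values in (-1, 0) truncate to 0, so they are fine.
    // Any comparison with NaN is false, so NaN is rejected too.
    return value > -1.0 && value < limit;
}

// Small self-check of the helper.
void check_fits_in_ull()
{
    assert(!fits_in_ull(1E20));   // the value from the reduced sample
    assert(fits_in_ull(1E19));    // below 2^64, safe to convert
    assert(!fits_in_ull(-1.0));   // truncates to -1, not representable
    assert(!fits_in_ull(std::nan("")));
}
```

A caller would then write something like `if (fits_in_ull(v)) u = static_cast<unsigned long long>(v);` and handle the other branch explicitly, so the undefined conversion is never executed.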
4.9 Floating-integral conversions [conv.fpint]
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [ Note: If the destination type is bool, see 4.12. -- end note ]
The rule that out-of-range values wrap when converted to an unsigned type only applies if the value is already of some integer type.
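For contrast, this is what the well-defined integer-to-unsigned wrap looks like. It is only a sketch; wrap_to_u32 and check_wrap are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Integer-to-unsigned conversion is defined as reduction modulo 2^N
// (here N = 32), so out-of-range inputs wrap instead of being undefined.
std::uint32_t wrap_to_u32(std::int64_t value)
{
    return static_cast<std::uint32_t>(value);
}

void check_wrap()
{
    assert(wrap_to_u32(-1) == 4294967295u);     // -1 mod 2^32
    assert(wrap_to_u32(4294967301LL) == 5u);    // (2^32 + 5) mod 2^32
}
```

No such guarantee exists when the source is a double: the wrap rule never gets a chance to apply, because the floating-to-integer step is already undefined for out-of-range values.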
For whatever it's worth, though, this doesn't seem like it's intentional, so even though the standard permits this behaviour, it may still be worth reporting this as a bug.
Answer 2:
9223372036854775808 is 0x8000000000000000; that is, it is equal to INT64_MIN cast to uint64_t.
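The bit-pattern claim can be checked directly. This is a small sketch with hypothetical function names; it relies only on the well-defined integer-to-unsigned conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Reinterprets the most negative 64-bit signed value as unsigned.
// The conversion is well-defined: the result is INT64_MIN + 2^64 = 2^63.
std::uint64_t int64_min_as_unsigned()
{
    std::int64_t const lowest = std::numeric_limits<std::int64_t>::min();
    return static_cast<std::uint64_t>(lowest);
}

void check_bit_pattern()
{
    assert(int64_min_as_unsigned() == 0x8000000000000000ULL);
    assert(int64_min_as_unsigned() == 9223372036854775808ULL);  // the y2 value
}
```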
It looks like your compiler is casting the return value of floor to long long and then casting that result to unsigned long long.
Note that it is quite usual for overflow in floating-point-to-integral conversion to yield the least representable value (e.g. cvttsd2siq on x86-64):
When a conversion is inexact, a truncated result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.
(this is from the doubleword documentation, but the quadword behaviour is the same.)
Answer 3:
Hypothesis: It is a bug. The compiler converts double to unsigned long long correctly but converts extended-precision floating-point (possibly long double) to unsigned long long incorrectly. Details:
double x = std::floor(9710908999.0089989 * 1.0E9);
This computes the value on the right-hand side and stores it in x. The value on the right-hand side might be computed with extended precision, but it is, as the rules of C++ require, converted to double when stored in x. The exact mathematical value would be 9710908999008998870, but rounding it to the double format produces 9710908999008999424.
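The rounding step can be reproduced without any extended precision. The sketch below assumes an IEEE-754 binary64 double with the default round-to-nearest mode; nearest_double, spacing_above and check_rounding are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Converts an integer to the nearest representable double.
double nearest_double(unsigned long long value)
{
    return static_cast<double>(value);
}

// Gap between x and the next larger double.
double spacing_above(double x)
{
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

void check_rounding()
{
    // Between 2^63 and 2^64 a double has 53 significant bits, so adjacent
    // doubles are 2^11 = 2048 apart.
    assert(spacing_above(9710908999008999424.0) == 2048.0);

    // ...8870 is 554 below the multiple of 2048 at ...9424, which is
    // closer than the previous multiple, so it rounds up to it.
    assert(nearest_double(9710908999008998870ULL) == 9710908999008999424.0);
}
```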
unsigned long long y1 = x;
This converts the double value in x to unsigned long long, producing the expected 9710908999008999424.
unsigned long long y2 = std::floor(9710908999.0089989 * 1.0E9);
This computes the value on the right-hand side using extended precision, producing 9710908999008998870. When the extended-precision value is converted to unsigned long long, there is a bug, producing 2^63 (9223372036854775808). This value is likely the “out of range” error value produced by an instruction that converts the extended-precision format to a 64-bit integer. The compiler has used an incorrect instruction sequence to convert its extended-precision format to an unsigned long long.
Answer 4:
For y1, the value is first stored in a double (x) and only then converted to unsigned long long. The value of x isn't the exact "floor" value but the floor result rounded to a double.
The same logic applies when casting between integers and floats: float x = (float)((int) 1.5) gives a different value from float x = 1.5.
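The integer/float example from this answer, written out as a minimal sketch (through_int and check_through_int are hypothetical names):

```cpp
#include <cassert>

// Converts via int first, which discards the fractional part.
float through_int(double value)
{
    return static_cast<float>(static_cast<int>(value));
}

void check_through_int()
{
    assert(through_int(1.5) == 1.0f);             // fraction lost in the int step
    assert(static_cast<float>(1.5) == 1.5f);      // direct conversion keeps it
}
```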
Source: https://stackoverflow.com/questions/21478945/bizarre-floating-point-behavior-with-vs-without-extra-variables-why