Question
We are running some code both inside the Visual Studio process context (x86) and outside of it (x64). I noticed that the following code produces a different result in each context (100000000000 in x86 and 99999997952 in x64):
float val = 1000f;
val = val * val;
return (ulong)(val * 100000.0f);
We need to obtain a ulong value from a float value in a reliable way, no matter the context and no matter the ulong value; it is just for hashing purposes. I tested the following code in both the x64 and x86 contexts and obtained the same result in each, so it looks reliable:
float operandFloat = (float)obj;
byte[] bytes = BitConverter.GetBytes(operandFloat);
Debug.Assert(bytes.Length == 4);
uint @uint = BitConverter.ToUInt32(bytes, 0);
return (ulong)@uint;
Is this code reliable?
Answer 1:
As others have speculated in the comments, the difference you're observing is the result of differential precision when doing floating-point arithmetic, arising out of a difference between how the 32-bit and 64-bit builds perform these operations.
Your code is translated by the 32-bit (x86) JIT compiler into the following object code:
fld qword ptr ds:[0E63308h] ; Load constant 1.0e+11 onto top of FPU stack.
sub esp, 8 ; Allocate 8 bytes of stack space.
fstp qword ptr [esp] ; Pop top of FPU stack, putting 1.0e+11 into
; the allocated stack space at [esp].
call 73792C70 ; Call internal helper method that converts the
; double-precision floating-point value stored at [esp]
; into a 64-bit integer, and returns it in edx:eax.
; At this point, edx:eax == 100000000000.
Notice that the optimizer has folded your arithmetic computation ((1000f * 1000f) * 100000f) to the constant 1.0e+11. It has stored this constant in the binary's data segment, and loads it onto the top of the x87 floating-point stack (the fld instruction). The code then allocates 8 bytes of stack space (enough for a 64-bit double-precision floating-point value) by subtracting from the stack pointer (esp). The fstp instruction pops the value off the top of the x87 floating-point stack and stores it in its memory operand; in this case, it stores it into the 8 bytes that we just allocated on the stack. All of this shuffling is rather pointless: it could have just stored the floating-point constant 1.0e+11 directly into memory, bypassing the trip through the x87 FPU, but the JIT optimizer isn't perfect. Finally, the JIT emitted code to call an internal helper function that converts the double-precision floating-point value stored in memory (1.0e+11) into a 64-bit integer. The 64-bit integer result is returned in the register pair edx:eax, as is customary for 32-bit Windows calling conventions. When this code completes, edx:eax contains the 64-bit integer value 100000000000, or 1.0e+11, exactly as you would expect.
(Hopefully the terminology here is not too confusing. Note that there are two different "stacks". The x87 FPU has a series of registers, which are accessed like a stack; I refer to this as the FPU stack. Then there is the stack with which you are probably familiar, the one stored in main memory and accessed via the stack pointer, esp.)
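To make the double-precision behavior concrete, here is a small C# sketch (my addition, not part of the original answer) that mirrors what the 32-bit path effectively computes; since 1.0e+11 is exactly representable as a double, the cast yields the expected integer:
double folded = (1000.0 * 1000.0) * 100000.0; // the folded constant, computed at double precision
Console.WriteLine((ulong)folded); // prints 100000000000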
However, things are done a bit differently by the 64-bit (x86-64) JIT compiler. The big difference here is that 64-bit targets always use SSE2 instructions for floating-point operations, since all chips that support AMD64 also support SSE2, and SSE2 is more efficient and more flexible than the old x87 FPU. Specifically, the 64-bit JIT translates your code into the following:
movsd xmm0, mmword ptr [7FFF7B1A44D8h] ; Load constant into XMM0 register.
call 00007FFFDAC253B0 ; Call internal helper method that converts the
; floating-point value in XMM0 into a 64-bit int
; that is returned in RAX.
Things immediately go wrong here, because the constant value being loaded by the first instruction is 0x42374876E0000000, which is the binary floating-point representation of 99999997952.0. The problem is not the helper function that is doing the conversion to a 64-bit integer. Instead, it is the JIT compiler itself, specifically the optimizer routine that is pre-computing the constant.
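You can verify that constant for yourself; a minimal sketch (my addition) using the standard BitConverter.Int64BitsToDouble to reinterpret the bits:
double d = BitConverter.Int64BitsToDouble(0x42374876E0000000); // reinterpret the raw bits as a double
Console.WriteLine(d); // prints 99999997952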
To gain some insight into how that goes wrong, we'll turn off JIT optimization and see what the code looks like:
movss xmm0, dword ptr [7FFF7B1A4500h]
movss dword ptr [rbp-4], xmm0
movss xmm0, dword ptr [rbp-4]
movss xmm1, dword ptr [rbp-4]
mulss xmm0, xmm1
mulss xmm0, dword ptr [7FFF7B1A4504h]
cvtss2sd xmm0, xmm0
call 00007FFFDAC253B0
The first movss instruction loads a single-precision floating-point constant from memory into the xmm0 register. This time, however, that constant is 0x447A0000, which is the precise binary representation of 1000, the initial float value from your code.
The second movss instruction turns right around and stores this value from the xmm0 register into memory, and the third movss instruction re-loads the just-stored value from memory back into the xmm0 register. (Told you this was unoptimized code!) It also loads a second copy of that same value from memory into the xmm1 register, and then multiplies (mulss) the two single-precision values in xmm0 and xmm1 together. This is the literal translation of your val = val * val code. The result of this operation (which ends up in xmm0) is 0x49742400, or 1.0e+6, precisely as you would expect.
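A quick C# check (my addition) that this first multiplication is still exact in single precision:
float val = 1000f; // bits 0x447A0000
val = val * val; // 1.0e+6, still exactly representable (bits 0x49742400)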
The second mulss instruction performs the val * 100000.0f operation. It implicitly loads the single-precision floating-point constant 1.0e+5 and multiplies it with the value in xmm0 (which, recall, is 1.0e+6). Unfortunately, the result of this operation is not what you would expect. Instead of 1.0e+11, it is actually 9.9999998e+10. Why? Because 1.0e+11 cannot be precisely represented as a single-precision floating-point value. The closest representation is 0x51BA43B7, or 9.9999998e+10.
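That rounding can be reproduced directly in C#; a sketch (my addition) that inspects the bit pattern with BitConverter:
float product = 1.0e6f * 1.0e5f; // both factors are exact; the product rounds to the nearest float
int bits = BitConverter.ToInt32(BitConverter.GetBytes(product), 0);
Console.WriteLine(bits.ToString("X8")); // prints 51BA43B7
Console.WriteLine(product); // prints approximately 9.9999998E+10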
Finally, the cvtss2sd instruction performs an in-place conversion of the (wrong!) scalar single-precision floating-point value in xmm0 to a scalar double-precision floating-point value. In a comment to the question, Neitsa suggested that this might be the source of the problem. In fact, as we have seen, the source of the problem is the previous instruction, the one that does the multiplication. The cvtss2sd instruction just converts an already imprecise single-precision floating-point representation (0x51BA43B7) to an imprecise double-precision floating-point representation: 0x42374876E0000000, or 99999997952.0.
And this is precisely the series of operations performed by the JIT compiler to produce the initial double-precision floating-point constant that is loaded into the xmm0 register in the optimized code.
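In C# terms, that chain looks like the following sketch (my reconstruction, not the JIT's actual code); widening the already-rounded float to a double preserves the error rather than repairing it:
float single = (float)1.0e11; // rounds to 9.9999998e+10 (bits 0x51BA43B7)
double widened = single; // the cvtss2sd equivalent: same wrong value, wider format
Console.WriteLine((ulong)widened); // prints 99999997952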
Although I have been implying throughout this answer that the JIT compiler is to blame, that is not the case at all! If you had compiled the identical code in C or C++ while targeting the SSE2 instruction set, you would have gotten exactly the same imprecise result: 99999997952.0. The JIT compiler is performing just as one would expect it to—if, that is, one's expectations are correctly calibrated to the imprecision of floating-point operations!
So, what is the moral of this story? There are two of them. First, floating-point operations are tricky and there is a lot to know about them. Second, in light of this, always use the most precision that you have available when doing floating-point arithmetic!
The 32-bit code is producing the correct result because it is operating with double-precision floating-point values. With 64 bits to play with, a precise representation of 1.0e+11 is possible.
The 64-bit code is producing the incorrect result because it is using single-precision floating-point values. With only 32 bits to play with, a precise representation of 1.0e+11 is not possible.
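A short way to see why the bit width matters (my note, not from the original answer): 1.0e+11 factors as 48828125 * 2^11, and the odd factor 48828125 needs 26 significand bits, which is more than a float's 24 but well within a double's 53.
Console.WriteLine((ulong)1.0e11); // double: exact, prints 100000000000
Console.WriteLine((ulong)(float)1.0e11); // float: rounds, prints 99999997952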
You would not have had this problem if you had used the double type to begin with:
double val = 1000.0;
val = val * val;
return (ulong)(val * 100000.0);
This ensures the correct result on all architectures, with no need for ugly, non-portable bit-manipulation hacks like those suggested in the question. (Which still cannot ensure the correct result, since it doesn't solve the root of the problem, namely that your desired result cannot be directly represented in a 32-bit single-precision float.)
Even if you have to take input as a single-precision float, convert it immediately into a double, and do all of your subsequent arithmetic manipulations in the double-precision space. That would still have solved this problem, since the initial value of 1000 can be precisely represented as a float.
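A sketch of that approach (my addition, adapted from the code in the question):
float input = 1000f; // the input arrives as a float; 1000 is exactly representable
double val = input; // widen immediately to double
val = val * val;
return (ulong)(val * 100000.0); // all arithmetic done in double precision; yields 100000000000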
Source: https://stackoverflow.com/questions/41225712/float-arithmetic-and-x86-and-x64-context