My program runs on both Linux and Windows, and I have to make sure the floating-point arithmetic produces the same results on both operating systems.
Here is the code:
for
Use /fp:strict on Windows to tell the compiler to produce code that strictly follows IEEE 754, and gcc -msse2 -mfpmath=sse on Linux to obtain the same behavior there.
The reasons for the differences you are seeing have been discussed in various places on Stack Overflow, but the best survey is David Monniaux's article.
The assembly instructions I obtain when compiling with gcc -msse2 -mfpmath=sse are as follows. The instructions cvtsi2ssq, divss, mulss, and addss are the correct ones to use, and they result in a program where p_value contains, at one point, 42d5d1ec.
    .globl _main
    .align 4, 0x90
_main: ## @main
    .cfi_startproc
## BB#0:
    pushq %rbp
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset %rbp, -16
    movq %rsp, %rbp
Ltmp4:
    .cfi_def_cfa_register %rbp
    subq $32, %rsp
    movl $0, -4(%rbp)
    movl $0, -8(%rbp)
LBB0_1: ## =>This Inner Loop Header: Depth=1
    cmpl $100000, -8(%rbp) ## imm = 0x186A0
    jge LBB0_4
## BB#2: ## in Loop: Header=BB0_1 Depth=1
    movq _p_value@GOTPCREL(%rip), %rax
    movabsq $100, %rcx
    cvtsi2ssq %rcx, %xmm0
    movss LCPI0_0(%rip), %xmm1
    movabsq $10, %rcx
    cvtsi2ssq %rcx, %xmm2
    cvtsi2ss -8(%rbp), %xmm3
    divss %xmm3, %xmm2
    movss %xmm2, -12(%rbp)
    cvtsi2ss -8(%rbp), %xmm2
    mulss %xmm2, %xmm1
    addss %xmm0, %xmm1
    movss %xmm1, (%rax)
    movl (%rax), %edx
    movl %edx, -16(%rbp)
    leaq L_.str(%rip), %rdi
    movl -16(%rbp), %esi
    movb $0, %al
    callq _printf
    movl %eax, -20(%rbp) ## 4-byte Spill
## BB#3: ## in Loop: Header=BB0_1 Depth=1
    movl -8(%rbp), %eax
    addl $1, %eax
    movl %eax, -8(%rbp)
    jmp LBB0_1
LBB0_4:
    movl -4(%rbp), %eax
    addq $32, %rsp
    popq %rbp
    ret
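To compare results such as 42d5d1ec between the two platforms, it helps to look at the raw bit pattern of a float instead of its decimal rendering. The following is only a sketch, not the code from the question: the helper names float_bits and bits_to_float are made up for illustration, and it assumes float is a 32-bit IEEE 754 single, which is the case on the usual x86 toolchains.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 32-bit pattern of a float.
   memcpy avoids the aliasing problems of casting a float* to an int*. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: the inverse, rebuilding a float from a bit pattern. */
float bits_to_float(uint32_t u)
{
    float f;
    assert(sizeof f == sizeof u);
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Printing float_bits(p_value) in hexadecimal on each iteration, on both systems, and diffing the two outputs shows the first iteration at which the results diverge.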
The precise results of your code are not fully defined by the IEEE and C/C++ standards. That is the source of the problem.
The main problem is that although all of your inputs are floats, that does not mean the calculation must be done at float precision. The compiler can decide to use double precision for all intermediate values if it wants to. This tends to happen automatically when compiling for x87 FPUs, but a compiler (VC++ 2010, for instance) can also do this widening explicitly, even when generating SSE code.
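To make the effect concrete, here is a small sketch (not taken from the question) that computes the same sum with the intermediate value held at float precision and at double precision. The function names are invented for illustration, and the volatile store is there only to force rounding to float regardless of how the compiler evaluates the expression.

```c
/* Intermediate kept at float precision: storing into a volatile float
   forces the partial sum to be rounded to float before it is reused. */
float sum_float_intermediate(float a, float b, float c)
{
    volatile float t = a + b;
    return t + c;
}

/* Intermediate kept at double precision, rounded to float only once,
   at the very end. */
float sum_double_intermediate(float a, float b, float c)
{
    double t = (double)a + (double)b;
    return (float)(t + (double)c);
}
```

With a = 16777216.0f (2^24), b = 1.0f and c = -16777216.0f, the float version yields 0 because 2^24 + 1 is not representable as a float and rounds back to 2^24, whereas the double version yields 1. Same inputs, same source-level formula, different answer, depending only on the intermediate precision.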
This is not well understood. I shared my understanding of this a few years ago here:
http://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/
Some compilers let you specify the intermediate precision. If you can force all compilers to use the same intermediate precision, then your results should be consistent.
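One way to check what a given build actually does is the C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below turns its value into a readable description; eval_method_name is a hypothetical helper name, and the fallback definition of -1 is an assumption for toolchains that do not define the macro.

```c
#include <float.h>

#ifndef FLT_EVAL_METHOD
#define FLT_EVAL_METHOD (-1)   /* assumption: treat a missing macro as "unknown" */
#endif

/* Hypothetical helper: describe a FLT_EVAL_METHOD value as defined by C99. */
const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "each operation evaluated in its own type";
    case 1:  return "float and double operations evaluated in double";
    case 2:  return "all operations evaluated in long double";
    default: return "indeterminable or implementation-defined";
    }
}
```

Printing eval_method_name(FLT_EVAL_METHOD) from both the Windows and the Linux build is a quick sanity check: if the two builds report different values, differing results are to be expected.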