Inconsistent behaviour in identical code

前端未结

关注

 3  916

An error trap trips after running a physics simulation for about 20 minutes. Realising this would be a pain to debug, I duplicated the relevant subroutine in a new project, and

相关标签:

3条回答

不要未来只要你来

2021-01-22 12:22
I can confirm the problematic output you’re getting can be caused by a change in x87 precision. The precision value is stored in x87 FPU control register, and when changed, the value persists through the lifetime of your thread, affecting all x87 code running on the thread.

Apparently, some other component of your huge program (or an external library you use) sometimes changes mantissa length from 53 bits (which is the default) to 64 bits (which means use the full precision of these 80 bit x87 registers).

The best way to fix, switch your compiler from x87 to SSE2 target. SSE always use either 32 or 64-bit floats (depending on the instructions used), it doesn’t have 80-bit registers at all. Even your 2003 Athlon 64 already supports that instruction set. As a side effect, your code will become somewhat faster.

Update: If you don’t want to switch to SSE2, you can reset the precision to whatever value you like. Here’s how to do that in Visual C++:
```
#include <float.h>
uint32_t prev;
_controlfp_s( &prev, _PC_53, _MCW_PC ); // or _PC_64 for 80-bit
```
For GCC, it’s something like this (untested)
```
#include <fpu_control.h>
#define _FPU_PRECISION ( _FPU_SINGLE | _FPU_DOUBLE | _FPU_EXTENDED )
fpu_control_t prev, curr;
_FPU_GETCW( prev );
curr = ( prev & ~_FPU_PRECISION ) | _FPU_DOUBLE; // or _FPU_EXTENDED for 80 bit
_FPU_SETCW( curr );
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
天涯浪人

2021-01-22 12:31
```
*(long long int *)(&Collision_Axis_Vector.XYZ[0]) = 4594681439063077250;
```
and all similar lines introduce Undefined Behavior in the program because they violate the Strict Aliasing rule:

you access a double value as long long int
0 讨论(0)
发布评论:

提交评论
- 加载中...
遇见更好的自我

2021-01-22 12:35
Compiled your sample with Visual C++. I can confirm the output is slightly different from what you see in the debugger, here’s mine:
```
CAV: 4594681439063077250, 4603161398996347097, 4605548671330989714
T1: -4626277815076045984, -4637257536736295424, 4589609575355367200
CP: 4589838838395290724, -4627337114727508684, 4592984408164162561
```
I don’t know for sure what might cause the difference, but here’s an idea.

Since you’ve already looked at the machine code, what are you compiling into, legacy x87 or SSE? I presume it’s SSE, most compilers target this by default, for years already. If you pass -march native to gcc, very likely your CPU has some FMA instruction set (AMD since late 2011, Intel since 2013). Therefore, your GCC compiler used these _mm_fmadd_pd / _mm_fmsub_pd intrinsics, causing your 1-bit difference.

However, that’s all theory. My advice is, instead of trying to find out what caused that difference, you should fix your outer code.

It’s a bad idea to trap to debugger as the result of condition like this.

The numerical difference is very small. That’s the least significant bit in a 52-bit mantissa, i.e. the error is just 2^(-52).

Even if you’ll find out what caused that, disable e.g. FMA or some other thing that caused the issue, doing that is fragile, i.e. you’re going to support that hack through the lifetime of the project. You’ll upgrade your compiler, or the compiler will decide to optimize your code differently, or even you’ll upgrade the CPU — your code may break in the similar way.

Better approach, just stop comparing floating point numbers for exact equality. Instead, calculate e.g. absolute difference, and compare that with a small enough constant.
0 讨论(0)
发布评论:

提交评论
- 加载中...