Float/double precision in debug/release modes

予麋鹿 2020-11-28 11:22

Do C#/.NET floating point operations differ in precision between debug mode and release mode?

5 answers
  • 2020-11-28 11:54

    In response to Frank Krueger's request above (in comments) for a demonstration of a difference:

    Compile this code with gcc, using no optimizations and -mfpmath=387. (I have no reason to think it wouldn't work with other compilers, but I haven't tried it.) Then compile it again with no optimizations and -msse -mfpmath=sse.

    The output will differ.

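    #include <math.h>   /* pow(); gcc may also need -lm at link time */
    #include <stdio.h>
    
    int main(void)
    {
        float e = 0.000000001;
        float f[3] = {33810340466158.90625,276553805316035.1875,10413022032824338432.0};
        f[0] = pow(f[0],2-e); f[1] = pow(f[1],2+e); f[2] = pow(f[2],-2-e);
        /* Prints the raw bytes of the floats as text, so the output
           changes when the low-order bits of the results change. */
        printf("%s\n", (const char *)f);
        return 0;
    }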
    
  • 2020-11-28 12:03

    This is an interesting question, so I did a bit of experimentation. I used this code:

    static void Main (string [] args)
    {
      float
        a = float.MaxValue / 3.0f,
        b = a * a;
    
      if (a * a < b)
      {
        Console.WriteLine ("Less");
      }
      else
      {
        Console.WriteLine ("GreaterEqual");
      }
    }
    

    I built it with DevStudio 2005 and .NET 2 in both debug and release configurations and examined the code generated in each case:

    Release                                                    Debug
    
        static void Main (string [] args)                        static void Main (string [] args)
        {                                                        {
                                                            00000000  push        ebp  
                                                            00000001  mov         ebp,esp 
                                                            00000003  push        edi  
                                                            00000004  push        esi  
                                                            00000005  push        ebx  
                                                            00000006  sub         esp,3Ch 
                                                            00000009  xor         eax,eax 
                                                            0000000b  mov         dword ptr [ebp-10h],eax 
                                                            0000000e  xor         eax,eax 
                                                            00000010  mov         dword ptr [ebp-1Ch],eax 
                                                            00000013  mov         dword ptr [ebp-3Ch],ecx 
                                                            00000016  cmp         dword ptr ds:[00A2853Ch],0 
                                                            0000001d  je          00000024 
                                                            0000001f  call        793B716F 
                                                            00000024  fldz             
                                                            00000026  fstp        dword ptr [ebp-40h] 
                                                            00000029  fldz             
                                                            0000002b  fstp        dword ptr [ebp-44h] 
                                                            0000002e  xor         esi,esi 
                                                            00000030  nop              
          float                                                      float
            a = float.MaxValue / 3.0f,                                a = float.MaxValue / 3.0f,
    00000000  sub         esp,0Ch                            00000031  mov         dword ptr [ebp-40h],7EAAAAAAh
    00000003  mov         dword ptr [esp],ecx                
    00000006  cmp         dword ptr ds:[00A2853Ch],0        
    0000000d  je          00000014                            
    0000000f  call        793B716F                            
    00000014  fldz                                            
    00000016  fstp        dword ptr [esp+4]                    
    0000001a  fldz                                            
    0000001c  fstp        dword ptr [esp+8]                    
    00000020  mov         dword ptr [esp+4],7EAAAAAAh        
            b = a * a;                                                b = a * a;
    00000028  fld         dword ptr [esp+4]                    00000038  fld         dword ptr [ebp-40h] 
    0000002c  fmul        st,st(0)                            0000003b  fmul        st,st(0) 
    0000002e  fstp        dword ptr [esp+8]                    0000003d  fstp        dword ptr [ebp-44h] 
    
          if (a * a < b)                                          if (a * a < b)
    00000032  fld         dword ptr [esp+4]                    00000040  fld         dword ptr [ebp-40h] 
    00000036  fmul        st,st(0)                            00000043  fmul        st,st(0) 
    00000038  fld         dword ptr [esp+8]                    00000045  fld         dword ptr [ebp-44h] 
    0000003c  fcomip      st,st(1)                            00000048  fcomip      st,st(1) 
    0000003e  fstp        st(0)                                0000004a  fstp        st(0) 
    00000040  jp          00000054                            0000004c  jp          00000052 
    00000042  jbe         00000054                            0000004e  ja          00000056 
                                                            00000050  jmp         00000052 
                                                            00000052  xor         eax,eax 
                                                            00000054  jmp         0000005B 
                                                            00000056  mov         eax,1 
                                                            0000005b  test        eax,eax 
                                                            0000005d  sete        al   
                                                            00000060  movzx       eax,al 
                                                            00000063  mov         esi,eax 
                                                            00000065  test        esi,esi 
                                                            00000067  jne         0000007A 
          {                                                          {
            Console.WriteLine ("Less");                        00000069  nop              
    00000044  mov         ecx,dword ptr ds:[0239307Ch]                Console.WriteLine ("Less");
    0000004a  call        78678B7C                            0000006a  mov         ecx,dword ptr ds:[0239307Ch] 
    0000004f  nop                                            00000070  call        78678B7C 
    00000050  add         esp,0Ch                            00000075  nop              
    00000053  ret                                                  }
          }                                                    00000076  nop              
          else                                                00000077  nop              
          {                                                    00000078  jmp         00000088 
            Console.WriteLine ("GreaterEqual");                      else
    00000054  mov         ecx,dword ptr ds:[02393080h]              {
    0000005a  call        78678B7C                            0000007a  nop              
          }                                                            Console.WriteLine ("GreaterEqual");
        }                                                    0000007b  mov         ecx,dword ptr ds:[02393080h] 
                                                            00000081  call        78678B7C 
                                                            00000086  nop              
                                                                  }
    

    What the above shows is that the floating-point code is the same for both debug and release: the compiler is choosing consistency over optimisation. The program produces the wrong result (it reports a * a as less than b, even though b was assigned a * a), but the result is the same regardless of the debug/release mode.
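The likely reason for the "Less" result can be sketched in C rather than C#. This is a minimal sketch, assuming that double precision is an acceptable stand-in for the wider x87 register; the function names are hypothetical, and the volatile local is only there to force a genuine single-precision store, as the fstp to b does.

```c
#include <float.h>

/* Squares a and rounds the product to single precision,
   mimicking the store of a * a into the float variable b. */
float square_stored_as_float(float a)
{
    volatile float stored = a * a;  /* volatile forces a real float store */
    return stored;
}

/* Squares a in double precision, standing in for a product
   that is still held in a wider FPU register. */
double square_kept_wide(float a)
{
    return (double)a * (double)a;
}

/* Returns 1 when the wide product compares less than the stored one. */
int wide_is_less_than_stored(float a)
{
    return square_kept_wide(a) < (double)square_stored_as_float(a);
}
```

With a = FLT_MAX / 3.0f, the stored product overflows to infinity while the wide product stays finite, so the comparison reports "less". For a value such as 2.0f, where nothing overflows or is rounded away, both products agree.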

    Now, the Intel IA32 FPU has eight floating-point registers, so you would think that an optimising compiler would keep values in those registers rather than writing them to memory, thus improving performance. Something along the lines of:

    fld         dword ptr [a] ; precomputed value stored in ram == float.MaxValue / 3.0f
    fmul        st,st(0) ; b = a * a
    ; no store to ram, keep b in FPU
    fld         dword ptr [a]
    fmul        st,st(0)
    fcomi       st,st(1) ; a*a compared to b
    

    but this would execute differently to the debug version (in this case, display the correct result). However, changing the behaviour of the program depending on the build options is a very bad thing.

    FPU code is one area where hand crafting the code can significantly out-perform the compiler, but you do need to get your head around the way the FPU works.

  • 2020-11-28 12:04

    Here's a simple example where the results not only differ between debug and release mode, but the way in which they differ depends on whether one uses x86 or x64 as a platform:

    Single f1 = 0.00000000002f;
    Single f2 = 1 / f1;
    Double d = f2;
    Console.WriteLine(d);
    

    This writes the following results:

                Debug       Release
    x86   49999998976   50000000199,7901
    x64   49999998976   49999998976
    

    A quick look at the disassembly (Debug -> Windows -> Disassembly in Visual Studio) gives some hints about what's going on here. For the x86 case:

    Debug                                       Release
    mov         dword ptr [ebp-40h],2DAFEBFFh | mov         dword ptr [ebp-4],2DAFEBFFh  
    fld         dword ptr [ebp-40h]           | fld         dword ptr [ebp-4]   
    fld1                                      | fld1
    fdivrp      st(1),st                      | fdivrp      st(1),st
    fstp        dword ptr [ebp-44h]           |
    fld         dword ptr [ebp-44h]           |
    fstp        qword ptr [ebp-4Ch]           |
    fld         qword ptr [ebp-4Ch]           |
    sub         esp,8                         | sub         esp,8 
    fstp        qword ptr [esp]               | fstp        qword ptr [esp]
    call        6B9783BC                      | call        6B9783BC
    

    In particular, we see that a bunch of seemingly redundant "store the value from the floating point register in memory, then immediately load it back from memory into the floating point register" instruction pairs have been optimized away in release mode. However, the two instructions

    fstp        dword ptr [ebp-44h]  
    fld         dword ptr [ebp-44h]
    

    are enough to change the value in the x87 register from +5.0000000199790138e+0010 to +4.9999998976000000e+0010 as one may verify by stepping through the disassembly and investigating the values of the relevant registers (Debug -> Windows -> Registers, then right click and check "Floating point").
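The effect of that store and reload can be reproduced outside the debugger. The following is a minimal C sketch, not the code the JIT emits: it assumes a 32-bit IEEE 754 float, uses double as a stand-in for the wider x87 register, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the raw IEEE 754 bit pattern of a float. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    assert(sizeof bits == sizeof value);  /* assumes a 32-bit float */
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* 1 / f1 computed in double, standing in for the value
   that is left in the wide floating-point register. */
double reciprocal_wide(float f1)
{
    return 1.0 / (double)f1;
}

/* The same quotient after a round trip through a 32-bit float,
   which is what the fstp/fld dword pair does. */
float reciprocal_narrowed(float f1)
{
    volatile float stored = (float)reciprocal_wide(f1);
    return stored;
}
```

For f1 = 0.00000000002f (bit pattern 2DAFEBFFh in the listing above), the wide quotient is roughly 50000000199.79. Floats near 5e10 are spaced 4096 apart, so the narrowed value lands on 49999998976.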

    The story for x64 is wildly different. We still see the same optimization removing a few instructions, but this time around, everything relies on SSE with its 128-bit registers and dedicated instruction set:

    Debug                                        Release
    vmovss      xmm0,dword ptr [7FF7D0E104F8h] | vmovss      xmm0,dword ptr [7FF7D0E304C8h]  
    vmovss      dword ptr [rbp+34h],xmm0       | vmovss      dword ptr [rbp-4],xmm0 
    vmovss      xmm0,dword ptr [7FF7D0E104FCh] | vmovss      xmm0,dword ptr [7FF7D0E304CCh]
    vdivss      xmm0,xmm0,dword ptr [rbp+34h]  | vdivss      xmm0,xmm0,dword ptr [rbp-4]
    vmovss      dword ptr [rbp+30h],xmm0       |
    vcvtss2sd   xmm0,xmm0,dword ptr [rbp+30h]  | vcvtss2sd   xmm0,xmm0,xmm0 
    vmovsd      qword ptr [rbp+28h],xmm0       |
    vmovsd      xmm0,qword ptr [rbp+28h]       |
    call        00007FF81C9343F0               | call        00007FF81C9343F0 
    

    Here, because the SSE unit does not use more than single precision internally for single-precision operations (whereas the x87 unit does), we end up with the same "single precision-ish" result as the x86 debug build, regardless of optimizations. Indeed, one finds (after enabling the SSE registers in the Visual Studio Registers overview) that after vdivss, XMM0 contains 0000000000000000-00000000513A43B7, which is exactly the 49999998976 from before.

    Both of the discrepancies bit me in practice. Besides illustrating that one should never compare floating-point values for exact equality, the example also shows that there is still room for assembly-level debugging in a high-level language such as C# the moment floating-point numbers show up.

  • 2020-11-28 12:05

    They can indeed be different. According to the ECMA-335 CLI specification:

    Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either R4 or R8, but its value can be represented internally with additional range and/or precision. The size of the internal floating-point representation is implementation-dependent, can vary, and shall have precision at least as great as that of the variable or expression being represented. An implicit widening conversion to the internal representation from float32 or float64 is performed when those types are loaded from storage. The internal representation is typically the native size for the hardware, or as required for efficient implementation of an operation.

    What this basically means is that the following comparison may or may not evaluate to true:

    class Foo
    {
      double _v = ...;
    
      void Bar()
      {
        double v = _v;
    
        if( v == _v )
        {
          // Code may or may not execute here.
          // _v is 64-bit.
          // v could be either 64-bit (debug) or 80-bit (release) or something else (future?).
        }
      }
    }
    

    Take-home message: never check floating values for equality.
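A common alternative to exact equality is a comparison with a tolerance. The sketch below is written in C and is only one possible formulation: the function name and both tolerance parameters are hypothetical, and suitable tolerance values depend entirely on the application.

```c
#include <float.h>

/* Absolute value without relying on the math library. */
static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Returns 1 when a and b differ by at most abs_tol, or by at most
   rel_tol times the larger of their magnitudes; otherwise returns 0. */
int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff, scale;

    if (a == b) {
        return 1;                   /* exact match, including equal infinities */
    }
    diff = magnitude(a - b);
    if (!(diff <= DBL_MAX)) {
        return 0;                   /* infinite or NaN difference */
    }
    scale = magnitude(a) > magnitude(b) ? magnitude(a) : magnitude(b);
    return diff <= abs_tol || diff <= rel_tol * scale;
}
```

With a relative tolerance of 1e-7, for instance, the two x86 results quoted in this thread (49999998976 and roughly 50000000199.79) would be treated as equal, since they differ by about 1224 on a scale of 5e10.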

  • 2020-11-28 12:16

    In fact, they may differ if debug mode uses the x87 FPU and release mode uses SSE for float-ops.
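C# has no direct way to ask which unit is in use, but C offers a rough analogue that may help when experimenting with the gcc flags mentioned earlier in this thread. This is a sketch only: FLT_EVAL_METHOD is the standard C99 macro from <float.h>, while the function name is hypothetical, and the value reported for a given compiler and flag combination is something to check rather than assume.

```c
#include <float.h>

/* Describes how this C compiler evaluates float and double expressions,
   based on the standard C99 macro FLT_EVAL_METHOD. */
const char *describe_float_evaluation(void)
{
#if !defined(FLT_EVAL_METHOD)
    return "unknown (FLT_EVAL_METHOD is not defined)";
#elif FLT_EVAL_METHOD == 0
    return "each operation is evaluated in the precision of its own type";
#elif FLT_EVAL_METHOD == 1
    return "float and double operations are evaluated as double";
#elif FLT_EVAL_METHOD == 2
    return "all operations are evaluated as long double";
#else
    return "indeterminable or implementation-defined";
#endif
}
```

A value of 2 corresponds to the x87-style behaviour discussed above, where intermediates carry extra precision, and a value of 0 corresponds to the SSE-style behaviour.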
