Double.IsNaN test 100 times faster?

无人及你 2021-01-01 09:45

I found this in the .NET Source Code: It claims to be 100 times faster than System.Double.IsNaN. Is there a reason to not use this function instead of Sys

3 Answers
  • 2021-01-01 10:01

    I call shenanigans. The "fast" version executes considerably more instructions and even performs more reads from memory (from the stack, so they hit L1 cache, but that is still slower than registers).

    00007FFAC53D3D01  movups      xmmword ptr [rsp+8],xmm0  
    00007FFAC53D3D06  sub         rsp,48h  
    00007FFAC53D3D0A  mov         qword ptr [rsp+20h],0  
    00007FFAC53D3D13  mov         qword ptr [rsp+28h],0  
    00007FFAC53D3D1C  mov         qword ptr [rsp+30h],0  
    00007FFAC53D3D25  mov         rax,7FFAC5423D40h  
    00007FFAC53D3D2F  mov         eax,dword ptr [rax]  
    00007FFAC53D3D31  test        eax,eax  
    00007FFAC53D3D33  je          00007FFAC53D3D3A  
    00007FFAC53D3D35  call        00007FFB24EE39F0  
    00007FFAC53D3D3A  mov         r8d,8  
    00007FFAC53D3D40  xor         edx,edx  
    00007FFAC53D3D42  lea         rcx,[rsp+20h]  
    00007FFAC53D3D47  call        00007FFB24A21680  
                t.DoubleValue = value;
    00007FFAC53D3D4C  movsd       xmm5,mmword ptr [rsp+50h]  
    00007FFAC53D3D52  movsd       mmword ptr [rsp+20h],xmm5  
    
                UInt64 exp = t.UintValue & 0xfff0000000000000;
    00007FFAC53D3D58  mov         rax,qword ptr [rsp+20h]  
    00007FFAC53D3D5D  mov         rcx,0FFF0000000000000h  
    00007FFAC53D3D67  and         rax,rcx  
    00007FFAC53D3D6A  mov         qword ptr [rsp+28h],rax  
                UInt64 man = t.UintValue & 0x000fffffffffffff;
    00007FFAC53D3D6F  mov         rax,qword ptr [rsp+20h]  
    00007FFAC53D3D74  mov         rcx,0FFFFFFFFFFFFFh  
    00007FFAC53D3D7E  and         rax,rcx  
    00007FFAC53D3D81  mov         qword ptr [rsp+30h],rax  
    
                return (exp == 0x7ff0000000000000 || exp == 0xfff0000000000000) && (man != 0);
    00007FFAC53D3D86  mov         rax,7FF0000000000000h  
    00007FFAC53D3D90  cmp         qword ptr [rsp+28h],rax  
    00007FFAC53D3D95  je          00007FFAC53D3DA8  
    00007FFAC53D3D97  mov         rax,0FFF0000000000000h  
    00007FFAC53D3DA1  cmp         qword ptr [rsp+28h],rax  
    00007FFAC53D3DA6  jne         00007FFAC53D3DBD  
    00007FFAC53D3DA8  xor         eax,eax  
    00007FFAC53D3DAA  cmp         qword ptr [rsp+30h],0  
    00007FFAC53D3DB0  setne       al  
    00007FFAC53D3DB3  mov         dword ptr [rsp+38h],eax  
    00007FFAC53D3DB7  mov         al,byte ptr [rsp+38h]  
    00007FFAC53D3DBB  jmp         00007FFAC53D3DC1  
    00007FFAC53D3DBD  xor         eax,eax  
    00007FFAC53D3DBF  jmp         00007FFAC53D3DC1  
    00007FFAC53D3DC1  nop  
    00007FFAC53D3DC2  add         rsp,48h  
    00007FFAC53D3DC6  ret  
    

    Versus the .NET version:

                return (*(UInt64*)(&d) & 0x7FFFFFFFFFFFFFFFL) > 0x7FF0000000000000L;
    00007FFAC53D3DE0  movsd       mmword ptr [rsp+8],xmm0  
    00007FFAC53D3DE6  sub         rsp,38h  
    00007FFAC53D3DEA  mov         rax,7FFAC5423D40h  
    00007FFAC53D3DF4  mov         eax,dword ptr [rax]  
    00007FFAC53D3DF6  test        eax,eax  
    00007FFAC53D3DF8  je          00007FFAC53D3DFF  
    00007FFAC53D3DFA  call        00007FFB24EE39F0  
    00007FFAC53D3DFF  mov         rdx,qword ptr [rsp+40h]  
    00007FFAC53D3E04  mov         rax,7FFFFFFFFFFFFFFFh  
    00007FFAC53D3E0E  and         rdx,rax  
    00007FFAC53D3E11  xor         ecx,ecx  
    00007FFAC53D3E13  mov         rax,7FF0000000000000h  
    00007FFAC53D3E1D  cmp         rdx,rax  
    00007FFAC53D3E20  seta        cl  
    00007FFAC53D3E23  mov         dword ptr [rsp+20h],ecx  
    00007FFAC53D3E27  movzx       eax,byte ptr [rsp+20h]  
    00007FFAC53D3E2C  jmp         00007FFAC53D3E2E  
    00007FFAC53D3E2E  nop  
    00007FFAC53D3E2F  add         rsp,38h  
    00007FFAC53D3E33  ret  
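
    For reference, the two variants can be reconstructed from the source lines interleaved in the listings above. This is only a sketch: the union struct name `DoubleBits` and the class name `NaNChecks` are assumptions (only the field names `DoubleValue` and `UintValue` appear in the listing), and the mask version uses `BitConverter.DoubleToInt64Bits` in place of the unsafe pointer cast so that it compiles without `/unsafe`.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NaNChecks
{
    // Hypothetical union type: both fields overlay the same 8 bytes.
    // Only the field names come from the disassembly listing.
    [StructLayout(LayoutKind.Explicit)]
    private struct DoubleBits
    {
        [FieldOffset(0)] public double DoubleValue;
        [FieldOffset(0)] public ulong UintValue;
    }

    // The "fast" version, reconstructed from the interleaved source lines.
    public static bool IsNaNUnion(double value)
    {
        DoubleBits t = new DoubleBits();
        t.DoubleValue = value;

        ulong exp = t.UintValue & 0xfff0000000000000UL; // sign + exponent bits
        ulong man = t.UintValue & 0x000fffffffffffffUL; // mantissa bits

        return (exp == 0x7ff0000000000000UL || exp == 0xfff0000000000000UL) && (man != 0);
    }

    // Same test as the .NET one-liner: clear the sign bit, then anything
    // above the bit pattern of +Infinity is a NaN.
    public static bool IsNaNMask(double d)
    {
        long bits = BitConverter.DoubleToInt64Bits(d);
        return (bits & 0x7FFFFFFFFFFFFFFFL) > 0x7FF0000000000000L;
    }

    public static void Main()
    {
        double[] samples =
        {
            double.NaN, 0.0, -0.0, 42.0,
            double.PositiveInfinity, double.NegativeInfinity,
            BitConverter.Int64BitsToDouble(0x7FF0000000000001L) // NaN with the smallest payload
        };

        foreach (double d in samples)
        {
            Console.WriteLine("{0,-12} union={1} mask={2} builtin={3}",
                d, IsNaNUnion(d), IsNaNMask(d), double.IsNaN(d));
        }
    }
}
```

    Both methods should agree with `double.IsNaN` on every input; the difference is only in how much work the bit test takes.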
    
  • 2021-01-01 10:12

    It claims to be 100 times faster than System.Double.IsNaN

    Yes, that used to be true. You are missing the time machine needed to know when this decision was made. Double.IsNaN() didn't use to look like that. From the SSCLI10 source code:

       public static bool IsNaN(double d)
       {
           // Comparisions of a NaN with another number is always false and hence both conditions will be false.
           if (d < 0d || d >= 0d) {
              return false;
           }
           return true;
       }
    

    This comparison-based version performs very poorly on the FPU in 32-bit code if d is NaN. That is just an aspect of the chip design: a NaN operand is treated as exceptional in the microcode. The Intel processor manuals say very little about it, other than documenting a processor perf counter that tracks the number of "Floating Point assists" and noting that the microcode sequencer comes into play for denormals and NaNs, "potentially costing hundreds of cycles". It is not an issue in 64-bit code, which uses SSE2 instructions that don't have this perf hit.

    Some code to play with to see this yourself:

    using System;
    using System.Diagnostics;
    
    class Program {
        static void Main(string[] args) {
            double d = double.NaN;
            for (int test = 0; test < 10; ++test) {
                var sw1 = Stopwatch.StartNew();
                bool result1 = false;
                for (int ix = 0; ix < 1000 * 1000; ++ix) {
                    result1 |= double.IsNaN(d);
                }
                sw1.Stop();
                var sw2 = Stopwatch.StartNew();
                bool result2 = false;
                for (int ix = 0; ix < 1000 * 1000; ++ix) {
                    result2 |= IsNaN(d);
                }
                sw2.Stop();
                Console.WriteLine("{0} - {1} x {2}%", sw1.Elapsed, sw2.Elapsed, 100 * sw2.ElapsedTicks / sw1.ElapsedTicks, result1, result2);
    
            }
            Console.ReadLine();
        }
        public static bool IsNaN(double d) {
            // Comparisions of a NaN with another number is always false and hence both conditions will be false.
            if (d < 0d || d >= 0d) {
                return false;
            }
            return true;
        }
    }
    

    This uses the old version of Double.IsNaN(), the one that later got micro-optimized. Such micro-optimizations are not evil in a framework, by the way: the great burden of the Microsoft .NET programmers is that they can rarely guess when their code is in the critical path of an application.

    Results on my machine when targeting 32-bit code (Haswell mobile core):

    00:00:00.0027095 - 00:00:00.2427242 x 8957%
    00:00:00.0025248 - 00:00:00.2191291 x 8678%
    00:00:00.0024344 - 00:00:00.2209950 x 9077%
    00:00:00.0024144 - 00:00:00.2321169 x 9613%
    00:00:00.0024126 - 00:00:00.2173313 x 9008%
    00:00:00.0025488 - 00:00:00.2237517 x 8778%
    00:00:00.0026940 - 00:00:00.2231146 x 8281%
    00:00:00.0025052 - 00:00:00.2145660 x 8564%
    00:00:00.0025533 - 00:00:00.2200943 x 8619%
    00:00:00.0024406 - 00:00:00.2135839 x 8751%
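
    As a side note on why the old comparison-based check is correct at all: every ordered comparison involving NaN evaluates to false, so NaN is the only value for which both `d < 0d` and `d >= 0d` fail. A small sketch of that behavior (the class and method names here are made up for illustration):

```csharp
using System;

public static class NaNComparisonDemo
{
    // Same logic as the SSCLI version quoted above, folded into one expression.
    public static bool IsNaNByComparison(double d)
    {
        return !(d < 0d || d >= 0d);
    }

    public static void Main()
    {
        double[] samples = { double.NaN, 0.0, -1.5, double.PositiveInfinity, double.Epsilon };

        foreach (double d in samples)
        {
            // For NaN both comparisons print False; for every other value exactly one is True.
            Console.WriteLine("{0,-10} d<0: {1,-5} d>=0: {2,-5} IsNaNByComparison: {3}",
                d, d < 0d, d >= 0d, IsNaNByComparison(d));
        }
    }
}
```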
    
  • 2021-01-01 10:14

    Here's a naive benchmark:

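    using System;
    using System.Diagnostics;
    
    class Program
    {
        // IsNaN(double) is the "fast" method from the question. It is not shown
        // in this answer and must be pasted into this class for the code to compile.
    
        public static void Main()
        {
            int iterations = 500 * 1000 * 1000;
    
            double nan = double.NaN;
            double notNan = 42;
    
            // Time the "fast" method from the question.
            Stopwatch sw = Stopwatch.StartNew();
    
            bool isNan;
            for (int i = 0; i < iterations; i++)
            {
                isNan = IsNaN(nan);     // true
                isNan = IsNaN(notNan);  // false
            }
    
            sw.Stop();
            Console.WriteLine("IsNaN: {0}", sw.ElapsedMilliseconds);
    
            // Time the framework's implementation on the same inputs.
            sw = Stopwatch.StartNew();
    
            for (int i = 0; i < iterations; i++)
            {
                isNan = double.IsNaN(nan);     // true
                isNan = double.IsNaN(notNan);  // false
            }
    
            sw.Stop();
            Console.WriteLine("double.IsNaN: {0}", sw.ElapsedMilliseconds);
    
            Console.Read();
        }
    }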
    

    Obviously they're wrong (times in milliseconds):

    IsNaN: 15012

    double.IsNaN: 6243


    EDIT + NOTE: I'm sure the timing will change depending on input values and many other factors, but claiming that this wrapper is, generally speaking, 100x faster than the default implementation seems just wrong.
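
    A slightly less naive variant of the benchmark, as a sketch only: it counts the hits so the calls cannot be discarded as dead code, mixes several input values, and does a short warm-up before timing. The union-based method is reconstructed from the source lines in the first answer's listing; the struct name `DoubleBits`, the class name, and the helper names are assumptions. Results will still vary by runtime, bitness, and CPU.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class IsNaNBenchmarkSketch
{
    // Hypothetical union type overlaying a double and a ulong.
    [StructLayout(LayoutKind.Explicit)]
    private struct DoubleBits
    {
        [FieldOffset(0)] public double DoubleValue;
        [FieldOffset(0)] public ulong UintValue;
    }

    // The "fast" version, reconstructed from the first answer's listing.
    public static bool IsNaNUnion(double value)
    {
        DoubleBits t = new DoubleBits();
        t.DoubleValue = value;
        ulong exp = t.UintValue & 0xfff0000000000000UL;
        ulong man = t.UintValue & 0x000fffffffffffffUL;
        return (exp == 0x7ff0000000000000UL || exp == 0xfff0000000000000UL) && (man != 0);
    }

    // Counts NaN hits with the union version; returning the count keeps the calls live.
    public static int CountUnion(double[] inputs, int iterations)
    {
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (IsNaNUnion(inputs[i % inputs.Length])) hits++;
        }
        return hits;
    }

    // Same loop, calling the framework's implementation directly.
    public static int CountBuiltIn(double[] inputs, int iterations)
    {
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (double.IsNaN(inputs[i % inputs.Length])) hits++;
        }
        return hits;
    }

    public static void Main()
    {
        double[] inputs = { double.NaN, 42.0, double.PositiveInfinity, 0.0 };
        const int iterations = 100 * 1000 * 1000;

        // Warm-up so both loops are JIT-compiled before timing starts.
        CountUnion(inputs, 1000);
        CountBuiltIn(inputs, 1000);

        for (int run = 0; run < 5; run++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            int unionHits = CountUnion(inputs, iterations);
            sw.Stop();
            long unionMs = sw.ElapsedMilliseconds;

            sw = Stopwatch.StartNew();
            int builtInHits = CountBuiltIn(inputs, iterations);
            sw.Stop();
            long builtInMs = sw.ElapsedMilliseconds;

            Console.WriteLine("union: {0} ms ({1} hits), double.IsNaN: {2} ms ({3} hits)",
                unionMs, unionHits, builtInMs, builtInHits);
        }
    }
}
```

    The hit counts double as a sanity check: both loops should report the same number, since the two methods classify every input identically.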
