Why is .NET faster than C++ in this case?

长发绾君心 2021-02-02 00:59

Make sure you run outside of the IDE. That is key.

-edit- I LOVE SLaks' comment: "The amount of misinformation in these answers is staggering." :D

13 answers
  • 2021-02-02 01:18

    One important thing to remember when comparing languages is that if you do a simple line-by-line translation, you're not comparing apples to apples.

    What makes sense in one language may have horrible side effects in another. To really compare the performance characteristics you need a C# version and a C++ version, and the code for those versions may be very different. For example, in C# I wouldn't even use the same function signature. I'd go with something more like this:

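    // Requires: using System.Collections.Generic;
    static IEnumerable<int> Fibonacci()
    {
       int n1 = 0;
       int n2 = 1;

       // Yield F(0) and F(1) first so that Skip(n).First() returns F(n).
       yield return n1;
       yield return n2;
       while (true)
       {
          int n = n1 + n2;
          n1 = n2;
          n2 = n;
          yield return n;
       }
    }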
    

    and then wrap that like this:

    // Requires: using System.Linq;
    public static int fib(int n)
    {
        // Skip the first n terms of the sequence and take the next one.
        return Fibonacci().Skip(n).First();
    }
    

    That will do much better, because it works from the bottom up, reusing the previous two terms to build the next one instead of making two separate sets of recursive calls.
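
    As a rough C++ counterpart to the bottom-up idea, here is a minimal sketch. The names fib_recursive, fib_iterative and check_fib_versions are made up for illustration, and it assumes n is non-negative and no larger than 93, so that the result fits in 64 bits:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Naive version, same shape as the fib in the question:
    // two recursive calls per step.
    inline std::uint64_t fib_recursive(int n)
    {
        if (n < 2) return n;
        return fib_recursive(n - 1) + fib_recursive(n - 2);
    }

    // Bottom-up version: each term is built from the previous two,
    // one loop iteration per term.
    inline std::uint64_t fib_iterative(int n)
    {
        std::uint64_t prev = 0; // fib(0)
        std::uint64_t curr = 1; // fib(1)
        for (int i = 0; i < n; ++i)
        {
            const std::uint64_t next = prev + curr;
            prev = curr;
            curr = next;
        }
        return prev; // after n steps, prev holds fib(n)
    }

    // Sanity check: both versions must agree on small inputs.
    inline void check_fib_versions()
    {
        for (int n = 0; n <= 20; ++n)
            assert(fib_recursive(n) == fib_iterative(n));
    }
    ```

    The loop performs n additions, whereas the number of calls made by the recursive version grows exponentially with n.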

    And if you really want screaming performance in C++ you can use meta-programming to make the compiler pre-compute your results like this:

    template<int N> struct fibonacci
    {
        static const int value = fibonacci<N - 1>::value + fibonacci<N - 2>::value;
    };
    
    template<> struct fibonacci<1>
    {
        static const int value = 1;
    };
    
    template<> struct fibonacci<0>
    {
        static const int value = 0;
    };
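
    With a C++14 or later compiler (an assumption; the answer does not name a language standard), a constexpr function can play the same role as the template. A minimal sketch: fib_constexpr is a hypothetical name, and the template is repeated only so the block compiles on its own.

    ```cpp
    // Template version, repeated from the answer above.
    template<int N> struct fibonacci
    {
        static const int value = fibonacci<N - 1>::value + fibonacci<N - 2>::value;
    };

    template<> struct fibonacci<1>
    {
        static const int value = 1;
    };

    template<> struct fibonacci<0>
    {
        static const int value = 0;
    };

    // Hypothetical constexpr alternative (needs C++14 for the loop).
    constexpr int fib_constexpr(int n)
    {
        int prev = 0; // fib(0)
        int curr = 1; // fib(1)
        for (int i = 0; i < n; ++i)
        {
            const int next = prev + curr;
            prev = curr;
            curr = next;
        }
        return prev;
    }

    // Both forms are evaluated by the compiler; no code runs at program start.
    static_assert(fibonacci<10>::value == 55, "template version");
    static_assert(fib_constexpr(10) == 55, "constexpr version");
    static_assert(fibonacci<30>::value == fib_constexpr(30), "versions agree");
    ```

    Either way the argument has to be a compile-time constant, so this does not help when n is only known at run time.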
    
  • 2021-02-02 01:25

    EDIT: TL;DR version: the CLR JIT will inline one level of recursion; MSVC 8 SP1 will not without #pragma inline_recursion(on). And you should run the C# version outside of a debugger to get the fully optimized JIT.

    I got similar results to acidzombie24 with C# vs. C++ using VS 2008 SP1 on a Core2 Duo laptop running Vista plugged in with "high performance" power settings (~1600 ms vs. ~3800 ms). It's kind of tricky to see the optimized JIT'd C# code, but for x86 it boils down to this:

    00000000 55               push        ebp  
    00000001 8B EC            mov         ebp,esp 
    00000003 57               push        edi  
    00000004 56               push        esi  
    00000005 53               push        ebx  
    00000006 8B F1            mov         esi,ecx 
    00000008 83 FE 02         cmp         esi,2 
    0000000b 7D 07            jge         00000014 
    0000000d 8B C6            mov         eax,esi 
    0000000f 5B               pop         ebx  
    00000010 5E               pop         esi  
    00000011 5F               pop         edi  
    00000012 5D               pop         ebp  
    00000013 C3               ret              
                return fib(n - 1) + fib(n - 2);
    00000014 8D 7E FF         lea         edi,[esi-1] 
    00000017 83 FF 02         cmp         edi,2 
    0000001a 7D 04            jge         00000020 
    0000001c 8B DF            mov         ebx,edi 
    0000001e EB 19            jmp         00000039 
    00000020 8D 4F FF         lea         ecx,[edi-1] 
    00000023 FF 15 F8 2F 12 00 call        dword ptr ds:[00122FF8h] 
    00000029 8B D8            mov         ebx,eax 
    0000002b 4F               dec         edi  
    0000002c 4F               dec         edi  
    0000002d 8B CF            mov         ecx,edi 
    0000002f FF 15 F8 2F 12 00 call        dword ptr ds:[00122FF8h] 
    00000035 03 C3            add         eax,ebx 
    00000037 8B D8            mov         ebx,eax 
    00000039 4E               dec         esi  
    0000003a 4E               dec         esi  
    0000003b 83 FE 02         cmp         esi,2 
    0000003e 7D 04            jge         00000044 
    00000040 8B D6            mov         edx,esi 
    00000042 EB 19            jmp         0000005D 
    00000044 8D 4E FF         lea         ecx,[esi-1] 
    00000047 FF 15 F8 2F 12 00 call        dword ptr ds:[00122FF8h] 
    0000004d 8B F8            mov         edi,eax 
    0000004f 4E               dec         esi  
    00000050 4E               dec         esi  
    00000051 8B CE            mov         ecx,esi 
    00000053 FF 15 F8 2F 12 00 call        dword ptr ds:[00122FF8h] 
    00000059 03 C7            add         eax,edi 
    0000005b 8B D0            mov         edx,eax 
    0000005d 03 DA            add         ebx,edx 
    0000005f 8B C3            mov         eax,ebx 
    00000061 5B               pop         ebx  
    00000062 5E               pop         esi  
    00000063 5F               pop         edi  
    00000064 5D               pop         ebp  
    00000065 C3               ret  
    

    In contrast to the C++ generated code (/Ox /Ob2 /Oi /Ot /Oy /GL /Gr):

    int fib(int n)
    { 
    00B31000 56               push        esi  
    00B31001 8B F1            mov         esi,ecx 
        if (n < 2) return n; 
    00B31003 83 FE 02         cmp         esi,2 
    00B31006 7D 04            jge         fib+0Ch (0B3100Ch) 
    00B31008 8B C6            mov         eax,esi 
    00B3100A 5E               pop         esi  
    00B3100B C3               ret              
    00B3100C 57               push        edi  
        return fib(n - 1) + fib(n - 2); 
    00B3100D 8D 4E FE         lea         ecx,[esi-2] 
    00B31010 E8 EB FF FF FF   call        fib (0B31000h) 
    00B31015 8D 4E FF         lea         ecx,[esi-1] 
    00B31018 8B F8            mov         edi,eax 
    00B3101A E8 E1 FF FF FF   call        fib (0B31000h) 
    00B3101F 03 C7            add         eax,edi 
    00B31021 5F               pop         edi  
    00B31022 5E               pop         esi  
    } 
    00B31023 C3               ret              
    

    The C# version basically inlines fib(n-1) and fib(n-2). For a function that is so call heavy, reducing the number of function calls is the key to speed. Replacing fib with the following:

    int fib(int n);
    
    int fib2(int n) 
    { 
        if (n < 2) return n; 
        return fib(n - 1) + fib(n - 2); 
    } 
    
    int fib(int n)
    { 
        if (n < 2) return n; 
        return fib2(n - 1) + fib2(n - 2); 
    } 
    

    Gets it down to ~1900 ms. Incidentally, if I use #pragma inline_recursion(on) I get similar results with the original fib. Unrolling it one more level:

    int fib(int n);
    
    int fib3(int n) 
    { 
        if (n < 2) return n; 
        return fib(n - 1) + fib(n - 2); 
    } 
    
    int fib2(int n) 
    { 
        if (n < 2) return n; 
        return fib3(n - 1) + fib3(n - 2); 
    } 
    
    int fib(int n)
    { 
        if (n < 2) return n; 
        return fib2(n - 1) + fib2(n - 2); 
    } 
    

    Gets it down to ~1380 ms. Beyond that it tapers off.

    So it appears that the CLR JIT for my machine will inline recursive calls one level, whereas the C++ compiler will not do that by default.
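
    To see why fewer calls matter here, one can count them. The sketch below is hypothetical instrumentation (g_calls, fib_plain, fib_inlined, CallStats and measure are invented names): fib_inlined expands one level of recursion by hand, roughly mirroring the shape of the JIT output shown earlier in this answer.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical counter: incremented every time a fib variant is entered.
    static std::uint64_t g_calls = 0;

    // Original recursive fib from the question, plus the counter.
    int fib_plain(int n)
    {
        ++g_calls;
        if (n < 2) return n;
        return fib_plain(n - 1) + fib_plain(n - 2);
    }

    // Same function with fib(n - 1) and fib(n - 2) expanded once by hand.
    int fib_inlined(int n)
    {
        ++g_calls;
        if (n < 2) return n;
        const int a = n - 1;
        const int left = (a < 2) ? a : fib_inlined(a - 1) + fib_inlined(a - 2);
        const int b = n - 2;
        const int right = (b < 2) ? b : fib_inlined(b - 1) + fib_inlined(b - 2);
        return left + right;
    }

    struct CallStats
    {
        int result;          // value returned by the fib variant
        std::uint64_t calls; // number of times the variant was entered
    };

    // Resets the counter, runs f(n), and reports result and call count.
    CallStats measure(int (*f)(int), int n)
    {
        g_calls = 0;
        const int r = f(n);
        return {r, g_calls};
    }

    // Sanity check: same result, fewer function entries.
    void check_call_counts()
    {
        const CallStats plain = measure(fib_plain, 20);
        const CallStats inlined = measure(fib_inlined, 20);
        assert(plain.result == inlined.result);
        assert(inlined.calls < plain.calls);
    }
    ```

    For n = 20 the plain version is entered 21891 times; the hand-expanded version returns the same result with fewer entries.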

    If only all performance critical code were like fib!

  • 2021-02-02 01:27

    I think the problem is your timing code in C++.

    From the MS docs for __rdtsc:

    Generates the rdtsc instruction, which returns the processor time stamp. The processor time stamp records the number of clock cycles since the last reset.

    Perhaps try GetTickCount().
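
    A portable alternative for wall-clock timing is std::chrono::steady_clock from the C++11 standard library. This is a swapped-in technique, not what the question or this answer used, and TimedResult and time_fib are hypothetical names:

    ```cpp
    #include <chrono>

    // Recursive fib, same shape as the one in the question.
    int fib(int n)
    {
        if (n < 2) return n;
        return fib(n - 1) + fib(n - 2);
    }

    struct TimedResult
    {
        int value;           // fib(n)
        double milliseconds; // elapsed wall-clock time
    };

    // Times a single call to fib(n) with a monotonic clock.
    TimedResult time_fib(int n)
    {
        const auto start = std::chrono::steady_clock::now();
        const int value = fib(n);
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        return {value, elapsed.count()};
    }
    ```

    steady_clock reports elapsed time rather than clock cycles, so the result can be compared directly with a millisecond figure from the C# side.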

  • 2021-02-02 01:27

    Speculation 1

    Garbage collection procedure might play a role.

    In the C++ version all memory management would occur inline while the program is running, and that would count toward the final time.

    In .NET the Garbage Collector (GC) of the Common Language Runtime (CLR) is a separate process on a different thread and often cleans up your program after it's completed. Therefore your program will finish and the times will print out before memory is freed, especially for small programs, which usually won't be cleaned up at all until completion.

    It all depends on the details of the Garbage Collection implementation (and whether it optimizes for the stack in the same way as the heap), but I assume this plays a partial role in the speed gains. If the C++ version were also optimized to not deallocate/clean up memory until after it finished (or to push that step until after the program completed), then I'm sure you would see C++ speed gains.

    To Test GC: To see the "delayed" .NET GC behaviour in action, put a breakpoint in some of your object's destructor/finalizer methods. The debugger will come alive and hit those breakpoints after the program is completed (yes, after Main is completed).

    Speculation 2

    Otherwise, the C# source code is compiled by the programmer down to IL code (Microsoft byte code instructions), and at runtime those are in turn compiled by the CLR's Just-In-Time compiler into a processor-specific instruction set (as with classic compiled programs), so there's really no reason a .NET program should be slower once it gets going and has run the first time.

  • 2021-02-02 01:28

    I know that the .NET compiler has an Intel optimization.

  • 2021-02-02 01:30

    I don't understand the answers about garbage collection or console buffering.

    It could be that your timer mechanism in C++ is inherently flawed.

    According to http://en.wikipedia.org/wiki/Rdtsc, it is possible that you get wrong benchmark results.

    Quoted:

    While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate. This has the effect of making things seem like they require more processor cycles than they normally would.
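
    One common way to reduce the spin-up effect described in the quote is to do an untimed warm-up run and then keep the fastest of several timed runs. A minimal sketch under those assumptions, using std::chrono::steady_clock rather than rdtsc; best_time_ms is a hypothetical name:

    ```cpp
    #include <algorithm>
    #include <chrono>
    #include <limits>

    // Recursive fib, same shape as the one in the question.
    int fib(int n)
    {
        if (n < 2) return n;
        return fib(n - 1) + fib(n - 2);
    }

    // Runs fib(n) once untimed, then `runs` timed repetitions,
    // and returns the fastest time in milliseconds.
    double best_time_ms(int n, int runs)
    {
        // Warm-up run; the volatile store keeps the call from being optimized away.
        volatile int sink = fib(n);

        double best = std::numeric_limits<double>::max();
        for (int i = 0; i < runs; ++i)
        {
            const auto start = std::chrono::steady_clock::now();
            sink = fib(n);
            const auto stop = std::chrono::steady_clock::now();
            const std::chrono::duration<double, std::milli> elapsed = stop - start;
            best = std::min(best, elapsed.count());
        }
        (void)sink;
        return best;
    }
    ```

    Taking the minimum rather than the mean discards runs that were slowed down by a low clock rate or by other activity on the machine.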
