C# vs. C++ performance — why doesn't .NET perform the most basic optimizations (like dead code elimination)?

灰色年华 2021-02-02 14:01

I'm seriously doubting if the C# or .NET JIT compilers perform any useful optimizations, much less if they're actually competitive with the most basic ones in C++ com

1 answer
  • 2021-02-02 14:46

    It is true that the .NET JIT is a poor compiler. Fortunately, a new JIT (RyuJIT) and an NGEN that appears to be based on the VC compiler are in the works (I believe this is what the Windows Phone cloud compiler uses).

    Although it is a very simple compiler, it does inline small functions and remove side-effect-free loops to a certain extent. It does none of this particularly well, but it does happen.

    Before we go into the detailed findings, note that the x86 and x64 JITs are different codebases, perform differently, and have different bugs.


    Test 1:

    You ran the program in Release mode as a 32-bit process. I can reproduce your findings on .NET 4.5 in 32-bit mode. Yes, this is embarrassing.

    In 64-bit mode, though, Rem in the first example is inlined and the innermost of the two nested loops is removed:

    [Image: disassembly of the 64-bit JIT output]

    I have marked the three loop instructions. The outer loop is still there. I don't think that ever matters in practice because you rarely have two nested dead loops.

    Note that the loop was unrolled 4 times, and then the unrolled iterations were collapsed into a single iteration (unrolling produced i += 1; i += 1; i += 1; i += 1; and that was collapsed to i += 4;). Granted, the entire loop could be optimized away, but the JIT did perform the things that matter most in practice: unrolling loops and simplifying code.
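
    To make the unroll-then-collapse step concrete, here is a source-level sketch in C#. The class and method names (`UnrollSketch`, `Original`, `Unrolled`, `Collapsed`) are hypothetical, and the JIT performs this on machine code rather than on C# source, so this only illustrates the transformation and is not the compiler's actual output. It assumes the trip count `n` is a multiple of 4.

```csharp
static class UnrollSketch
{
    // The loop as written: one increment per iteration.
    public static int Original(int n)
    {
        int i = 0;
        for (int k = 0; k < n; k++)
        {
            i += 1;
        }
        return i;
    }

    // After unrolling by 4 (assumes n is a multiple of 4).
    public static int Unrolled(int n)
    {
        int i = 0;
        for (int k = 0; k < n; k += 4)
        {
            i += 1; i += 1; i += 1; i += 1;
        }
        return i;
    }

    // After collapsing the four increments into a single addition.
    public static int Collapsed(int n)
    {
        int i = 0;
        for (int k = 0; k < n; k += 4)
        {
            i += 4;
        }
        return i;
    }
}
```

    All three methods return the same value for any `n` that is a multiple of 4. A more thorough optimizer could go one step further and replace the whole loop with `return n;`, which is the step the JIT did not take here.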

    I also added the following to Main to make it easier to debug:

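        // requires: using System; using System.Diagnostics;
        Console.WriteLine(IntPtr.Size); // verify bitness: 4 in a 32-bit process, 8 in a 64-bit process
        Debugger.Break(); // break here so a debugger can be attached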
    


    Test 2:

    I cannot fully reproduce your findings in either 32-bit or 64-bit mode. In all cases Test2 is inlined into Test1, making it a very simple function:

    [Image: disassembly of Test1 with Test2 inlined]

    Main calls Test1 in a loop because Test1 was too big to inline: methods are JIT'ed in isolation, so it is the non-simplified size that counts.

    When you have only a single Test2 call in Test1, both functions are small enough to be inlined. This lets the JIT discover, while compiling Main, that the code does nothing at all.
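
    One way to experiment with this inlining effect is to control it explicitly through `MethodImplOptions`. The sketch below is hypothetical: the original Test1 and Test2 are not shown here, so `InlineSketch`, `Work`, `WorkNoInline`, and the two loop methods are stand-ins. `AggressiveInlining` (available from .NET 4.5) is only a hint to the JIT, and whether the resulting dead loop is then removed depends on the JIT version and bitness.

```csharp
using System.Runtime.CompilerServices;

static class InlineSketch
{
    // Small, side-effect-free method; the attribute asks the JIT to inline it.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Work(int x)
    {
        return x % 7;
    }

    // Same body, but the JIT is told never to inline it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int WorkNoInline(int x)
    {
        return x % 7;
    }

    // Calls Work and discards the result. Once Work is inlined, the JIT
    // can see that the loop body has no observable effect.
    public static void DeadLoop(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Work(i);
        }
    }

    // Same loop, but every iteration remains an opaque call.
    public static void DeadLoopNoInline(int n)
    {
        for (int i = 0; i < n; i++)
        {
            WorkNoInline(i);
        }
    }
}
```

    Timing `DeadLoop` against `DeadLoopNoInline` with a `Stopwatch`, in a Release build and without a debugger attached, may show whether the inlined version was simplified on a given runtime.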


    Final answer: I hope I could shed some light on what is going on. In the process I did discover some important optimizations. The JIT is just not very thorough and complete. If the same optimizations were simply performed in a second, identical pass, a lot more could be simplified here. But most programs only need one pass through all the simplifiers. I agree with the choice the JIT team made here.

    So why is the JIT so bad? One part is that it must be fast because JITing is latency-sensitive. Another part is that it is just a primitive JIT and needs more investment.
