Curiosity: Why does Expression<…> when compiled run faster than a minimal DynamicMethod?

感动是毒 2021-01-29 23:03

I'm currently doing some last-measure optimizations, mostly for fun and learning, and discovered something that left me with a couple of questions.

First, the questions

1 answer
  •  执笔经年
    2021-01-29 23:46

    The method created via DynamicMethod goes through two thunks, while the method created via Expression<> doesn't go through any.

    Here's how it works. Here's the calling sequence for invoking fn(0, 1) in the Time method (I hard-coded the arguments to 0 and 1 for ease of debugging):

    00cc032c 6a01            push    1           // 1 argument
    00cc032e 8bcf            mov     ecx,edi
    00cc0330 33d2            xor     edx,edx     // 0 argument
    00cc0332 8b410c          mov     eax,dword ptr [ecx+0Ch]
    00cc0335 8b4904          mov     ecx,dword ptr [ecx+4]
    00cc0338 ffd0            call    eax // 1 arg on stack, two in edx, ecx
    

    For the first invocation I investigated, DynamicMethod, the call eax line comes up like so:

    00cc0338 ffd0            call    eax {003c2084}
    0:000> !u 003c2084
    Unmanaged code
    003c2084 51              push    ecx
    003c2085 8bca            mov     ecx,edx
    003c2087 8b542408        mov     edx,dword ptr [esp+8]
    003c208b 8b442404        mov     eax,dword ptr [esp+4]
    003c208f 89442408        mov     dword ptr [esp+8],eax
    003c2093 58              pop     eax
    003c2094 83c404          add     esp,4
    003c2097 83c010          add     eax,10h
    003c209a ff20            jmp     dword ptr [eax]
    

    This appears to be doing some stack swizzling to rearrange arguments. I speculate that it's owing to the difference between delegates that use the implicit 'this' argument and those that don't.

    That jump at the end resolves like so:

    003c209a ff20            jmp     dword ptr [eax]      ds:0023:012f7edc=0098c098
    0098c098 e963403500      jmp     00ce0100
    

    The remainder of the code at 0098c098 looks like a JIT thunk, whose start got rewritten with a jmp after the JIT. It's only after this jump that we get to real code:

    0:000> !u eip
    Normal JIT generated code
    DynamicClass.TestMethod(Int32, Int32)
    Begin 00ce0100, size 5
    >>> 00ce0100 03ca            add     ecx,edx
    00ce0102 8bc1            mov     eax,ecx
    00ce0104 c3              ret
    

    The invocation sequence for the method created via Expression<> is different - it's missing the stack swizzling code. Here it is, from the first jump via eax:

    00cc0338 ffd0            call    eax {00ce00a8}
    
    0:000> !u eip
    Normal JIT generated code
    DynamicClass.lambda_method(System.Runtime.CompilerServices.ExecutionScope, Int32, Int32)
    Begin 00ce00a8, size b
    >>> 00ce00a8 8b442404        mov     eax,dword ptr [esp+4]
    00ce00ac 03d0            add     edx,eax
    00ce00ae 8bc2            mov     eax,edx
    00ce00b0 c20400          ret     4
    

    Now, how did things get like this?

    1. Stack swizzling wasn't necessary (the implicit first argument from the delegate is actually used, i.e. not like a delegate bound to a static method)
    2. The JIT must have been forced by LINQ compilation logic so that the delegate held the real destination address rather than a fake one.
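
    To make point 1 concrete, here is a minimal sketch of the two delegate shapes side by side. The class and method names (DelegateShapes, BuildOpen, BuildClosed) are made up for illustration. The first binds an open static delegate, where the dynamic method's parameters match the delegate's exactly; the second adds a leading object parameter and closes the delegate over null, so the delegate's implicit first argument is actually consumed. Whether the argument-shuffling thunk really disappears will depend on runtime version and platform; the dumps here are from x86 only.

    ```csharp
    using System;
    using System.Reflection.Emit;

    public static class DelegateShapes
    {
        // Open static delegate: parameters are (Int32, Int32), same as the delegate.
        public static Func<Int32, Int32, Int32> BuildOpen()
        {
            var method = new DynamicMethod("AddOpen", typeof(Int32),
                new Type[] { typeof(Int32), typeof(Int32) });
            ILGenerator il = method.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);
            return (Func<Int32, Int32, Int32>)method.CreateDelegate(
                typeof(Func<Int32, Int32, Int32>));
        }

        // Closed delegate: the extra leading object parameter is bound (to null)
        // when the delegate is created, so the real arguments are 1 and 2.
        public static Func<Int32, Int32, Int32> BuildClosed()
        {
            var method = new DynamicMethod("AddClosed", typeof(Int32),
                new Type[] { typeof(object), typeof(Int32), typeof(Int32) });
            ILGenerator il = method.GetILGenerator();
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Ldarg_2);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);
            return (Func<Int32, Int32, Int32>)method.CreateDelegate(
                typeof(Func<Int32, Int32, Int32>), null);
        }

        public static void Main()
        {
            // Both delegates compute the same sum; only the binding differs.
            Console.WriteLine(BuildOpen()(1, 2));
            Console.WriteLine(BuildClosed()(1, 2));
        }
    }
    ```

    Both delegates have the same Func<Int32, Int32, Int32> type, so the caller can't tell them apart; the difference is only in how the runtime dispatches the call.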

    I don't know how LINQ forced the JIT, but I do know how to force a JIT myself: by calling the function at least once. UPDATE: I found another way to force a JIT: use the restrictedSkipVisibility argument to the constructor and pass true. So, here's modified code that eliminates stack swizzling by using the implicit 'this' parameter, and uses the alternate constructor to pre-compile so that the bound address is the real address, rather than the thunk:

    using System;
    using System.Linq.Expressions;
    using System.Reflection.Emit;
    using System.Diagnostics;
    
    namespace Sandbox
    {
        public class Program
        {
            public static void Main(String[] args)
            {
                DynamicMethod method = new DynamicMethod("TestMethod",
                    typeof(Int32), new Type[] { typeof(object), typeof(Int32),
                    typeof(Int32) }, true);
                var il = method.GetILGenerator();
    
                il.Emit(OpCodes.Ldarg_1);
                il.Emit(OpCodes.Ldarg_2);
                il.Emit(OpCodes.Add);
                il.Emit(OpCodes.Ret);
    
                Func<Int32, Int32, Int32> f1 =
                    (Func<Int32, Int32, Int32>)method.CreateDelegate(
                        typeof(Func<Int32, Int32, Int32>), null);
                Func<Int32, Int32, Int32> f2 = (Int32 a, Int32 b) => a + b;
                Func<Int32, Int32, Int32> f3 = Sum;
                Expression<Func<Int32, Int32, Int32>> f4x = (a, b) => a + b;
                Func<Int32, Int32, Int32> f4 = f4x.Compile();
                for (Int32 pass = 1; pass <= 2; pass++)
                {
                    // Pass 1 just runs all the code without writing out anything
                    // to avoid JIT overhead influencing the results
                    Time(f1, "DynamicMethod", pass);
                    Time(f2, "Lambda", pass);
                    Time(f3, "Method", pass);
                    Time(f4, "Expression", pass);
                }
            }
    
            private static void Time(Func<Int32, Int32, Int32> fn,
                String name, Int32 pass)
            {
                Stopwatch sw = new Stopwatch();
                sw.Start();
                for (Int32 index = 0; index <= 100000000; index++)
                {
                    Int32 result = fn(index, 1);
                }
                sw.Stop();
                if (pass == 2)
                    Console.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");
            }
    
            private static Int32 Sum(Int32 a, Int32 b)
            {
                return a + b;
            }
        }
    }
    

    Here are the runtimes on my system:

    DynamicMethod: 312 ms
    Lambda: 417 ms
    Method: 417 ms
    Expression: 312 ms
    

    UPDATED TO ADD:

    I tried running this code on my new system, which is a Core i7 920 running Windows 7 x64 with .NET 4 beta 2 installed (mscoree.dll ver. 4.0.30902), and the results are, well, variable.

    csc 3.5, /platform:x86, runtime v2.0.50727 (via .config)
    
    Run #1
    DynamicMethod: 214 ms
    Lambda: 571 ms
    Method: 570 ms
    Expression: 249 ms
    
    Run #2
    DynamicMethod: 463 ms
    Lambda: 392 ms
    Method: 392 ms
    Expression: 463 ms
    
    Run #3
    DynamicMethod: 463 ms
    Lambda: 570 ms
    Method: 570 ms
    Expression: 463 ms
    

    Perhaps this is Intel SpeedStep affecting results, or possibly Turbo Boost. In any case, it's very annoying.

    csc 3.5, /platform:x64, runtime v2.0.50727 (via .config)
    DynamicMethod: 428 ms
    Lambda: 392 ms
    Method: 392 ms
    Expression: 428 ms
    
    csc 3.5, /platform:x64, runtime v4
    DynamicMethod: 428 ms
    Lambda: 356 ms
    Method: 356 ms
    Expression: 428 ms
    
    csc 4, /platform:x64, runtime v4
    DynamicMethod: 428 ms
    Lambda: 356 ms
    Method: 356 ms
    Expression: 428 ms
    
    csc 4, /platform:x86, runtime v4
    DynamicMethod: 463 ms
    Lambda: 570 ms
    Method: 570 ms
    Expression: 463 ms
    
    csc 3.5, /platform:x86, runtime v4
    DynamicMethod: 214 ms
    Lambda: 570 ms
    Method: 571 ms
    Expression: 249 ms
    

    Many of these results are probably accidents of timing, caused by whatever is producing the random speedups in the C# 3.5 / runtime v2.0 scenario. I'll have to reboot to see if SpeedStep or Turbo Boost is responsible for these effects.
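
    In the meantime, one way to make numbers like these less sensitive to clock-speed changes is to repeat each measurement several times and look at the median rather than a single run. The following is only a sketch of that idea, not the harness used for the figures above; the names RepeatTiming, Measure and Median, and the run and iteration counts, are assumptions made up for illustration.

    ```csharp
    using System;
    using System.Diagnostics;

    public static class RepeatTiming
    {
        // Times `iterations` calls of fn, `runs` times over; returns elapsed ms per run.
        public static long[] Measure(Func<Int32, Int32, Int32> fn, Int32 runs, Int32 iterations)
        {
            fn(0, 1); // warm-up call so JIT cost is not part of the first sample
            var samples = new long[runs];
            for (Int32 run = 0; run < runs; run++)
            {
                Int32 sink = 0;
                Stopwatch sw = Stopwatch.StartNew();
                for (Int32 index = 0; index < iterations; index++)
                    sink += fn(index, 1); // overflow wraps silently in the default unchecked context
                sw.Stop();
                GC.KeepAlive(sink); // keep the accumulated result observable
                samples[run] = sw.ElapsedMilliseconds;
            }
            return samples;
        }

        // Middle element of the sorted samples (upper middle for an even count).
        // Assumes at least one sample.
        public static long Median(long[] samples)
        {
            long[] sorted = (long[])samples.Clone();
            Array.Sort(sorted);
            return sorted[sorted.Length / 2];
        }

        public static void Main()
        {
            Func<Int32, Int32, Int32> add = (a, b) => a + b;
            long[] samples = Measure(add, 5, 100000000);
            Console.WriteLine("Lambda: median " + Median(samples) + " ms, all runs: "
                + String.Join(", ", samples));
        }
    }
    ```

    Printing every sample alongside the median makes bimodal behaviour such as the 214 ms versus 463 ms DynamicMethod runs visible instead of hiding it in one number.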
