问题
I'm currently doing some last-measure optimizations, mostly for fun and learning, and discovered something that left me with a couple of questions.
First, the questions:
- When I construct a method in-memory through the use of DynamicMethod, and use the debugger, is there any way for me to step into the generated assembly code, when vieweing the code in the disassembler view? The debugger seems to just step over the whole method for me
- Or, if that's not possible, is it possible for me to somehow save the generated IL code to disk as an assembly, so that I can inspect it with Reflector?
- Why does the
Expression<...>
version of my simple addition method (Int32+Int32 => Int32) run faster than a minimal DynamicMethod version?
Here's a short and complete program that demonstrates. On my system, the output is:
DynamicMethod: 887 ms
Lambda: 1878 ms
Method: 1969 ms
Expression: 681 ms
I expected the lambda and method calls to have higher values, but the DynamicMethod version is consistently about 30-50% slower (variations probably due to Windows and other programs). Anyone know the reason?
Here's the program:
using System;
using System.Linq.Expressions;
using System.Reflection.Emit;
using System.Diagnostics;
namespace Sandbox
{
public class Program
{
public static void Main(String[] args)
{
DynamicMethod method = new DynamicMethod("TestMethod",
typeof(Int32), new Type[] { typeof(Int32), typeof(Int32) });
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Ret);
Func<Int32, Int32, Int32> f1 =
(Func<Int32, Int32, Int32>)method.CreateDelegate(
typeof(Func<Int32, Int32, Int32>));
Func<Int32, Int32, Int32> f2 = (Int32 a, Int32 b) => a + b;
Func<Int32, Int32, Int32> f3 = Sum;
Expression<Func<Int32, Int32, Int32>> f4x = (a, b) => a + b;
Func<Int32, Int32, Int32> f4 = f4x.Compile();
for (Int32 pass = 1; pass <= 2; pass++)
{
// Pass 1 just runs all the code without writing out anything
// to avoid JIT overhead influencing the results
Time(f1, "DynamicMethod", pass);
Time(f2, "Lambda", pass);
Time(f3, "Method", pass);
Time(f4, "Expression", pass);
}
}
private static void Time(Func<Int32, Int32, Int32> fn,
String name, Int32 pass)
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (Int32 index = 0; index <= 100000000; index++)
{
Int32 result = fn(index, 1);
}
sw.Stop();
if (pass == 2)
Debug.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");
}
private static Int32 Sum(Int32 a, Int32 b)
{
return a + b;
}
}
}
回答1:
The method created via DynamicMethod
goes through two thunks, while the method created via Expression<>
doesn't go through any.
Here's how it works. Here's the calling sequence for invoking fn(0, 1)
in the Time
method (I hard-coded the arguments to 0 and 1 for ease of debugging):
00cc032c 6a01 push 1 // 1 argument
00cc032e 8bcf mov ecx,edi
00cc0330 33d2 xor edx,edx // 0 argument
00cc0332 8b410c mov eax,dword ptr [ecx+0Ch]
00cc0335 8b4904 mov ecx,dword ptr [ecx+4]
00cc0338 ffd0 call eax // 1 arg on stack, two in edx, ecx
For the first invocation I investigated, DynamicMethod
, the call eax
line comes up like so:
00cc0338 ffd0 call eax {003c2084}
0:000> !u 003c2084
Unmanaged code
003c2084 51 push ecx
003c2085 8bca mov ecx,edx
003c2087 8b542408 mov edx,dword ptr [esp+8]
003c208b 8b442404 mov eax,dword ptr [esp+4]
003c208f 89442408 mov dword ptr [esp+8],eax
003c2093 58 pop eax
003c2094 83c404 add esp,4
003c2097 83c010 add eax,10h
003c209a ff20 jmp dword ptr [eax]
This appears to be doing some stack swizzling to rearrange arguments. I speculate that it's owing to the difference between delegates that use the implicit 'this' argument and those that don't.
That jump at the end resolves like so:
003c209a ff20 jmp dword ptr [eax] ds:0023:012f7edc=0098c098
0098c098 e963403500 jmp 00ce0100
The remainder of the code at 0098c098 looks like a JIT thunk, whose start got rewritten with a jmp
after the JIT. It's only after this jump that we get to real code:
0:000> !u eip
Normal JIT generated code
DynamicClass.TestMethod(Int32, Int32)
Begin 00ce0100, size 5
>>> 00ce0100 03ca add ecx,edx
00ce0102 8bc1 mov eax,ecx
00ce0104 c3 ret
The invocation sequence for the method created via Expression<>
is different - it's missing the stack swizzling code. Here it is, from the first jump via eax
:
00cc0338 ffd0 call eax {00ce00a8}
0:000> !u eip
Normal JIT generated code
DynamicClass.lambda_method(System.Runtime.CompilerServices.ExecutionScope, Int32, Int32)
Begin 00ce00a8, size b
>>> 00ce00a8 8b442404 mov eax,dword ptr [esp+4]
00ce00ac 03d0 add edx,eax
00ce00ae 8bc2 mov eax,edx
00ce00b0 c20400 ret 4
Now, how did things get like this?
- Stack swizzling wasn't necessary (the implicit first argument from the delegate is actually used, i.e. not like a delegate bound to a static method)
- The JIT must have been forced by LINQ compilation logic so that the delegate held the real destination address rather than a fake one.
I don't know how the LINQ forced the JIT, but I do know how to force a JIT myself - by calling the function at least once. UPDATE: I found another way to force a JIT: use the restrictedSkipVisibility
argumetn to the constructor and pass true
. So, here's modified code that eliminates stack swizzling by using the implicit 'this' parameter, and uses the alternate constructor to pre-compile so that the bound address is the real address, rather than the thunk:
using System;
using System.Linq.Expressions;
using System.Reflection.Emit;
using System.Diagnostics;
namespace Sandbox
{
public class Program
{
public static void Main(String[] args)
{
DynamicMethod method = new DynamicMethod("TestMethod",
typeof(Int32), new Type[] { typeof(object), typeof(Int32),
typeof(Int32) }, true);
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Ldarg_2);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Ret);
Func<Int32, Int32, Int32> f1 =
(Func<Int32, Int32, Int32>)method.CreateDelegate(
typeof(Func<Int32, Int32, Int32>), null);
Func<Int32, Int32, Int32> f2 = (Int32 a, Int32 b) => a + b;
Func<Int32, Int32, Int32> f3 = Sum;
Expression<Func<Int32, Int32, Int32>> f4x = (a, b) => a + b;
Func<Int32, Int32, Int32> f4 = f4x.Compile();
for (Int32 pass = 1; pass <= 2; pass++)
{
// Pass 1 just runs all the code without writing out anything
// to avoid JIT overhead influencing the results
Time(f1, "DynamicMethod", pass);
Time(f2, "Lambda", pass);
Time(f3, "Method", pass);
Time(f4, "Expression", pass);
}
}
private static void Time(Func<Int32, Int32, Int32> fn,
String name, Int32 pass)
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (Int32 index = 0; index <= 100000000; index++)
{
Int32 result = fn(index, 1);
}
sw.Stop();
if (pass == 2)
Console.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");
}
private static Int32 Sum(Int32 a, Int32 b)
{
return a + b;
}
}
}
Here's the runtimes on my system:
DynamicMethod: 312 ms
Lambda: 417 ms
Method: 417 ms
Expression: 312 ms
UPDATED TO ADD:
I tried running this code on my new system, which is an Core i7 920 running Windows 7 x64 with .NET 4 beta 2 installed (mscoree.dll ver. 4.0.30902), and the results are, well, variable.
csc 3.5, /platform:x86, runtime v2.0.50727 (via .config)
Run #1
DynamicMethod: 214 ms
Lambda: 571 ms
Method: 570 ms
Expression: 249 ms
Run #2
DynamicMethod: 463 ms
Lambda: 392 ms
Method: 392 ms
Expression: 463 ms
Run #3
DynamicMethod: 463 ms
Lambda: 570 ms
Method: 570 ms
Expression: 463 ms
Perhaps this is Intel SpeedStep affecting results, or possibly Turbo Boost. In any case, it's very annoying.
csc 3.5, /platform:x64, runtime v2.0.50727 (via .config)
DynamicMethod: 428 ms
Lambda: 392 ms
Method: 392 ms
Expression: 428 ms
csc 3.5, /platform:x64, runtime v4
DynamicMethod: 428 ms
Lambda: 356 ms
Method: 356 ms
Expression: 428 ms
csc 4, /platform:x64, runtime v4
DynamicMethod: 428 ms
Lambda: 356 ms
Method: 356 ms
Expression: 428 ms
csc 4, /platform:x86, runtime v4
DynamicMethod: 463 ms
Lambda: 570 ms
Method: 570 ms
Expression: 463 ms
csc 3.5, /platform:x86, runtime v4
DynamicMethod: 214 ms
Lambda: 570 ms
Method: 571 ms
Expression: 249 ms
Many of these results will be accidents of timing, whatever it is that is causing the random speedups in the C# 3.5 / runtime v2.0 scenario. I'll have to reboot to see if SpeedStep or Turbo Boost is responsible for these effects.
来源:https://stackoverflow.com/questions/1296683/curiosity-why-does-expression-when-compiled-run-faster-than-a-minimal-dyna