I'm currently inspecting deep objects in the CLR using the Profiler API. I have a specific problem analyzing the "this" argument for iterator/async methods (generated by the compiler, in the form of <name>d__123::MoveNext).
While researching this I found that there is indeed special behavior. First, the C# compiler emits the generated state-machine types that contain these methods as structs (only in Release mode). ECMA-334 (C# Language Specification, 5th edition: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-334.pdf) states (12.7.8 This access):
"... If the method or accessor is an iterator or async function, the this variable represents a copy of the struct for which the method or accessor was invoked, ...."
This means that, unlike other "this" arguments, in this case "this" is sent by value, not by reference. I do indeed see that the copy isn't modified outside. I'm trying to understand how exactly the struct is actually sent.
I took the liberty to strip down the complicated case, and replicate this with a small struct. Look at the following code:
struct Struct
{
    public static void mainFoo()
    {
        Struct st = new Struct();
        st.a = "String";
        st.p = new Program();
        System.Console.WriteLine("foo: " + st.foo1());
        System.Console.WriteLine("static foo: " + Struct.foo(st));
    }

    int i;
    String a;
    Program p;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int foo(Struct st)
    {
        return st.i;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public int foo1()
    {
        return i;
    }
}
Both foo and foo1 are marked NoInlining (visible as noinlining in the IL below); this is just so we can inspect the JITted code properly. I'm looking at three different things: how mainFoo calls foo/foo1, how foo is compiled, and how foo1 is compiled.
The following is the IL code generated (using ildasm):
.method public hidebysig static int32 foo(valuetype nitzan_multi_tester.Struct st) cil managed noinlining
{
// Code size 7 (0x7)
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldfld int32 nitzan_multi_tester.Struct::i
IL_0006: ret
} // end of method Struct::foo
.method public hidebysig instance int32 foo1() cil managed noinlining
{
// Code size 7 (0x7)
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldfld int32 nitzan_multi_tester.Struct::i
IL_0006: ret
} // end of method Struct::foo1
.method public hidebysig static void mainFoo() cil managed
{
// Code size 86 (0x56)
.maxstack 2
.locals init ([0] valuetype nitzan_multi_tester.Struct st)
IL_0000: ldloca.s st
IL_0002: initobj nitzan_multi_tester.Struct
IL_0008: ldloca.s st
IL_000a: ldstr "String"
IL_000f: stfld string nitzan_multi_tester.Struct::a
IL_0014: ldloca.s st
IL_0016: newobj instance void nitzan_multi_tester.Program::.ctor()
IL_001b: stfld class nitzan_multi_tester.Program nitzan_multi_tester.Struct::p
IL_0020: ldstr "foo: "
IL_0025: ldloca.s st
IL_0027: call instance int32 nitzan_multi_tester.Struct::foo1()
IL_002c: box [mscorlib]System.Int32
IL_0031: call string [mscorlib]System.String::Concat(object, object)
IL_0036: call void [mscorlib]System.Console::WriteLine(string)
IL_003b: ldstr "static foo: "
IL_0040: ldloc.0
IL_0041: call int32 nitzan_multi_tester.Struct::foo(valuetype nitzan_multi_tester.Struct)
IL_0046: box [mscorlib]System.Int32
IL_004b: call string [mscorlib]System.String::Concat(object, object)
IL_0050: call void [mscorlib]System.Console::WriteLine(string)
IL_0055: ret
} // end of method Struct::mainFoo
The assembly code generated (relevant parts only):
foo / foo1 (same code for both):
mov eax,dword ptr [rcx+10h]
mainFoo (line 18):
mov rcx,offset mscorlib_ni+0x8aaf8 (00007ffc`37d6aaf8) (MT: System.Int32)
call clr+0x2510 (00007ffc`392f2510) (JitHelp: CORINFO_HELP_NEWSFAST)
mov rsi,rax
lea rcx,[rsp+40h]
call 00007ffb`d9db04e0 (nitzan_multi_tester.Struct.foo1(), mdToken: 000000000600000b)
mov dword ptr [rsi+8],eax
mov rdx,rsi
mov rcx,1DBCE383690h
mov rcx,qword ptr [rcx]
call mscorlib_ni+0x635bd0 (00007ffc`38315bd0) (System.String.Concat(System.Object, System.Object), mdToken: 000000000600054f)
mov rcx,rax
call mscorlib_ni+0x56d290 (00007ffc`3824d290) (System.Console.WriteLine(System.String), mdToken: 0000000006000b78)
mainFoo (line 19):
mov rcx,offset mscorlib_ni+0x8aaf8 (00007ffc`37d6aaf8) (MT: System.Int32)
call clr+0x2510 (00007ffc`392f2510) (JitHelp: CORINFO_HELP_NEWSFAST)
mov rsi,rax
lea rcx,[rsp+28h]
mov rax,qword ptr [rsp+40h]
mov qword ptr [rcx],rax
mov rax,qword ptr [rsp+48h]
mov qword ptr [rcx+8],rax
mov eax,dword ptr [rsp+50h]
mov dword ptr [rcx+10h],eax
lea rcx,[rsp+28h]
call 00007ffb`d9db04d8 (nitzan_multi_tester.Struct.foo(nitzan_multi_tester.Struct), mdToken: 000000000600000a)
mov dword ptr [rsi+8],eax
mov rdx,rsi
mov rcx,1DBCE383698h
mov rcx,qword ptr [rcx]
call mscorlib_ni+0x635bd0 (00007ffc`38315bd0) (System.String.Concat(System.Object, System.Object), mdToken: 000000000600054f)
mov rcx,rax
call mscorlib_ni+0x56d290 (00007ffc`3824d290) (System.Console.WriteLine(System.String), mdToken: 0000000006000b78)
The first thing we can all see is that both foo and foo1 generate the same IL code (and the same JITted assembly code). This makes sense, since in the end we're just using the first argument. The second thing we see is that mainFoo calls the two methods differently (ldloc vs. ldloca). Since both foo and foo1 expect the same input, I would expect mainFoo to send the same arguments. This raised four questions:
1) What exactly does it mean to load a struct on the stack vs. loading a struct's address on the stack? I mean, a struct bigger than 8 bytes (64 bits) can't "sit" in a single stack slot.
2) Is the CLR generating a copy of the struct just to use as "this" (we know this is true according to the C# specification)? Where is this copy stored? The mainFoo assembly shows that the calling method creates the copy on its own stack.
3) It seems as though loading a struct both by value and by address (ldarg/ldloc vs. ldarga/ldloca) actually loads an address; for the by-value instructions it just creates a copy first. Why? Am I missing something here?
4) Back to iterators/async: does the foo/foo1 example replicate the difference between the "this" argument for iterator and non-iterator structs? Why is this behavior wanted? Creating a copy seems like a waste of work. What's the motivation?
(This example was taken using .NET Framework 4.5, but the same behavior is also seen with .NET Framework 2 and CoreCLR.)
I will quote from the ECMA 335 spec, which defines the CLR on which C# is based, and then we will see how that answers your questions.
I.8.9.7 Value type definition
- When a non-static method (i.e., an instance or virtual method) is called on the value type, its this pointer is a managed reference to the instance, whereas when the method is called on the associated boxed type, the this pointer is an object reference.
Instance methods on value types receive a this pointer that is a managed pointer to the unboxed type whereas virtual methods (including those on interfaces implemented by the value type) receive an instance of the boxed type.
This tells us that an instance method of a struct, such as foo1() above, has a this pointer that is represented as a managed reference, i.e. a GC pointer to the actual struct. You know this in C# as a ref.
In the case of boxed structs that are known to be of that type, it is possible to call a method without unboxing; the CLR will pass the ref pointer automatically. See II.13.3.
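As a rough illustration of the boxed case, here is a minimal sketch. The names IBump, Cell and BoxedCallDemo are hypothetical and not part of the question's code. It only shows the observable effect: a call made through the box operates on the boxed copy, not on the original local.

```csharp
using System;

// Hypothetical types for illustration only.
interface IBump { void Bump(); }

struct Cell : IBump
{
    public int Value;
    public void Bump() { Value++; }   // mutates whatever 'this' points at
}

static class BoxedCallDemo
{
    public static int[] Run()
    {
        var cell = new Cell();
        IBump boxed = cell;   // boxing copies the struct onto the GC heap
        boxed.Bump();         // 'this' refers to the data inside the box
        boxed.Bump();
        int original = cell.Value;        // the local was never touched
        int inBox = ((Cell)boxed).Value;  // unboxing copies the value back out
        return new[] { original, inBox };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "0, 2"
    }
}
```

The local stays at 0 while the boxed copy reaches 2, which is consistent with the method receiving a pointer into the box rather than a fresh copy on each call.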
Now, what happens if we need to access the field from a struct stored in a local, a ref or loaded directly on the stack?
III.4.10 ldfld – load field of an object
Stack Transition
... obj => value ...
The ldfld instruction pushes onto the stack the value of a field of obj. obj shall be an object (type O), a managed pointer (type &), an unmanaged pointer (type native int), or an instance of a value type.
So no matter where the struct is, we can use ldfld to get the value. The entire obj operand is popped from the stack and the field's value is pushed in its place. But you must understand that the object on the logical (theoretical) stack is different in each case.
In foo(), you pass the struct by value on the stack (ldloc.0) and the method does the same (ldarg.0).
In foo1(), the struct is passed as this by ref (ldloca.s), and it's loaded by ref (here ldarg.0 represents the ref).
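The difference between the two calling shapes can be observed from C# by mutating instead of reading. The following is a minimal sketch; Counter, its methods and ThisSemanticsDemo are hypothetical names introduced only for illustration. The ref overload is included because it has the same shape as the hidden this argument of an instance method.

```csharp
using System;

// Hypothetical type for illustration only.
struct Counter
{
    public int Value;

    // Instance method: 'this' is a managed reference to the caller's struct.
    public void IncrementViaThis() { Value++; }

    // Static method with a by-value parameter: works on a copy.
    public static void IncrementCopy(Counter c) { c.Value++; }

    // Static method with a ref parameter: same shape as the hidden 'this'.
    public static void IncrementRef(ref Counter c) { c.Value++; }
}

static class ThisSemanticsDemo
{
    public static int[] Run()
    {
        var c = new Counter();

        Counter.IncrementCopy(c);     // only the callee's copy changes
        int afterCopy = c.Value;

        c.IncrementViaThis();         // the caller's struct changes
        int afterThis = c.Value;

        Counter.IncrementRef(ref c);  // the caller's struct changes again
        int afterRef = c.Value;

        return new[] { afterCopy, afterThis, afterRef };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "0, 1, 2"
    }
}
```

Only the by-value call leaves the caller's struct unchanged, which is the semantic difference behind ldloc versus ldloca at the call sites.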
The following will be relevant in a moment.
I.8.2.1 Managed pointers and related types
snip ...they cannot be used for field signatures...
snip Rationale: For performance reasons items on the GC heap may not contain references to the interior of other GC objects, this motivates the restrictions on fields...
Now to answer your questions:
- We can load a struct directly onto the stack. This is the logical evaluation stack of the IL, and the struct takes up however many bytes it needs there.
- Your example is not a case of iterators or async. For an ordinary struct instance method, the C# spec at ECMA-334 12.7.8 says this is a ref, so this is actually a mutable pointer. You can prove this by mutating the struct inside the instance method.
- Your example of a struct is a bit of an exception when it comes to the JITted assembly. It seems the JIT will optimize for a struct being bigger than 8 bytes and pass it by ref where possible, i.e. without changing the semantics. On Windows x64 this matches the native calling convention, which passes any struct whose size is not 1, 2, 4, or 8 bytes as a pointer to a copy made by the caller.
- In an actual async or iterator function, the parameters are transformed into fields of a compiler-generated struct, which works as a state machine. The CLR will not permit a ref to be stored in a field, so by-value semantics must be followed.
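The by-value behavior for iterators can be observed directly. This is a minimal sketch with hypothetical names (Tally, AddLazily, IteratorThisDemo) that are not taken from the question: an ordinary instance method mutates the caller's struct, while an iterator declared on the same struct only mutates the copy of this that the state machine captured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only.
struct Tally
{
    public int Value;

    // Ordinary instance method: 'this' is a ref to the caller's struct.
    public void Add() { Value++; }

    // Iterator: 'this' is a copy stored inside the generated state machine.
    public IEnumerable<int> AddLazily()
    {
        Value++;             // changes the captured copy only
        yield return Value;
    }
}

static class IteratorThisDemo
{
    public static int[] Run()
    {
        var t = new Tally();
        t.Add();                       // caller's struct now holds 1

        int seen = 0;
        foreach (int v in t.AddLazily())
        {
            seen = v;                  // value computed on the copy
        }

        return new[] { seen, t.Value };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "2, 1"
    }
}
```

The iterator sees 2 on its private copy while the caller's struct stays at 1, which is the behavior the quoted passage of ECMA-334 12.7.8 describes.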