While answering a question on SO yesterday, I noticed that if an object is initialized using an Object Initializer, the compiler creates an extra local variable.
As for the why: it could be done to ensure that no "known" reference to a not (fully) initialized object (from the language's point of view) exists. Something like (pseudo-)constructor semantics for the object initializer? But that's just an idea, and I can't think of a way to use the reference to reach the uninitialized object other than in a multi-threaded environment.
EDIT: too slow..
Thread-safety and atomicity.
First, consider this line of code:
MyObject foo = new MyObject { Name = "foo", Value = 42 };
Anybody reading that statement might reasonably assume that the construction of the foo object will be atomic. Before the assignment the object doesn't exist at all. Once the assignment has completed the object exists and is in the expected state.
Now consider two possible ways of translating that code:
// #1
MyObject foo = new MyObject();
foo.Name = "foo";
foo.Value = 42;
// #2
MyObject temp = new MyObject(); // temp will be a compiler-generated name
temp.Name = "foo";
temp.Value = 42;
MyObject foo = temp;
In the first case the foo object is instantiated on the first line, but it won't be in the expected state until the final line has finished executing. What happens if another thread tries to access the object before the last line has executed? The object will be in a semi-initialised state.
In the second case the foo object doesn't exist until the final line, when it is assigned from temp. Since reference assignment is an atomic operation, this gives exactly the same semantics as the original, single-line assignment statement. That is, the foo object never exists in a semi-initialised state.
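To make the difference between the two translations concrete, here is a minimal single-threaded sketch. It assumes the target is a field called Shared (a hypothetical name) rather than a local, since only a shared location can be seen by another thread, and it uses a hypothetical Peek() helper to stand in for "another thread reads the field right now". It illustrates the ordering only and says nothing about memory-model visibility across real threads.

```csharp
static class PublicationDemo
{
    public class MyObject
    {
        public string Name { get; set; }
        public int Value { get; set; }
    }

    // Hypothetical field that other threads could read at any moment.
    public static MyObject Shared;

    // Stands in for "another thread reads Shared right now".
    static string Peek()
    {
        if (Shared == null) return "null";
        return (Shared.Name ?? "(unset)") + "/" + Shared.Value;
    }

    // Translation #1: the reference is published before the properties are set.
    public static string TranslationOne()
    {
        Shared = null;
        Shared = new MyObject();
        string seen = Peek();   // observes the half-built object
        Shared.Name = "foo";
        Shared.Value = 42;
        return seen;
    }

    // Translation #2: the reference is published only after initialization.
    public static string TranslationTwo()
    {
        Shared = null;
        MyObject temp = new MyObject();
        temp.Name = "foo";
        string seen = Peek();   // Shared has not been assigned yet
        temp.Value = 42;
        Shared = temp;
        return seen;
    }
}
```

Calling PublicationDemo.TranslationOne() returns "(unset)/0", the semi-initialised state, while PublicationDemo.TranslationTwo() returns "null": the observer sees either no object or, once the final assignment has run, the finished one.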
Luke's answer is both correct and excellent, so good on you. It is not, however, complete. There are even more good reasons why we do this.
The specification is extremely clear that this is the correct codegen; the specification says that an object initializer creates a temporary, invisible local which stores the result of the expression. But why did we spec it that way? That is, why is it that
Foo foo = new Foo() { Bar = bar };
means
Foo foo;
Foo temp = new Foo();
temp.Bar = bar;
foo = temp;
and not the more straightforward
Foo foo = new Foo();
foo.Bar = bar;
Well, as a purely practical matter, it's always easier to specify the behaviour of an expression as based on its contents, not its context. For this specific case though, suppose we specified that this was the desired codegen for assignment to a local or field. In that case, foo would be definitely assigned after the (), and therefore could be used in the initializer. Do you REALLY want
Foo foo = new Foo() { Bar = M(foo) };
to be legal? I hope not. foo is not definitely assigned until after the initialization is done.
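With a local, the line above is rejected at compile time (use of an unassigned local variable), so it can't be shown running. A field can legally be read before it is assigned, though, so the same ordering can be observed there. The following is only a sketch: the field name target and the body of M() are assumptions made for illustration.

```csharp
static class AssignmentOrderDemo
{
    public class Foo
    {
        public string Bar { get; set; }
    }

    // Hypothetical field; unlike a local, it can legally be read before assignment.
    static Foo target;

    // Reports what the assignment target holds at the moment it is called.
    static string M()
    {
        return target == null ? "target was still null" : "target was already set";
    }

    public static string Run()
    {
        target = null;
        // M() runs while the initializer is being evaluated, before the assignment.
        target = new Foo() { Bar = M() };
        return target.Bar;
    }
}
```

AssignmentOrderDemo.Run() returns "target was still null": the initializer runs against the hidden temporary, and the target is assigned only once the whole expression has been evaluated.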
Or, consider properties.
Frob().MyFoo = new Foo() { Bar = bar };
This has to be
Foo temp = new Foo();
temp.Bar = bar;
Frob().MyFoo = temp;
and not
Frob().MyFoo = new Foo();
Frob().MyFoo.Bar = bar;
because we don't want Frob() called twice and we don't want property MyFoo accessed twice, we want them each accessed once.
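The single-evaluation point can be checked by counting calls. This is a sketch with made-up scaffolding: Holder, Instance, FrobCalls and Accesses are hypothetical names introduced only to count how often Frob() and MyFoo are touched.

```csharp
static class SingleEvaluationDemo
{
    public class Foo
    {
        public string Bar { get; set; }
    }

    public class Holder
    {
        public int Accesses;   // counts every get or set of MyFoo
        private Foo myFoo;

        public Foo MyFoo
        {
            get { Accesses++; return myFoo; }
            set { Accesses++; myFoo = value; }
        }
    }

    public static int FrobCalls;
    public static readonly Holder Instance = new Holder();

    static Holder Frob()
    {
        FrobCalls++;
        return Instance;
    }

    // The object initializer as the language defines it.
    public static void WithInitializer()
    {
        FrobCalls = 0;
        Instance.Accesses = 0;
        Frob().MyFoo = new Foo() { Bar = "bar" };
    }

    // The rejected translation, written out by hand.
    public static void NaiveTranslation()
    {
        FrobCalls = 0;
        Instance.Accesses = 0;
        Frob().MyFoo = new Foo();
        Frob().MyFoo.Bar = "bar";
    }
}
```

After WithInitializer(), FrobCalls and Instance.Accesses are both 1. After NaiveTranslation(), both are 2: Frob() runs twice, and MyFoo is set once and then read once.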
Now, in your particular case, we could write an optimizing pass that detects that the extra local is unnecessary and optimizes it away. But we have other priorities, and the jitter probably does a good job of optimizing locals.
Good question. I've been meaning to blog this one for a while.