What is an anonymous method? Is it really anonymous? Does it have a name? All good questions, so let's start with them and work our way up to lambda expressions as we go along.
When you do this:
public void TestSomething()
{
    Test(delegate { Debug.WriteLine("Test"); });
}
What actually happens?
The compiler first decides to take the "body" of the method, which is this:
Debug.WriteLine("Test");
and separate that out into a method.
Two questions the compiler now has to answer:
- Where should I put the method?
- What should the signature of the method look like?
The second question is easily answered. The delegate { part answers that. The method takes no parameters (nothing between delegate and {), and since we don't care about its name (hence the "anonymous" part), we can declare the method as such:
public void SomeOddMethod()
{
    Debug.WriteLine("Test");
}
But why did it do all this?
Let's look at what a delegate, such as Action, really is.
A delegate is, if we for a moment disregard the fact that delegates in .NET are multicast and can hold a list of several single "delegates", a reference (pointer) to two things:
- An object instance
- A method on that object instance
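To make those two halves visible, here is a minimal sketch that inspects a delegate through its Target and Method properties. The Greeter class and its members are made up for this illustration and are not part of the example in this answer.

```csharp
using System;

// Hypothetical class, used only to have an instance method to point at.
public class Greeter
{
    public string Name = "demo";

    public void SayHello()
    {
        Console.WriteLine("Hello from " + Name);
    }
}

public static class DelegateParts
{
    public static void Main()
    {
        var greeter = new Greeter();
        Action action = new Action(greeter.SayHello);

        // Target is the object instance the method will be called on.
        Console.WriteLine(ReferenceEquals(action.Target, greeter)); // True

        // Method describes the method that will be called on that instance.
        Console.WriteLine(action.Method.Name); // SayHello

        // Invoking the delegate calls greeter.SayHello().
        action(); // Hello from demo
    }
}
```

As long as the delegate is reachable, so is the object in Target, which is the point the rest of this section builds on.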
So, with that knowledge, the first piece of code could actually be rewritten as this:
public void TestSomething()
{
    Test(new Action(this.SomeOddMethod));
}
private void SomeOddMethod()
{
    Debug.WriteLine("Test");
}
Now, the problem with this is that the compiler has no way of knowing what Test actually does with the delegate it is given. Since one half of the delegate is a reference to the instance on which the method is to be called (this in the above example), we don't know how much data will be kept reachable through it.
For instance, consider if the above code was part of a really huge object, but an object that only lives temporarily. Also consider that Test stores that delegate somewhere it will live for a long time. That delegate would then keep the huge object alive for just as long, which is probably not good.
So the compiler does more than just create a method; it also creates a class to hold it. This answers the first question: where should I put the method?
The code above can thus be rewritten as follows:
public void TestSomething()
{
    var temp = new SomeClass();
    Test(new Action(temp.SomeOddMethod));
}
private class SomeClass
{
    // Must be accessible from TestSomething, so it cannot be private here.
    public void SomeOddMethod()
    {
        Debug.WriteLine("Test");
    }
}
That is, for this example, what an anonymous method is really all about.
Things get a bit more hairy if you start using local variables. Consider this example:
public void TestSomething()
{
    int x = 10;
    Test(delegate { Debug.WriteLine("x=" + x); });
}
This is what happens under the hood, or at least something very close to it:
public void TestSomething()
{
    var temp = new SomeClass();
    temp.x = 10;
    Test(new Action(temp.SomeOddMethod));
}
private class SomeClass
{
    // The local variable x now lives here, as a field.
    public int x;
    public void SomeOddMethod()
    {
        Debug.WriteLine("x=" + x);
    }
}
The compiler creates a class, lifts all the local variables that the method requires into fields of that class, and rewrites every access to those local variables into an access to the corresponding field on an instance of the generated class.
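One observable consequence of this lifting is that the method and the anonymous method share the same variable, not a copy of it. The sketch below tries to show that; CaptureDemo and ReadAfterChange are names I made up for the illustration, and the type name printed at the end depends on the compiler version, so no output is listed for it.

```csharp
using System;

public static class CaptureDemo
{
    public static int ReadAfterChange()
    {
        int x = 10;
        Func<int> read = delegate { return x; };

        // x has been lifted into the generated class, so this assignment
        // writes to the same storage the anonymous method reads from.
        x = 42;

        return read();
    }

    public static void Main()
    {
        Console.WriteLine(ReadAfterChange()); // 42

        int y = 1;
        Action show = delegate { Console.WriteLine(y); };

        // The delegate's target is an instance of the compiler-generated
        // class; its exact name varies between compiler versions.
        Console.WriteLine(show.Target.GetType().Name);
    }
}
```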
The names of the class and the method are a bit odd. Let's ask LINQPad what they would be:
void Main()
{
    int x = 10;
    Test(delegate { Debug.WriteLine("x=" + x); });
}
public void Test(Action action)
{
    action();
}
If I ask LINQPad to output the IL (Intermediate Language) of this program, I get this:
// var temp = new UserQuery+<>c__DisplayClass1();
IL_0000: newobj UserQuery+<>c__DisplayClass1..ctor
IL_0005: stloc.0 // CS$<>8__locals2
IL_0006: ldloc.0 // CS$<>8__locals2
// temp.x = 10;
IL_0007: ldc.i4.s 0A
IL_0009: stfld UserQuery+<>c__DisplayClass1.x
// var action = new Action(temp.b__0);
IL_000E: ldarg.0
IL_000F: ldloc.0 // CS$<>8__locals2
IL_0010: ldftn UserQuery+<>c__DisplayClass1.b__0
IL_0016: newobj System.Action..ctor
// Test(action);
IL_001B: call UserQuery.Test
Test:
IL_0000: ldarg.1
IL_0001: callvirt System.Action.Invoke
IL_0006: ret
<>c__DisplayClass1.b__0:
IL_0000: ldstr "x="
IL_0005: ldarg.0
IL_0006: ldfld UserQuery+<>c__DisplayClass1.x
IL_000B: box System.Int32
IL_0010: call System.String.Concat
IL_0015: call System.Diagnostics.Debug.WriteLine
IL_001A: ret
<>c__DisplayClass1..ctor:
IL_0000: ldarg.0
IL_0001: call System.Object..ctor
IL_0006: ret
Here you can see that the name of the class is UserQuery+<>c__DisplayClass1, and the name of the method is b__0. The C# comments are mine, added to show which code produced the IL; LINQPad outputs nothing but the IL in the example above.
The less-than and greater-than signs are there because they are not legal in C# identifiers, which ensures that you cannot by accident create a type and/or method that matches what the compiler produced for you.
So that's basically what an anonymous method is.
So what is this?
Test(() => Debug.WriteLine("Test"));
Well, in this case it's the same, it's a shortcut for producing an anonymous method.
You can write this in two ways:
() => { ... code here ... }
() => ... single expression here ...
In its first form you can write all the code you would do in a normal method body. In its second form you're allowed to write one expression or statement.
However, in this case the compiler will treat this:
() => ...
the same way as this:
delegate { ... }
They're still anonymous methods; it's just that the () => syntax is a shortcut to getting to it.
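As a small sketch of that equivalence, here are the three spellings side by side, all assigned to the same delegate type. The SyntaxDemo class and its field names are invented for this illustration.

```csharp
using System;

public static class SyntaxDemo
{
    // Anonymous method syntax.
    public static readonly Func<int, int> DoubleOld =
        delegate(int n) { return n * 2; };

    // Lambda with a block body: anything a normal method body could hold.
    public static readonly Func<int, int> DoubleBlock =
        n => { return n * 2; };

    // Lambda with a single expression as its body.
    public static readonly Func<int, int> DoubleExpr =
        n => n * 2;

    public static void Main()
    {
        Console.WriteLine(DoubleOld(21));   // 42
        Console.WriteLine(DoubleBlock(21)); // 42
        Console.WriteLine(DoubleExpr(21));  // 42
    }
}
```

When the target is a delegate type like Func<int, int>, all three end up as a compiler-generated method wrapped in a delegate.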
So if it's just a shortcut, why do we have it?
Well, it makes life a bit easier for the purpose for which it was added, which is LINQ.
Consider this LINQ statement:
var customers = from customer in db.Customers
where customer.Name == "ACME"
select customer.Address;
This code is rewritten as follows:
var customers =
    db.Customers
        .Where(customer => customer.Name == "ACME")
        .Select(customer => customer.Address);
If you were to use the delegate { ... } syntax, you would have to rewrite the expressions with return ... and so on, and they'd look more funky. The lambda syntax was thus added to make life easier for us programmers when writing code like the above.
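To see just how funky, here is a rough sketch of the same query written both ways against an in-memory list (LINQ to Objects, not a database). The Customer class, the sample data and the method names are all assumptions made up for the comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the customer type used in the query above.
public class Customer
{
    public string Name;
    public string Address;
}

public static class QueryDemo
{
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "ACME", Address = "1 Main Street" },
            new Customer { Name = "Other", Address = "2 Side Street" },
        };
    }

    // The query spelled out with anonymous method syntax.
    public static List<string> WithDelegates(IEnumerable<Customer> customers)
    {
        return customers
            .Where(delegate(Customer customer) { return customer.Name == "ACME"; })
            .Select(delegate(Customer customer) { return customer.Address; })
            .ToList();
    }

    // The same query with lambda syntax.
    public static List<string> WithLambdas(IEnumerable<Customer> customers)
    {
        return customers
            .Where(customer => customer.Name == "ACME")
            .Select(customer => customer.Address)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(WithDelegates(SampleCustomers())[0]); // 1 Main Street
        Console.WriteLine(WithLambdas(SampleCustomers())[0]);   // 1 Main Street
    }
}
```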
So what are expressions?
So far I have not shown how Test has been defined, but let's define Test for the above code:
public void Test(Action action)
This should suffice. It says that "I need a delegate, it is of type Action (taking no parameters, returning no values)".
However, Microsoft also added a different way to define this method:
public void Test(Expression<Func<....>> expr)
Note that I dropped a part there, the .... part; let's get back to that in the footnotes.
This code, paired with this call:
Test(() => x + 10);
will not actually pass in a delegate, nor anything that can be called (immediately). Instead, the compiler will rewrite this code into something conceptually similar to the code below (the class names are made up; the real API looks different):
var operand1 = new VariableReferenceOperand("x");
var operand2 = new ConstantOperand(10);
var expression = new AdditionOperator(operand1, operand2);
Test(expression);
Basically the compiler will build up an Expression<Func<....>> object, containing references to the variables, the literal values, the operators used, etc., and pass that object tree to the method.
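With the real System.Linq.Expressions types, that object tree can be inspected directly. The following is a minimal sketch, assuming the .... part is int; ExpressionDemo and Build are names invented for the example.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    public static Expression<Func<int>> Build()
    {
        int x = 5;

        // Because the target type is Expression<Func<int>>, the compiler
        // emits code that builds a tree describing "x + 10" rather than
        // a method that computes it.
        Expression<Func<int>> expr = () => x + 10;
        return expr;
    }

    public static void Main()
    {
        Expression<Func<int>> expr = Build();

        // The root of the body is the addition operator.
        Console.WriteLine(expr.Body.NodeType); // Add

        var add = (BinaryExpression)expr.Body;
        Console.WriteLine(add.Left.NodeType);  // MemberAccess (the captured x)
        Console.WriteLine(add.Right.NodeType); // Constant (the literal 10)

        // The tree can still be turned into something callable on demand.
        Func<int> compiled = expr.Compile();
        Console.WriteLine(compiled()); // 15
    }
}
```

Note that nothing is callable until Compile() is invoked, which is what "nor anything that can be called (immediately)" refers to.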
Why?
Well, consider the db.Customers.Where(...)
part above.
Wouldn't it be nice if, instead of downloading all customers (and all their data) from the database to the client, looping through them all, finding out which customer has the right name, etc., the code would actually ask the database to find that single, correct customer at once?
That's the purpose behind expressions. Entity Framework, LINQ to SQL, or any other such LINQ-supporting database layer will take that expression, analyze it, pick it apart, and write up a properly formatted SQL statement to be executed against the database.
This it could never do if we were still giving it delegates to methods containing IL. It can only do this because of a couple of things:
- The syntax allowed in a lambda expression suitable for an Expression<...> is limited (no statements, etc.)
- The lambda syntax without the curly brackets, which tells the compiler that this is a simpler form of code
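To give a feel for the "pick it apart" step, here is a deliberately tiny sketch of a translator that understands exactly one shape of expression, member == constant. It is not how Entity Framework or LINQ to SQL are implemented; ToySqlTranslator, ToWhereClause and the Customer class are hypothetical, and the string building does no escaping or parameterization.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the customer type used in the query above.
public class Customer
{
    public string Name;
    public string Address;
}

public static class ToySqlTranslator
{
    // Handles exactly one shape: parameter.Member == constant.
    public static string ToWhereClause<T>(Expression<Func<T, bool>> predicate)
    {
        var comparison = predicate.Body as BinaryExpression;
        if (comparison == null || comparison.NodeType != ExpressionType.Equal)
            throw new NotSupportedException("Only == comparisons are handled.");

        var member = comparison.Left as MemberExpression;
        var constant = comparison.Right as ConstantExpression;
        if (member == null || constant == null)
            throw new NotSupportedException("Expected member == constant.");

        // Illustration only: a real provider would emit a parameterized query.
        return member.Member.Name + " = '" + constant.Value + "'";
    }

    public static void Main()
    {
        string where = ToWhereClause<Customer>(c => c.Name == "ACME");
        Console.WriteLine(where); // Name = 'ACME'
    }
}
```

Had the method received a plain delegate instead, all it could do is call it once per customer; there would be no Left, Right or Member to look at.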
So, let's summarize:
- Anonymous methods are really not all that anonymous; they end up as a named type with a named method, only you do not have to name those things yourself
- It's a lot of compiler magic under the hood that moves things around so that you don't have to
- Expressions and Delegates are two ways to look at some of the same things
- Expressions are meant for frameworks that want to know what the code does and how, so that they can use that knowledge to optimize the process (like writing a SQL statement)
- Delegates are meant for frameworks that are only concerned about being able to call the method
Footnotes:
The .... part for such a simple expression is meant for the type of return value you get from the expression. The () => ... simple expression ... form only allows expressions, that is, something that returns a value, and it cannot be multiple statements. As such, a valid expression type is this: Expression<Func<int>>. Basically, the expression is a function (method) returning an integer value.
Note that the "expression that returns a value" is a limit for Expression<...> parameters or types, but not of delegates. This is entirely legal code if the parameter type of Test is an Action:
Test(() => Debug.WriteLine("Test"));
Obviously, Debug.WriteLine("Test") doesn't return anything, but this is legal. If the method Test required an Expression<Func<int>> instead, it would not be, as that expression must return a value.