In the How Can I Expose Only a Fragment of IList<> question one of the answers had the following code snippet:
IEnumerable
The yield
keyword allows you to create an IEnumerable<T>
in the form on an iterator block. This iterator block supports deferred executing and if you are not familiar with the concept it may appear almost magical. However, at the end of the day it is just code that executes without any weird tricks.
An iterator block can be described as syntactic sugar where the compiler generates a state machine that keeps track of how far the enumeration of the enumerable has progressed. To enumerate an enumerable, you often use a foreach
loop. However, a foreach
loop is also syntactic sugar. So you are two abstractions removed from the real code which is why it initially might be hard to understand how it all works together.
Assume that you have a very simple iterator block:
IEnumerable<int> IteratorBlock()
{
Console.WriteLine("Begin");
yield return 1;
Console.WriteLine("After 1");
yield return 2;
Console.WriteLine("After 2");
yield return 42;
Console.WriteLine("End");
}
Real iterator blocks often have conditions and loops but when you check the conditions and unroll the loops they still end up as yield
statements interleaved with other code.
To enumerate the iterator block a foreach
loop is used:
foreach (var i in IteratorBlock())
Console.WriteLine(i);
Here is the output (no surprises here):
Begin 1 After 1 2 After 2 42 End
As stated above foreach
is syntactic sugar:
IEnumerator<int> enumerator = null;
try
{
enumerator = IteratorBlock().GetEnumerator();
while (enumerator.MoveNext())
{
var i = enumerator.Current;
Console.WriteLine(i);
}
}
finally
{
enumerator?.Dispose();
}
In an attempt to untangle this I have crated a sequence diagram with the abstractions removed:
The state machine generated by the compiler also implements the enumerator but to make the diagram more clear I have shown them as separate instances. (When the state machine is enumerated from another thread you do actually get separate instances but that detail is not important here.)
Every time you call your iterator block a new instance of the state machine is created. However, none of your code in the iterator block is executed until enumerator.MoveNext()
executes for the first time. This is how deferred executing works. Here is a (rather silly) example:
var evenNumbers = IteratorBlock().Where(i => i%2 == 0);
At this point the iterator has not executed. The Where
clause creates a new IEnumerable<T>
that wraps the IEnumerable<T>
returned by IteratorBlock
but this enumerable has yet to be enumerated. This happens when you execute a foreach
loop:
foreach (var evenNumber in evenNumbers)
Console.WriteLine(eventNumber);
If you enumerate the enumerable twice then a new instance of the state machine is created each time and your iterator block will execute the same code twice.
Notice that LINQ methods like ToList()
, ToArray()
, First()
, Count()
etc. will use a foreach
loop to enumerate the enumerable. For instance ToList()
will enumerate all elements of the enumerable and store them in a list. You can now access the list to get all elements of the enumerable without the iterator block executing again. There is a trade-off between using CPU to produce the elements of the enumerable multiple times and memory to store the elements of the enumeration to access them multiple times when using methods like ToList()
.
yield return
is used with enumerators. On each call of yield statement, control is returned to the caller but it ensures that the callee's state is maintained. Due to this, when the caller enumerates the next element, it continues execution in the callee method from statement immediately after the yield
statement.
Let us try to understand this with an example. In this example, corresponding to each line I have mentioned the order in which execution flows.
static void Main(string[] args)
{
foreach (int fib in Fibs(6))//1, 5
{
Console.WriteLine(fib + " ");//4, 10
}
}
static IEnumerable<int> Fibs(int fibCount)
{
for (int i = 0, prevFib = 0, currFib = 1; i < fibCount; i++)//2
{
yield return prevFib;//3, 9
int newFib = prevFib + currFib;//6
prevFib = currFib;//7
currFib = newFib;//8
}
}
Also, the state is maintained for each enumeration. Suppose, I have another call to Fibs()
method then the state will be reset for it.
The C# yield keyword, to put it simply, allows many calls to a body of code, referred to as an iterator, that knows how to return before it's done and, when called again, continues where it left off - i.e. it helps an iterator become transparently stateful per each item in a sequence that the iterator returns in successive calls.
In JavaScript, the same concept is called Generators.
At first sight, yield return is a .NET sugar to return an IEnumerable.
Without yield, all the items of the collection are created at once:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
return new List<SomeData> {
new SomeData(),
new SomeData(),
new SomeData()
};
}
}
Same code using yield, it returns item by item:
class SomeData
{
public SomeData() { }
static public IEnumerable<SomeData> CreateSomeDatas()
{
yield return new SomeData();
yield return new SomeData();
yield return new SomeData();
}
}
The advantage of using yield is that if the function consuming your data simply needs the first item of the collection, the rest of the items won't be created.
The yield operator allows the creation of items as it is demanded. That's a good reason to use it.
A list or array implementation loads all of the items immediately whereas the yield implementation provides a deferred execution solution.
In practice, it is often desirable to perform the minimum amount of work as needed in order to reduce the resource consumption of an application.
For example, we may have an application that process millions of records from a database. The following benefits can be achieved when we use IEnumerable in a deferred execution pull-based model:
Here is a comparison between build a collection first such as a list compared to using yield.
List Example
public class ContactListStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
var contacts = new List<ContactModel>();
Console.WriteLine("ContactListStore: Creating contact 1");
contacts.Add(new ContactModel() { FirstName = "Bob", LastName = "Blue" });
Console.WriteLine("ContactListStore: Creating contact 2");
contacts.Add(new ContactModel() { FirstName = "Jim", LastName = "Green" });
Console.WriteLine("ContactListStore: Creating contact 3");
contacts.Add(new ContactModel() { FirstName = "Susan", LastName = "Orange" });
return contacts;
}
}
static void Main(string[] args)
{
var store = new ContactListStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
ContactListStore: Creating contact 1
ContactListStore: Creating contact 2
ContactListStore: Creating contact 3
Ready to iterate through the collection.
Note: The entire collection was loaded into memory without even asking for a single item in the list
Yield Example
public class ContactYieldStore : IStore<ContactModel>
{
public IEnumerable<ContactModel> GetEnumerator()
{
Console.WriteLine("ContactYieldStore: Creating contact 1");
yield return new ContactModel() { FirstName = "Bob", LastName = "Blue" };
Console.WriteLine("ContactYieldStore: Creating contact 2");
yield return new ContactModel() { FirstName = "Jim", LastName = "Green" };
Console.WriteLine("ContactYieldStore: Creating contact 3");
yield return new ContactModel() { FirstName = "Susan", LastName = "Orange" };
}
}
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection.");
Console.ReadLine();
}
Console Output
Ready to iterate through the collection.
Note: The collection wasn't executed at all. This is due to the "deferred execution" nature of IEnumerable. Constructing an item will only occur when it is really required.
Let's call the collection again and obverse the behaviour when we fetch the first contact in the collection.
static void Main(string[] args)
{
var store = new ContactYieldStore();
var contacts = store.GetEnumerator();
Console.WriteLine("Ready to iterate through the collection");
Console.WriteLine("Hello {0}", contacts.First().FirstName);
Console.ReadLine();
}
Console Output
Ready to iterate through the collection
ContactYieldStore: Creating contact 1
Hello Bob
Nice! Only the first contact was constructed when the client "pulled" the item out of the collection.
It's producing enumerable sequence. What it does is actually creating local IEnumerable sequence and returning it as a method result