Let's look at a small example to examine the behavior. We create a small console app with a "string generator", which we implement with the yield keyword.
That means we get our strings "one by one" instead of the whole package at once. To spice things up, we break out of the loop once we have received two elements from the generator.
using System;
using System.Collections.Generic;
public class Program
{
    public static void Main()
    {
        var loopCounter = 0;
        Console.WriteLine("Calling generator");
        var strings = Generator();
        Console.WriteLine("Getting items from generator");
        foreach (var elem in strings)
        {
            Console.WriteLine($"From generator: {elem}");
            loopCounter++;
            if (loopCounter == 2)
                break;
        }
    }
    private static IEnumerable<string> Generator()
    {
        Console.WriteLine("Begin generator");
        Console.WriteLine("Giving first string to consumer");
        yield return "Hello";
        Console.WriteLine("Giving second string to consumer");
        yield return "World";
        Console.WriteLine("Giving last string to consumer");
        yield return "last";
        Console.WriteLine("Done");
    }
}
Which gives the following output:
Calling generator
Getting items from generator
Begin generator
Giving first string to consumer
From generator: Hello
Giving second string to consumer
From generator: World
Now, what would the whole process look like if we didn't use the yield keyword and instead returned the whole list at once?
using System;
using System.Collections.Generic;
public class Program
{
    public static void Main()
    {
        var loopCounter = 0;
        Console.WriteLine("Calling generator");
        var strings = Generator();
        Console.WriteLine("Getting items from generator");
        foreach (var elem in strings)
        {
            Console.WriteLine($"From generator: {elem}");
            loopCounter++;
            if (loopCounter == 2)
                break;
        }
    }
    private static IList<string> Generator()
    {
        List<string> gen = new();
        Console.WriteLine("Begin generator");
        Console.WriteLine("Adding first string to consumer");
        gen.Add("Hello");
        Console.WriteLine("Adding second string to consumer");
        gen.Add("World");
        Console.WriteLine("Adding last string to consumer");
        gen.Add("last");
        Console.WriteLine("Done");
        return gen;
    }
}
With the following output:
Calling generator
Begin generator
Adding first string to consumer
Adding second string to consumer
Adding last string to consumer
Done
Getting items from generator
From generator: Hello
From generator: World
Key differences between a yielded IEnumerable and a List
We saw some differences in the output, so let's sum them up:
- The generator block is not executed until the consumer asks for the first element (we saw that var strings = Generator() alone does not lead to any console output).
- The execution of the generator method stops at the yield boundaries.
- If we don't consume more items, we never reach the end of the generator block (in the first example, "Giving last string to consumer" and "Done" are never printed).
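To see the first and third points without reading through console output, here is a minimal sketch that writes each step into a list instead. DeferredDemo, Run and the log messages are made-up names for illustration; the generator itself behaves just like ours.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Same shape as our generator, but it records its steps in a log instead of printing.
    private static IEnumerable<string> Generator(List<string> log)
    {
        log.Add("generator: start");
        yield return "Hello";
        log.Add("generator: after first yield");
        yield return "World";
        log.Add("generator: end");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var strings = Generator(log);          // only creates the state machine, no generator code runs
        log.Add("consumer: got enumerable");
        foreach (var s in strings)
        {
            log.Add($"consumer: {s}");
            break;                              // stop after the first item
        }
        return log;
    }

    public static void Main()
    {
        Run().ForEach(Console.WriteLine);
    }
}
```

Run() returns three entries: the consumer's "got enumerable" line comes before "generator: start", and "generator: end" never shows up because we broke out after the first item.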
Why does that matter?
Well, I guess almost all of you have used LINQ's Any() function: we have an enumeration of items and want to check whether any of them fulfills our predicate.
var numbers = new[] { 1, 2, 3, 4 };
var hasPositiveValues = numbers.Any(n => n > 0);
What happens here is that we only have to check the first element of numbers: it already fulfills the predicate, so Any() can safely return true without looking at the rest.
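We can make this observable with a generator that counts how many values it hands out. A sketch, assuming a hypothetical Numbers generator over the same values 1 to 4 as above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyDemo
{
    public sealed class Counter { public int Produced; }

    // Illustrative generator: counts how many values it actually hands out.
    private static IEnumerable<int> Numbers(Counter counter)
    {
        foreach (var n in new[] { 1, 2, 3, 4 })
        {
            counter.Produced++;
            yield return n;
        }
    }

    public static (bool Result, int Produced) Run(Func<int, bool> predicate)
    {
        var counter = new Counter();
        var result = Numbers(counter).Any(predicate);   // Any() stops at the first match
        return (result, counter.Produced);
    }

    public static void Main()
    {
        Console.WriteLine(Run(n => n > 0));   // first element matches: only one value produced
        Console.WriteLine(Run(n => n > 10));  // no match: all four values produced
    }
}
```

With n => n > 0, Any() stops after a single value; with a predicate that never matches (n => n > 10) it has to pull all four, which is the worst case mentioned below.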
Of course, here we are operating on data that already lives in RAM. No big deal. But just imagine we consume an interface whose enumeration might or might not be expensive:
IEnumerable<int> numbers = SomeSlowOperation();
var hasPositiveValues = numbers.Any(n => n > 0);
Now we only enumerate one by one and can potentially save a lot of time and resources. In the worst case, we have to go through the whole sequence anyway. If SomeSlowOperation returned an IList<int> instead, the whole thing would be materialized at once, even if we only needed the first element (I'm looking at you, Take(1) / First()).
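Here is a rough sketch of that difference. SlowNumbers is a hypothetical stand-in for SomeSlowOperation, and ToList() simulates an API that returns IList<int>: First() on the lazy sequence produces one item, while the list version produces all 1,000 before we ever look at the first one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    public sealed class Counter { public int Produced; }

    // Hypothetical stand-in for SomeSlowOperation(): pretend every item is expensive.
    private static IEnumerable<int> SlowNumbers(Counter counter)
    {
        for (var i = 1; i <= 1_000; i++)
        {
            counter.Produced++;   // imagine a database row or an HTTP call here
            yield return i;
        }
    }

    // Lazy: First() pulls a single element and stops.
    public static int ProducedByLazyFirst()
    {
        var counter = new Counter();
        _ = SlowNumbers(counter).First();
        return counter.Produced;
    }

    // Eager: an IList-returning API has to build the whole list before anyone looks at it.
    public static int ProducedByEagerFirst()
    {
        var counter = new Counter();
        IList<int> all = SlowNumbers(counter).ToList();
        _ = all[0];
        return counter.Produced;
    }

    public static void Main()
    {
        Console.WriteLine($"lazy First(): {ProducedByLazyFirst()} item(s) produced");
        Console.WriteLine($"eager list:   {ProducedByEagerFirst()} item(s) produced");
    }
}
```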
Of course, there is also a pitfall:
IEnumerable<int> numbers = SomeSlowOperation();
var evenNumbers = numbers.Where(n => n % 2 == 0);
var oddNumbers = numbers.Where(n => n % 2 == 1);
As soon as both evenNumbers and oddNumbers are enumerated, the whole generator runs twice, once per enumeration. So be aware of multiple enumerations of IEnumerable.
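The following sketch counts how often the generator body starts. It uses Count() to actually enumerate the two Where results (Where alone is lazy too) and shows the usual fix: materialize once with ToList() when you need the data more than once. SlowNumbers and the counter are again hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    public sealed class Counter { public int Runs; }

    // Hypothetical stand-in for SomeSlowOperation(): counts how often its body is started.
    private static IEnumerable<int> SlowNumbers(Counter counter)
    {
        counter.Runs++;                 // imagine an expensive query starting here
        for (var i = 1; i <= 10; i++)
            yield return i;
    }

    public static (int Evens, int Odds, int Runs) WithoutMaterializing()
    {
        var counter = new Counter();
        IEnumerable<int> numbers = SlowNumbers(counter);
        var evens = numbers.Where(n => n % 2 == 0).Count();  // runs the generator once...
        var odds = numbers.Where(n => n % 2 == 1).Count();   // ...and then a second time
        return (evens, odds, counter.Runs);
    }

    public static (int Evens, int Odds, int Runs) WithToList()
    {
        var counter = new Counter();
        List<int> numbers = SlowNumbers(counter).ToList();   // enumerate once, keep the results in memory
        var evens = numbers.Where(n => n % 2 == 0).Count();
        var odds = numbers.Where(n => n % 2 == 1).Count();
        return (evens, odds, counter.Runs);
    }

    public static void Main()
    {
        Console.WriteLine($"without ToList(): generator ran {WithoutMaterializing().Runs} time(s)");
        Console.WriteLine($"with ToList():    generator ran {WithToList().Runs} time(s)");
    }
}
```

WithoutMaterializing() reports two runs, WithToList() only one. The trade-off is that ToList() keeps all items in memory.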
So how does that work now? And what does it have to do with async/await? The answer is simple: for both, the compiler creates a state machine.
So let's have a look at how the state machine works (simplified):
- Our yield function is sliced into multiple states, cut directly at each yield boundary.
- If a consumer asks for a value, the state machine executes everything up to the first yield, records which state comes next and returns the value.
- When the consumer asks for another value, the state machine jumps into the next state, runs until the second yield, again records the next state and returns that value.
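We can watch this happen by calling MoveNext() ourselves instead of using foreach. A minimal sketch (SteppingDemo and the log texts are illustrative) that records producer and consumer steps in one list:

```csharp
using System;
using System.Collections.Generic;

public static class SteppingDemo
{
    private static IEnumerable<string> GetStrings(List<string> log)
    {
        log.Add("state 0: runs until the first yield");
        yield return "First";
        log.Add("state 1: resumes after the first yield");
        yield return "Second";
        log.Add("final state: runs after the last yield");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        using IEnumerator<string> e = GetStrings(log).GetEnumerator();
        while (e.MoveNext())            // each call resumes the method at its saved state
            log.Add($"consumer got: {e.Current}");
        log.Add("MoveNext returned false");
        return log;
    }

    public static void Main()
    {
        Run().ForEach(Console.WriteLine);
    }
}
```

The log alternates between the generator and the consumer: each MoveNext() resumes at the saved state and runs until the next yield, and the final call runs the code after the last yield and returns false.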
How does that look in action?
Every yield/generator function becomes its own compiler-generated class representing the state machine. Very briefly, it looks like this:
public IEnumerable<string> GetStrings()
{
    Console.WriteLine("Enter function");
    yield return "First";
    Console.WriteLine("Another");
    yield return "Second";
}
This becomes roughly the following:
public class GetStringsStateMachine : IEnumerable<string>, IEnumerator<string>
{
    private int currentState; // Represents the state which will be executed when we go into the function again
    private string currentValue; // Represents the returned value
    public GetStringsStateMachine()
    {
    }
    public string Current => currentValue; // This is how a consumer (like foreach) asks for a value.
    public bool MoveNext() // Runs the next slice of the original method
    {
        switch (currentState)
        {
            case 0:
                Console.WriteLine("Enter function");
                currentValue = "First";
                currentState = 1; // Go to the next step in the state machine
                return true; // True means we produced a value and are not done yet
            case 1:
                Console.WriteLine("Another");
                currentValue = "Second";
                currentState = 2; // Nothing is left to run after the last yield
                return true;
            default:
                return false; // False means we are done. There are no further values
        }
    }
    // GetEnumerator(), Reset(), Dispose() and the non-generic members are omitted for brevity.
}
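The class above leaves out several interface members. For anyone who wants to compile and run it, here is a complete hand-written version. It is still a simplification under my own assumptions: the real compiler-generated class also remembers the thread that created it and uses more states, and the enumeratorHandedOut flag here is just a simple stand-in for that bookkeeping.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class GetStringsStateMachine : IEnumerable<string>, IEnumerator<string>
{
    private int currentState;          // which slice of the original method runs on the next MoveNext()
    private string currentValue = "";  // value exposed through Current
    private bool enumeratorHandedOut;  // simplified stand-in for the compiler's bookkeeping

    public string Current => currentValue;
    object IEnumerator.Current => currentValue;

    public bool MoveNext()
    {
        switch (currentState)
        {
            case 0:
                Console.WriteLine("Enter function");
                currentValue = "First";
                currentState = 1;
                return true;
            case 1:
                Console.WriteLine("Another");
                currentValue = "Second";
                currentState = 2;
                return true;
            default:
                return false;          // the original method body has finished
        }
    }

    public IEnumerator<string> GetEnumerator()
    {
        // The first caller gets this instance; later callers get a fresh machine starting at state 0.
        if (!enumeratorHandedOut)
        {
            enumeratorHandedOut = true;
            return this;
        }
        return new GetStringsStateMachine { enumeratorHandedOut = true };
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Compiler-generated iterators don't support Reset() either.
    public void Reset() => throw new NotSupportedException();

    public void Dispose() { }
}

public static class Program
{
    public static void Main()
    {
        foreach (var s in new GetStringsStateMachine())
            Console.WriteLine(s);
    }
}
```

Running it prints Enter function, First, Another, Second (one per line), interleaved exactly like the yield version.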
We see that the function is split at each yield boundary, and every piece ends up in its own switch case. A simple state machine.
The state machine implements IEnumerator<string> and IEnumerable<string> itself, so a consumer can drive it through the "magic" of MoveNext(). Each call to MoveNext() runs the next piece, advances the state, stores the current value so it can be exposed to the outside world via Current, and returns true. Once the method body has finished, MoveNext() returns false.
Now let's have a look at a typical consumer:
foreach (var item in GetStrings())
{
    // ...
}
This becomes:
IEnumerator<string> enumerator = GetStrings().GetEnumerator();
while (enumerator.MoveNext())
{
    string current = enumerator.Current;
    // ...
}
private static IEnumerable<string> GetStrings()
{
    return new GetStringsStateMachine();
}
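The lowered loop above is simplified as well: foreach also wraps the loop in try/finally and disposes the enumerator. Disposing a suspended iterator runs any pending finally blocks, but not the rest of the method, which is why "Done" never appeared in our first example. A sketch (ForeachLoweringDemo is an illustrative name) comparing a hand-lowered loop with a real foreach:

```csharp
using System;
using System.Collections.Generic;

public static class ForeachLoweringDemo
{
    private static IEnumerable<string> GetStrings(List<string> log)
    {
        try
        {
            yield return "First";
            yield return "Second";
            yield return "Third";
        }
        finally
        {
            log.Add("iterator finally ran");   // executed by Dispose() when we stop early
        }
    }

    // Roughly what the compiler emits for the foreach in RunForeach().
    public static List<string> RunLowered()
    {
        var log = new List<string>();
        IEnumerator<string> enumerator = GetStrings(log).GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                string current = enumerator.Current;
                log.Add($"consumer: {current}");
                break;                          // leave after the first item
            }
        }
        finally
        {
            enumerator?.Dispose();              // foreach always disposes the enumerator
        }
        return log;
    }

    public static List<string> RunForeach()
    {
        var log = new List<string>();
        foreach (var current in GetStrings(log))
        {
            log.Add($"consumer: {current}");
            break;
        }
        return log;
    }

    public static void Main()
    {
        RunLowered().ForEach(Console.WriteLine);
        RunForeach().ForEach(Console.WriteLine);
    }
}
```

Both versions produce the same two log entries: the consumer's first item, then the iterator's finally block.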
Now let's unwrap that.
- We are creating the state machine by calling the GetStrings() method. But wait a second: how can the compiler generate a method with the same name? Well, it doesn't. We saw earlier that the compiler moves the whole body into the generated state machine class, and what is left in your original method is exactly that: the initialization of your state machine.
- Once we have the enumerator of the state machine, we are practically set: while we have items (remember, MoveNext returns true as long as it produced an item), we run our loop.
- To get the current item, we use IEnumerator<string>.Current. Very straightforward.
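One practical consequence of "only the initialization is left": even argument checks inside an iterator method don't run when you call it, only on the first MoveNext(). A common pattern is to validate in a plain method and return a local iterator function. A sketch; RepeatLazy, RepeatEager and the helper methods are hypothetical names:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ValidationDemo
{
    // Iterator: the null check is moved into the state machine, so it runs on the first MoveNext().
    public static IEnumerable<string> RepeatLazy(string? text, int count)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        for (var i = 0; i < count; i++)
            yield return text;
    }

    // Plain method: validates immediately, then hands out a local iterator.
    public static IEnumerable<string> RepeatEager(string? text, int count)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        return Iterate(text, count);

        static IEnumerable<string> Iterate(string value, int times)
        {
            for (var i = 0; i < times; i++)
                yield return value;
        }
    }

    public static bool ThrowsOnCall(Func<IEnumerable<string>> call)
    {
        try { call(); return false; }
        catch (ArgumentNullException) { return true; }
    }

    public static bool ThrowsWhenEnumerated(IEnumerable<string> sequence)
    {
        try
        {
            foreach (var _ in sequence) { }
            return false;
        }
        catch (ArgumentNullException) { return true; }
    }

    public static void Main()
    {
        Console.WriteLine($"RepeatLazy(null) throws on call:        {ThrowsOnCall(() => RepeatLazy(null, 3))}");
        Console.WriteLine($"RepeatLazy(null) throws on enumeration: {ThrowsWhenEnumerated(RepeatLazy(null, 3))}");
        Console.WriteLine($"RepeatEager(null) throws on call:       {ThrowsOnCall(() => RepeatEager(null, 3))}");
    }
}
```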
Now, as you can imagine, that is not the whole truth; the real thing is a bit more complicated. Just imagine we have to handle different threads here, or exceptions. The real state machine is more complex, but you will find all the principles I showed here in it as well. If you want to look at it, have a look at sharplab.io. If you don't know the site, I highly recommend it: it shows you what the compiler does with your code.