Generator-Function in C# - What does yield do?

16/12/2021

Let's use a small example to examine the behavior. We create a small console app that has a "string generator", implemented with the yield keyword. That means we get our strings "one by one" and not the whole package at once. To spice things up, we break out of the loop once we have received two elements from the generator.

using System;
using System.Collections.Generic;					

public class Program
{
    public static void Main()
    {
        var loopCounter = 0;
        Console.WriteLine("Calling generator");
        var strings = Generator();
        Console.WriteLine("Getting items from generator");
        foreach (var elem in strings)
        {
            Console.WriteLine($"From generator: {elem}");
            loopCounter++;
            
            if (loopCounter == 2)
                break;
        }				
    }
    
    private static IEnumerable<string> Generator()
    {
        Console.WriteLine("Begin generator");
        Console.WriteLine("Giving first string to consumer");
        yield return "Hello";
        Console.WriteLine("Giving second string to consumer");
        yield return "World";
        Console.WriteLine("Giving last string to consumer");
        yield return "last";
        Console.WriteLine("Done");
    }
}

Which gives the following output:

Calling generator
Getting items from generator
Begin generator
Giving first string to consumer
From generator: Hello
Giving second string to consumer
From generator: World

Now what would the whole process look like if we didn't use the yield keyword and instead returned the whole list at once?

using System;
using System.Collections.Generic;					

public class Program
{
    public static void Main()
    {
        var loopCounter = 0;
        Console.WriteLine("Calling generator");
        var strings = Generator();
        Console.WriteLine("Getting items from generator");
        foreach(var elem in strings)
        {
            Console.WriteLine($"From generator: {elem}");
            loopCounter++;
            
            if(loopCounter == 2)
                break;
        }				
    }
    
    private static IList<string> Generator()
    {
        List<string> gen = new();
        Console.WriteLine("Begin generator");
        Console.WriteLine("Adding first string to consumer");
        gen.Add("Hello");
        Console.WriteLine("Adding second string to consumer");
        gen.Add("World");
        Console.WriteLine("Adding last string to consumer");
        gen.Add("last");
        Console.WriteLine("Done");
        return gen;
    }
}

With the following output:

Calling generator
Begin generator
Adding first string to consumer
Adding second string to consumer
Adding last string to consumer
Done
Getting items from generator
From generator: Hello
From generator: World

Key differences between yielded IEnumerable and a List

We saw some differences in the output so let's wrap that up:

  • The generator block is not executed until the consumer asks for the first element (we saw that var strings = Generator() does not lead to any console output)
  • Execution of the generator method pauses at each yield boundary
  • If we don't consume more items, we never reach the end of the generator block

Why does that matter?

Well, I guess almost all of you have used the Any() function of LINQ, right? We have an enumeration of items and want to check whether any of them fulfills our predicate.

var numbers = new[] { 1, 2, 3, 4 };
var hasPositiveValues = numbers.Any(n => n > 0);

What happens here is that we only have to check the first element of numbers to know that the requirement is fulfilled, so we can safely return true. Of course, here we are operating on data that already lives in RAM. No big deal. But just imagine we consume an interface which might or might not be expensive:

IEnumerable<int> numbers = SomeSlowOperation();
var hasPositiveValues = numbers.Any(n => n > 0);
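
What Any() does with such a lazy source can be made visible with a minimal sketch. SlowNumbers and ProducedCount are hypothetical names introduced only for this illustration; the counter stands in for whatever makes the real operation expensive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyDemo
{
    // Hypothetical counter: how many items the generator actually produced
    public static int ProducedCount;

    // Hypothetical stand-in for SomeSlowOperation()
    public static IEnumerable<int> SlowNumbers()
    {
        for (var i = 1; i <= 4; i++)
        {
            ProducedCount++; // Imagine an expensive call here
            yield return i;
        }
    }

    public static bool HasPositiveValues()
    {
        ProducedCount = 0;
        // Any() stops asking for items as soon as the predicate matches
        return SlowNumbers().Any(n => n > 0);
    }
}
```

After AnyDemo.HasPositiveValues() returns true, ProducedCount is 1: the generator was only asked for a single item.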

Now we only enumerate one by one and can potentially save a lot of time and resources. In the worst case we have to go through the whole sequence anyway. If your data type were IList<int>, the whole thing would be materialized at once, even if we only needed the first element (I'm looking at you, Take(1) / First()). Of course there is also a pitfall:

IEnumerable<int> numbers = SomeSlowOperation();

var evenNumbers = numbers.Where(n => n % 2 == 0);
var oddNumbers = numbers.Where(n => n % 2 != 0); // != 0 also catches negative odd numbers (-3 % 2 is -1)

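Where() is lazy as well, so nothing has run yet. But as soon as both evenNumbers and oddNumbers are enumerated, the whole generator is executed twice. So be aware of multiple enumerations of an IEnumerable.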
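
A minimal sketch of the pitfall and one common way around it, materializing the sequence once with ToList(). SlowNumbers and the Runs counter are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // Hypothetical counter: how often the generator body was started
    public static int Runs;

    // Hypothetical stand-in for SomeSlowOperation()
    public static IEnumerable<int> SlowNumbers()
    {
        Runs++;
        yield return 1;
        yield return 2;
        yield return 3;
        yield return 4;
    }

    // Each query enumerates the lazy sequence on its own: the generator runs twice
    public static int CountRunsLazy()
    {
        Runs = 0;
        IEnumerable<int> numbers = SlowNumbers();
        var evenNumbers = numbers.Where(n => n % 2 == 0).ToList();
        var oddNumbers = numbers.Where(n => n % 2 != 0).ToList();
        return Runs;
    }

    // Materialize once, then query the in-memory list: the generator runs once
    public static int CountRunsMaterialized()
    {
        Runs = 0;
        List<int> numbers = SlowNumbers().ToList();
        var evenNumbers = numbers.Where(n => n % 2 == 0).ToList();
        var oddNumbers = numbers.Where(n => n % 2 != 0).ToList();
        return Runs;
    }
}
```

CountRunsLazy() returns 2 and CountRunsMaterialized() returns 1. Materializing is a trade-off: it avoids the second run, but it gives up the laziness that made Any() cheap.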

So how does that work now?

And what does it have to do with async/await? The answer is simple: both create a state machine. So let's have a (simplified) look at how the state machine works:

[Figure: state machine diagram]

  1. Our yield function is basically sliced into multiple states, cut directly at the yield boundaries.
  2. Now if a consumer asks for a value, the state machine executes everything up to the first yield, sets its state for the next call and returns the value.
  3. The consumer asks for another value. The state machine jumps into the next state, executes everything up to the second yield, sets its state again and returns the value.
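
These steps can be observed from the outside by driving the enumerator by hand instead of using foreach. A minimal sketch; TwoStrings and the Log list are hypothetical names for this illustration, and the log replaces console output so the order of events can be inspected.

```csharp
using System;
using System.Collections.Generic;

public static class ManualSteppingDemo
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical generator with two yield boundaries
    public static IEnumerable<string> TwoStrings()
    {
        Log.Add("Enter function");
        yield return "First";
        Log.Add("Another");
        yield return "Second";
    }

    public static List<string> Run()
    {
        Log.Clear();
        using (IEnumerator<string> enumerator = TwoStrings().GetEnumerator())
        {
            Log.Add("Got enumerator");  // No generator code has run yet
            enumerator.MoveNext();      // Runs up to the first yield
            Log.Add("Current: " + enumerator.Current);
            enumerator.MoveNext();      // Resumes after the first yield
            Log.Add("Current: " + enumerator.Current);
            Log.Add("More items: " + enumerator.MoveNext()); // No yield left
        }
        return Log;
    }
}
```

Run() produces the entries "Got enumerator", "Enter function", "Current: First", "Another", "Current: Second" and "More items: False", in that order.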

How does that look in action?

Every yield / generator function becomes its own class representing the state machine. Very briefly, it looks like this:

public IEnumerable<string> GetStrings()
{
    Console.WriteLine("Enter function");
    yield return "First";
    Console.WriteLine("Another");
    yield return "Second";
}

Will become this:

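using System;
using System.Collections;
using System.Collections.Generic;

public class GetStringsStateMachine : IEnumerable<string>, IEnumerator<string>
{
    private int currentState;    // Represents the state which will be executed when we go into the function again
    private string currentValue; // Represents the returned value

    public GetStringsStateMachine()
    {
    }

    public string Current => currentValue; // This is how a consumer (like foreach) would ask for a value.
    object IEnumerator.Current => currentValue;

    public bool MoveNext()
    {
        switch (currentState)
        {
            default:
                return false; // False means we are done. There are no further values
            case 0:
                Console.WriteLine("Enter function");
                currentValue = "First";
                currentState = 1; // Go to the next step in the state machine
                return true; // True means Current holds a new value
            case 1:
                Console.WriteLine("Another");
                currentValue = "Second";
                currentState = 2; // No yield left, so the next call ends in the default case
                return true;
        }
    }

    // Simplified plumbing so that this sketch satisfies both interfaces
    public IEnumerator<string> GetEnumerator() => this;
    IEnumerator IEnumerable.GetEnumerator() => this;
    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}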

We see that we basically split the function at the yield boundaries and put the pieces into switch cases. A simple state machine. The state machine implements IEnumerator and IEnumerable itself to use the "magic" of MoveNext(). As long as there is another item, MoveNext() advances the state, saves the current value so it can be exposed to the outside world and returns true. Once nothing is left, it returns false.

Now let's have a look at a typical consumer:

foreach (var item in GetStrings())
{
    // ...
}

Will become this:

IEnumerator<string> enumerator = GetStrings().GetEnumerator();
while (enumerator.MoveNext())
{
    string current = enumerator.Current;
}

private static IEnumerable<string> GetStrings()
{
    return new GetStringsStateMachine();
}

Now let's unwrap that.

  1. We are creating the state machine. We call the GetStrings() method to initialize the state machine. But wait a second: how can the compiler generate a method with the same name? Well, it doesn't. We saw earlier that the compiler generates this whole state machine. What is left in your original method is exactly that: the initialization of your state machine.
  2. Once we have the enumerator of the state machine, we are practically set.
  3. While we have items (remember, MoveNext returns true as long as we still have items), we run our loop.
  4. To get the current item we use IEnumerator<string>.Current. Very straightforward.

Now, as you can imagine, that is not the whole truth. It is a bit more complicated. What if multiple threads are involved? What if we have exceptions? The real state machine is a bit more complex, but you will find all the principles I showed in it as well. If you want to see it, have a look at sharplab.io. If you don't know the site, I highly recommend using it and reading about it. It shows you what your compiler does with your code.
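
One detail the simplified version leaves out: foreach also disposes the enumerator when the loop ends, even on break or an exception, and that is what lets a finally block inside a generator run. A minimal sketch; GeneratorWithCleanup and the Log list are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DisposeDemo
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical generator that owns something worth cleaning up
    public static IEnumerable<string> GeneratorWithCleanup()
    {
        try
        {
            yield return "Hello";
            yield return "World";
        }
        finally
        {
            // Runs on normal completion or when the enumerator is disposed early
            Log.Add("Cleanup");
        }
    }

    public static List<string> Run()
    {
        Log.Clear();
        foreach (var elem in GeneratorWithCleanup())
        {
            Log.Add("From generator: " + elem);
            break; // Leave after the first item
        }
        Log.Add("After loop");
        return Log;
    }
}
```

Run() logs "From generator: Hello", then "Cleanup", then "After loop": the generator never reaches "World", but its finally block still runs as soon as the loop is left.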
