C# Lowering

26/01/2023
.NET, C#

Have you ever heard the terms "compiler magic" or "syntactic sugar"? Probably yes, so let's dissect what this "magic" really is!

We will see how "lowering" our code helps us predict performance characteristics or spot bugs. We will also see that things like foreach, var, lock, using, async, await, yield, anonymous types, record, stackalloc, pattern matching, Blazor components, deconstructors, extension methods... do not really exist.

What do these two have in common?

Pikachu

and

foreach (int number in myList)

At first glance, not much. But if we ask the C# compiler, which takes your code and turns it into IL code, neither of them exists. Now that we are off to a good start, let's see what lowering really is.

Lowering

You probably know what a compiler is. It takes your code and translates it into another language, more often than not one that is closer to the underlying hardware. Lowering is the process of translating high-level features into low-level features within the same language.

This process can involve converting certain language features or constructs, such as foreach loops, into equivalent constructs that are more easily executed by the runtime.

The reason constructs like foreach no longer exist after lowering is that the C# compiler rewrites the loop using a combination of simpler constructs, such as while loops, if statements, and other language features. This makes the loop easier for the runtime to execute, but the original high-level foreach construct is gone afterwards.

Benefits

Lowering comes with certain benefits:

  • Improved performance: Simple constructs are easier to optimize. For example, a for loop can be "unrolled" (loop unrolling is a technique where the loop body is repeated several times per iteration, so that fewer iterations are needed and the overhead of the loop control instructions is reduced).
  • Simplified language design: Lowering makes life easier not only for you, but also for the language team. Just imagine the record type. It is "nothing else" than a fancy class that also implements IEquatable<T>. By building high-level features on top of low-level ones, the team can spare you a lot of boilerplate code.
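To make the record claim from the list above a bit more tangible, here is a heavily simplified, hand-written sketch. PersonRecord and PersonClass are made-up names for illustration, and the class is only an approximation of mine: the real compiler output contains more members (for example ToString, Deconstruct, and the cloning support behind with expressions).

```csharp
#nullable enable
using System;

// What we write: one line.
public record PersonRecord(string Name, int Age);

// Roughly what we would have to write by hand to get the same value equality.
// Simplified on purpose; this is NOT the exact code the compiler generates.
public sealed class PersonClass : IEquatable<PersonClass>
{
    public string Name { get; init; }
    public int Age { get; init; }

    public PersonClass(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // Two instances are equal when all of their members are equal.
    public bool Equals(PersonClass? other) =>
        other is not null && Name == other.Name && Age == other.Age;

    public override bool Equals(object? obj) => Equals(obj as PersonClass);

    public override int GetHashCode() => HashCode.Combine(Name, Age);

    public static bool operator ==(PersonClass? left, PersonClass? right) =>
        left is null ? right is null : left.Equals(right);

    public static bool operator !=(PersonClass? left, PersonClass? right) =>
        !(left == right);
}
```

Both new PersonRecord("Ada", 36) == new PersonRecord("Ada", 36) and the same comparison with PersonClass evaluate to true, even though the compared objects are different instances.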

Lowering happens as part of your compiler pipeline. When you hit the build button or call dotnet build, syntax and semantic analysis kick in, and if they don't report any errors, your code gets transformed into CIL code. Lowering happens during that transformation stage as well. By the way, if you need a quick overview of the steps: "What is the difference between C#, .NET, IL and JIT?"

The lowering process is typically performed by the C# compiler itself. Third-party tools can show you the result, and one of them is https://sharplab.io/. There you can write source code and see its lowered form. That is the basis for the following sections, in which we dissect some high-level features into their low-level equivalents.

var

Let's start easy with the keyword var:

var foo = "Hello World";

This code will be translated to:

string foo = "Hello World";

As you can see, the whole type inference part is done when your code gets lowered. var is not a type of its own; it gets replaced by the actual type that it stands for.
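A small sketch to convince yourself that var is resolved at compile time. VarDemo and StaticTypeOf are hypothetical names I made up for this illustration. The generic helper reports the compile-time type the compiler picked for each variable.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // Hypothetical helper: T is the compile-time (static) type of the argument.
    public static Type StaticTypeOf<T>(T value) => typeof(T);

    public static bool VarMatchesExplicitTypes()
    {
        var foo = "Hello World";                    // same as: string foo
        var count = 42;                             // same as: int count
        var lookup = new Dictionary<string, int>(); // same as: Dictionary<string, int> lookup

        // foo = 42; // would not compile: foo is a string, not "anything"

        return StaticTypeOf(foo) == typeof(string)
            && StaticTypeOf(count) == typeof(int)
            && StaticTypeOf(lookup) == typeof(Dictionary<string, int>);
    }
}
```

VarDemo.VarMatchesExplicitTypes() returns true, because each var declaration is just a shorter way of writing the explicit type.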

foreach Array

var range = new[] { 1, 2 };

foreach(var item in range)
    Console.Write(item);

This is a small array that gets created via an array initializer and that we enumerate afterward. What does this code look like when it gets lowered?

int[] array = new int[2];
array[0] = 1;
array[1] = 2;
int[] array2 = array;
int num = 0;
while (num < array2.Length)
{
    int value = array2[num];
    Console.Write(value);
    num++;
}

Huh? It's gone!? Just like our Pikachu, the foreach is no more. It got replaced by a while loop that uses the indexer and an integer variable that gets incremented until we reach the end of our array. Another observation is that our initializer is gone as well. The array size gets inferred from our usage and the elements are set one by one. You might have noticed an odd int[] array2 = array; assignment here. What is that for? I will directly quote @Naine from the comments down below - thanks for the input:

The language rules dictate that the expression in the foreach statement is evaluated exactly once. Imagine if you had foreach (var x in GetArray()), the method must be called exactly once with the returned array reference assigned to a temporary variable. Even if the expression is a simple local variable reference (being a local, it is not accessible by other threads), it is possible to reassign the variable within the loop, but the loop must continue over the array it referred to when entering the loop, hence the compiler must copy the array reference to a temporary. Of course the compiler could optimize out the temporary in the specific example shown, but this would complicate the implementation so it leaves the optimization to the JIT.
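Both points from the quote can be observed with a few lines of code. This is just a minimal sketch; ForeachDemo, GetArray, and the call counter are names made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // Counts how often GetArray was invoked.
    public static int GetArrayCalls;

    public static int[] GetArray()
    {
        GetArrayCalls++;
        return new[] { 1, 2, 3 };
    }

    // The expression in the foreach statement is evaluated exactly once,
    // no matter how many elements we iterate over.
    public static List<int> IterateOverMethodResult()
    {
        var seen = new List<int>();
        foreach (var x in GetArray())
            seen.Add(x);
        return seen;
    }

    // Reassigning the local inside the loop does not affect the iteration,
    // because the loop works on the temporary copy of the array reference.
    public static List<int> ReassignInsideLoop()
    {
        var seen = new List<int>();
        int[] numbers = { 1, 2, 3 };
        foreach (var x in numbers)
        {
            numbers = new[] { 99 };
            seen.Add(x);
        }
        return seen;
    }
}
```

After calling IterateOverMethodResult() once, GetArrayCalls is 1, and ReassignInsideLoop() still returns 1, 2, 3 instead of stopping early or picking up the 99.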

Let's check what a foreach over a List<T> looks like.

foreach List<T>

var list = new List<int> { 1, 2 };

foreach(var item in list)
    Console.Write(item);

becomes:

List<int> list = new List<int>();
list.Add(1);
list.Add(2);
List<int>.Enumerator enumerator = list.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.Write(enumerator.Current);
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}

Again, our collection initializer is gone. Instead, the elements are added one by one. Here, too, a while loop goes through our elements, but instead of the indexer it uses MoveNext and Current. Every collection-like type in .NET implements IEnumerable. IEnumerable is an interface that defines a single method called GetEnumerator, which returns an object that implements the IEnumerator interface. The IEnumerator interface defines the members for iterating over a collection, such as MoveNext and Current.

MoveNext returns true if there is a next object in our enumeration and sets Current to it. With this behavior, we can abstract away how we enumerate through something. You can use foreach with a normal List, but also with a Queue, Stack, or LinkedList, which all behave differently, and the concept of IEnumerable abstracts away that implementation detail.

We can also see a try-finally block. That is because IEnumerator<T> implements IDisposable so that the enumerator can be cleaned up once it's done. Even though we write the foreach exactly the same way for arrays and lists, the underlying code is different. With that information we can also say that an array is faster to enumerate than a List<T>, mainly because with an array we don't have any (virtual) function calls to MoveNext and Current and no try-finally block. That doesn't mean you should refactor your code to use arrays everywhere, no, no, no. Please don't! The performance gain is minimal at best. If you enumerate over millions of entries, measure first and then change accordingly.
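Here is a minimal sketch of that abstraction, written by hand instead of with foreach. EnumerationDemo and JoinManually are made-up names. The method only knows about IEnumerable<int>, so the same code works for every collection type mentioned above, and the try-finally makes sure the enumerator gets disposed.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationDemo
{
    // Roughly what foreach does for any IEnumerable<int>, written by hand.
    public static string JoinManually(IEnumerable<int> source)
    {
        var parts = new List<string>();
        IEnumerator<int> enumerator = source.GetEnumerator();
        try
        {
            // MoveNext advances to the next element and tells us whether there is one.
            while (enumerator.MoveNext())
                parts.Add(enumerator.Current.ToString());
        }
        finally
        {
            enumerator.Dispose();
        }
        return string.Join(",", parts);
    }
}
```

Called with new Queue<int>(new[] { 1, 2, 3 }) or new LinkedList<int>(new[] { 1, 2, 3 }), it returns "1,2,3". Called with new Stack<int>(new[] { 1, 2, 3 }), it returns "3,2,1", because a stack enumerates in last-in-first-out order. Same loop, different behavior, all hidden behind the enumerator.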

using

The using statement is another feature that gets lowered by the compiler. Let's have a look at this horrific example:

Task<string> GetContentFromUrlAsync(string url)
{
    // Don't do this! Creating new HttpClients
    // is expensive and has other caveats
    // This is for the sake of demonstration
    using var client = new HttpClient();
    return client.GetStringAsync(url);
}

We create an IDisposable and call an asynchronous method without awaiting it. What could go wrong here? I said earlier that we can detect bugs, so let's have a look at the lowered code:

HttpClient httpClient = new HttpClient();
try
{
    return httpClient.GetStringAsync(url);
}
finally
{
    if (httpClient != null)
    {
        ((IDisposable)httpClient).Dispose();
    }
}

using is a fancy way of saying: "Hey, wrap my IDisposable in a try-finally block to ensure that Dispose gets called." The problem here is how asynchronous methods work. Inside GetStringAsync, as soon as the first await on an unfinished operation is hit, the method returns to its caller, which is our code, and we return the Task object immediately. After the return, our finally block runs and the IDisposable gets disposed of. But wait, we are not done with the download yet!? That is exactly where the problem starts. When execution later resumes after the await inside GetStringAsync, it tries to access an object that is already disposed. In this case we get an exception. More generally, this pattern is dangerous and often fails or ends in undefined behavior. The easy fix is to mark the method as async and await the GetStringAsync call! There are very, very few reasons not to use await.
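The following sketch reproduces the problem and the fix without touching the network. FakeClient is a hypothetical stand-in for HttpClient that I made up for this demonstration. It simulates the download with Task.Delay and throws an ObjectDisposedException if it was disposed in the meantime. The real HttpClient may fail in a different way.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for HttpClient so the sketch runs offline.
public sealed class FakeClient : IDisposable
{
    private volatile bool _disposed;

    public async Task<string> GetStringAsync()
    {
        // Simulated download: the method returns to its caller at this await.
        await Task.Delay(50);
        if (_disposed)
            throw new ObjectDisposedException(nameof(FakeClient));
        return "content";
    }

    public void Dispose() => _disposed = true;
}

public static class UsingDemo
{
    // Broken: the Task is returned right away, then the finally block
    // disposes the client while the "download" is still running.
    public static Task<string> WithoutAwait()
    {
        using var client = new FakeClient();
        return client.GetStringAsync();
    }

    // Fixed: await keeps us inside the try block until the download is done,
    // so Dispose only runs afterwards.
    public static async Task<string> WithAwait()
    {
        using var client = new FakeClient();
        return await client.GetStringAsync();
    }
}
```

Awaiting UsingDemo.WithAwait() yields "content", while awaiting UsingDemo.WithoutAwait() ends in an ObjectDisposedException.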

Conclusion

Lowering is an essential part of the language, and even if you don't "feel" it, it impacts you every time you write some code! It is a nice trick to offer high-level features that fit easily into the ecosystem!
