Some time back I wrote about "async2 - The .NET Runtime Async experiment concludes", which was basically about moving the async state machine into the runtime. Now, with .NET 11, we are seeing the fruits of that labor.
What was it about?
As you may or may not know, the async and await keywords in C# are implemented using a state machine that the compiler generates. So the JIT (Just-In-Time compiler) has no idea about async and await itself. Making it a runtime feature means we have more dedicated types and "things" in the runtime, so it can deal with async code more efficiently. For example: if you have local variables in an async method in .NET 10, the compiler hoists them into fields of the generated state machine type. In .NET 11, the runtime can keep them on the stack and only spill them to the heap when that is actually needed, that is, when the method really suspends and the variables are still used after the await boundary. So we can expect better performance and fewer allocations in many cases.
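To make the hoisting a bit more concrete, here is a minimal sketch. The class and method names are made up for illustration, and the comments describe what the classic compiler-generated state machine typically does in a Release build; they are not taken from the actual generated code.

```csharp
using System.Threading.Tasks;

// Hypothetical example type, not part of the benchmark in this post.
public static class HoistingDemo
{
    public static async Task<int> SumOfSquaresAsync()
    {
        // 'total' and 'i' are read again after the await below, so the
        // classic state machine has to store them in fields to survive
        // a suspension.
        int total = 0;
        for (int i = 0; i < 3; i++)
        {
            // 'squared' is not used after the await, so it does not
            // need to survive the suspension.
            int squared = i * i;
            total += squared;

            // Suspension point: the method may return to its caller here
            // and resume later.
            await Task.Yield();
        }

        return total;
    }
}
```

Calling HoistingDemo.SumOfSquaresAsync() and awaiting the result sums 0, 1 and 4. The interesting part is not the result but which locals have to live across the await.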
Let's see some benchmarks to get a better idea of the improvements.
DISCLAIMER: As always with benchmarks, take them with a grain of salt. They are synthetic and may not represent real-world scenarios, yadda yadda. You know the drill! As described later, the runtime itself is still not fully "runtime-async" enabled, so those numbers will change!
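using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class AsyncBenchmark
{
    [Benchmark]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public async Task<string> GetDataFrom()
    {
        // Pre-sized list, so growing the list is not part of the measurement.
        List<string> data = new List<string>(10);
        for (int i = 0; i < 10; i++)
        {
            data.Add(await GetResult());
        }

        return data[9];
    }

    // Awaits an already completed task, so this never really suspends.
    private static async Task<string> GetResult()
    {
        return await Task.FromResult("Data from async method");
    }
}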
Here with the new version:
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|------------ |---------:|--------:|--------:|-------:|----------:|
| GetDataFrom | 128.0 ns | 1.19 ns | 1.06 ns | 0.1109 | 928 B |
In comparison to the old version:
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|------------ |---------:|--------:|--------:|-------:|----------:|
| GetDataFrom | 192.4 ns | 2.30 ns | 1.92 ns | 0.1969 | 1.61 KB |
Both are running .NET 11 preview 1. To "enable" the new way, you need the .NET 11 SDK (daahhhh) and have to add the following to a PropertyGroup in your csproj:
<EnablePreviewFeatures>true</EnablePreviewFeatures>
<Features>$(Features);runtime-async=on</Features>
In real-world scenarios this is almost negligible, as all the libraries themselves are currently compiled with runtime-async=off and therefore have to be recompiled to be able to use the new feature. The benchmark above is synthetic in the sense that the task returned by Task.FromResult is already completed, so the only thing being optimized is the GetResult method itself. That is all. But it gives an indication.
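Why does an already completed task make the benchmark synthetic? Awaiting a completed task never suspends the method, so the whole call runs synchronously on the calling thread. A small sketch of that (the type and method names are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical example type, not part of the benchmark in this post.
public static class CompletedTaskDemo
{
    // Task.FromResult hands back a task that is already completed.
    public static bool IsAlreadyCompleted()
    {
        Task<string> task = Task.FromResult("Data from async method");
        return task.IsCompleted;
    }

    // Because the awaited task is already completed, the async method
    // below finishes before it returns to us, on the calling thread.
    public static bool RanSynchronously()
    {
        int callerThreadId = Environment.CurrentManagedThreadId;
        Task<int> task = GetThreadIdAfterAwait();
        return task.IsCompleted && task.Result == callerThreadId;
    }

    private static async Task<int> GetThreadIdAfterAwait()
    {
        await Task.FromResult(0); // no suspension happens here
        return Environment.CurrentManagedThreadId;
    }
}
```

Both methods should return true. A benchmark with a real suspension (for example I/O) would exercise very different code paths.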
Lowering
To understand what is going on, let's have a look at the lowered code. Here the new version:
[NullableContext(1)]
[Nullable(0)]
[MemoryDiagnoser(true)]
public class AsyncBenchmark
{
    [Benchmark(12, "/Users/stevengiesel/repos/11Perf/11Perf/Program.cs")]
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.Async)]
    public Task<string> GetDataFrom()
    {
        List<string> data = new List<string>(10);
        for (int i = 0; i < 10; ++i)
            data.Add(AsyncHelpers.Await<string>(AsyncBenchmark.GetResult()));
        return (Task<string>) data[9];
    }

    [MethodImpl(MethodImplOptions.Async)]
    private static Task<string> GetResult()
    {
        return (Task<string>) AsyncHelpers.Await<string>(Task.FromResult<string>("Data from async method"));
    }

    public AsyncBenchmark()
    {
        base..ctor();
    }
}
That looks "almost" like our original code! Let's compare that to the old version:
[NullableContext(1)]
[Nullable(0)]
[MemoryDiagnoser(true)]
public class AsyncBenchmark
{
    [AsyncStateMachine(typeof (AsyncBenchmark.<GetDataFrom>d__0))]
    [Benchmark(12, "/Users/stevengiesel/repos/11Perf/11Perf/Program.cs")]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public Task<string> GetDataFrom()
    {
        AsyncBenchmark.<GetDataFrom>d__0 stateMachine;
        stateMachine.<>t__builder = AsyncTaskMethodBuilder<string>.Create();
        stateMachine.<>1__state = -1;
        stateMachine.<>t__builder.Start<AsyncBenchmark.<GetDataFrom>d__0>(ref stateMachine);
        return stateMachine.<>t__builder.Task;
    }

    [AsyncStateMachine(typeof (AsyncBenchmark.<GetResult>d__1))]
    private static Task<string> GetResult()
    {
        AsyncBenchmark.<GetResult>d__1 stateMachine;
        stateMachine.<>t__builder = AsyncTaskMethodBuilder<string>.Create();
        stateMachine.<>1__state = -1;
        stateMachine.<>t__builder.Start<AsyncBenchmark.<GetResult>d__1>(ref stateMachine);
        return stateMachine.<>t__builder.Task;
    }

    public AsyncBenchmark()
    {
        base..ctor();
    }

    [CompilerGenerated]
    [StructLayout(LayoutKind.Auto)]
    private struct <GetDataFrom>d__0 :
        /*[Nullable(0)]*/
        IAsyncStateMachine
    {
        // More than 100 lines of state machine code follow here (omitted)
    }
}
Just the amount of code alone is a good indicator of the complexity of the old version. The new version is much more straightforward and easier to read.
The magic here is AsyncHelpers.Await! AsyncHelpers is the orchestrator. Think of it as a bridge between the task-like object and the runtime execution.
The core principles are still the same. We still have a "suspend" point at the await and we still have to deal with the state machine in some way.
Exceptions
If you have an exception in an async method, like:
await ExceptionThrowingMethod();

async Task<string> ExceptionThrowingMethod()
{
    await Task.Delay(1);
    throw new InvalidOperationException("This is an exception.");
}
You can see the different call stacks in the old and new version. Here the new version:
Unhandled exception. System.InvalidOperationException: This is an exception.
at Program.ExceptionThrowingMethod() in /Users/stevengiesel/repos/11Perf/11Perf/Program.cs:line 13
at System.Runtime.CompilerServices.AsyncHelpers.RuntimeAsyncTask`1.DispatchContinuations()
--- End of stack trace from previous location ---
at Program.<Main>()
The old version:
Unhandled exception. System.InvalidOperationException: This is an exception.
at Program.ExceptionThrowingMethod() in /Users/stevengiesel/repos/11Perf/11Perf/Program.cs:line 13
at Program.Main() in /Users/stevengiesel/repos/11Perf/11Perf/Program.cs:line 7
at Program.<Main>()
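In the new version we have a new DispatchContinuations method (on AsyncHelpers.RuntimeAsyncTask`1) in the call stack, and the Program.Main() frame is replaced by the "End of stack trace from previous location" marker.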
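If you want to compare the call stacks yourself, a small harness like the following is enough. It is only a sketch: the type and method names are made up, and the exact frames you get depend on the SDK version and on whether runtime-async is switched on for the project.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical harness for capturing the stack trace as a string.
public static class ExceptionTraceDemo
{
    public static async Task<string> ThrowingMethod()
    {
        // Force a real suspension so the exception is thrown after resuming.
        await Task.Delay(1);
        throw new InvalidOperationException("This is an exception.");
    }

    public static string CaptureTrace()
    {
        try
        {
            // Block on the task; GetResult rethrows the original exception.
            ThrowingMethod().GetAwaiter().GetResult();
            return "no exception";
        }
        catch (InvalidOperationException ex)
        {
            // ToString() contains the type, the message and the stack trace.
            return ex.ToString();
        }
    }
}
```

Printing the result of ExceptionTraceDemo.CaptureTrace() once with runtime-async=on and once without should show which frames differ in your setup.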
SynchronizationContext
Oh boy - our friend the SynchronizationContext! That thing we carry around via ConfigureAwait(true), which is what happens implicitly unless you opt out with ConfigureAwait(false). The SynchronizationContext represents the current context (like the UI thread in a WPF application): it is captured at the await, and the continuation is posted back to it, so that after the await we continue on the same context. Kind of a snapshot of where we are in the code, making sure we come back there after the async operation is done.
We might see a change in the behavior of SynchronizationContext with the new runtime async. In the old version, the state machine would capture the SynchronizationContext and use it to post the continuation back to the original context. In the new version, the runtime might drop the SynchronizationContext. I can't say for sure now as the core libraries are still using the old approach so any attempt will result in the old behavior.
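To see the capturing that the compiler-generated state machine does today, here is a minimal sketch with a made-up SynchronizationContext that just counts how often a continuation is posted to it. All type and member names are hypothetical, and it only shows the current ("old") behavior; whether runtime async behaves the same way is exactly the open question.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that counts posted continuations.
public sealed class CountingSynchronizationContext : SynchronizationContext
{
    private int _postCount;

    public int PostCount => Volatile.Read(ref _postCount);

    public override void Post(SendOrPostCallback d, object? state)
    {
        Interlocked.Increment(ref _postCount);
        // A UI framework would marshal to its own thread here.
        // For the sketch we simply run the callback on the thread pool.
        ThreadPool.QueueUserWorkItem(_ => d(state));
    }
}

public static class SynchronizationContextDemo
{
    // Returns how many continuations were posted to the context
    // with the default await and with ConfigureAwait(false).
    public static (int Captured, int NotCaptured) Run()
    {
        var context = new CountingSynchronizationContext();
        SynchronizationContext? previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            // Blocking is fine here because Post never needs this thread.
            DefaultAwait().Wait();
            int captured = context.PostCount;

            ConfigureAwaitFalse().Wait();
            int notCaptured = context.PostCount - captured;

            return (captured, notCaptured);
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    // Default await: the current context is captured and used to resume.
    private static async Task DefaultAwait()
    {
        await Task.Delay(50);
    }

    // ConfigureAwait(false): the context is ignored when resuming.
    private static async Task ConfigureAwaitFalse()
    {
        await Task.Delay(50).ConfigureAwait(false);
    }
}
```

With today's state machine I would expect one post for the default await and none for ConfigureAwait(false). Once the core libraries are runtime-async enabled, a harness like this is one way to check whether that still holds.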
So there will be a part 2 once some of the core libraries are enabled and we can see potential differences in AsyncLocal, ExecutionContext and friends. Something library maintainers may have to be aware of!
