.NET 9 LINQ Performance Edition

As with almost every edition of .NET, the team has been working on improving performance. In this blog post, we will see some improvements to the related tickets and benchmarks.

Disclaimer: I am running those benchmarks on a MacBook M2 Pro (arm64) with MacOS 14.4. That said, run those tests always against your real target environment. As seen in this issue: https://github.com/dotnet/runtime/issues/101437, the performance can vary a lot between different environments and platforms.

1. OrderBy.ToList enhancements

Let's have a look at the Benchmark:

private static readonly Random Random = new(42);
private static readonly List<int> _list = Enumerable.Range(0, 10_000).Select(_ => Random.Next(1000)).ToList();

[Benchmark]
public List<int> OrderBy() => _list.OrderBy(s => s).ToList();

We create a random list (but stable and deterministic overmultiple runs) and call OrderBy followed by ToList. The benchmark is run with 10,000 elements. Results:

| Method     | Job      | Runtime  | Mean      | Error     | StdDev    | Ratio | RatioSD | Gen0    | Gen1   | Allocated | Alloc Ratio |
|----------- |--------- |--------- |----------:|----------:|----------:|------:|--------:|--------:|-------:|----------:|------------:|
| OrderBy    | .NET 8.0 | .NET 8.0 | 779.89 us |  7.987 us |  7.471 us |  1.00 |    0.00 | 18.5547 | 0.9766 | 156.55 KB |        1.00 |
| OrderBy    | .NET 9.0 | .NET 9.0 | 747.02 us | 11.413 us | 10.675 us |  0.96 |    0.02 | 18.5547 |      - | 156.58 KB |        1.00 |

Why is this faster with net9? Let's have a look at the diff:

-for (int i = 0; i < map.Length; i++)
-{
-    map[i] = i;
-}
+int[] map = new int[count];
+FillIncrementing(map, 0);

Source: https://github.com/dotnet/runtime/pull/99538

This method is called when you call ToList on an OrderBy result. The important part here is that FillIncrementing utilizes Vectors if the hardware is capable off. Vector is used for SIMD - Single Instruction, Multiple Data. This means that the CPU can execute the same instruction on multiple data points at the same time. This is especially useful for operations like FillIncrementing where we just fill an array with incrementing values. So instead of writing one value at a time, we can write multiple values at once saving a lot of time.

If you are interested in SIMD and how it works, I recommend my blog post: "LINQ on Steroids"

2. Chunk improvements for Array<T>

The next on is a specialization for Array<T> in the Chunk method. Let's have a look at the benchmark:

private static readonly int[] _array = Enumerable.Range(0, 10_000).Select(_ => Random.Next(1000)).ToArray();

[Benchmark]
public int Chunk() => _array.Chunk(10).Count();

Results:

| Method | Job      | Runtime  | Mean      | Error     | StdDev    | Ratio | Gen0   | Allocated | Alloc Ratio |
|------- |--------- |--------- |----------:|----------:|----------:|------:|-------:|----------:|------------:|
| Chunk  | .NET 8.0 | .NET 8.0 | 26.685 us | 0.1933 us | 0.1713 us |  1.00 | 7.6599 |   62.7 KB |        1.00 |
| Chunk  | .NET 9.0 | .NET 9.0 |  7.319 us | 0.0595 us | 0.0556 us |  0.27 | 7.6523 |  62.56 KB |        1.00 |

That is impressive! The Chunk method is now 3.6x faster than in .NET 8.0. By the way - that does not translate to List<T>. The Chunk method is specialized for Array<T> only (as of preivew 3 of .net9).

The performance improvement stems from ReadOnlySpan. Have a look here: https://source.dot.net/#System.Linq/System/Linq/Chunk.cs,61 vs https://source.dot.net/#System.Linq/System/Linq/Chunk.cs,72

The latter one is used for all the other types (if not empty). Since the introduction of Span and ReadOnlySpan the team is using it more and more - for a good reason: It can improve the performance a lot. But be warned as there are also many caveats. Here a small overview if you need a refresher: Create a low allocation and faster StringBuilder - Span in Action.

The PR in question can be found here: https://github.com/dotnet/runtime/pull/99538

3. OfType and Cast improvements

The last benchmark is about OfType and Cast. Let's have a look at the benchmark:

private static readonly Random Random = new(42);
private readonly List<object> _list = [];

[GlobalSetup]
public void Setup()
{
    for (var i = 0; i < 10_000; i++)
    {
        var n = Random.Next(1000) % 4;
        object objectToAdd = n switch
        {
            0 => 2,
            1 => "Hello World",
            2 => 3f,
            3 => 100d
        };
        _list.Add(objectToAdd);
    }
}

[Benchmark]
public List<int> GetAllInts() => _list.OfType<int>().ToList();

Basically we are creating a random set of different types and then filter for int values. The benchmark is run with 10,000 elements.

| Method     | Job      | Runtime  | Mean     | Error    | StdDev   | Ratio | Gen0   | Allocated | Alloc Ratio |
|----------- |--------- |--------- |---------:|---------:|---------:|------:|-------:|----------:|------------:|
| GetAllInts | .NET 8.0 | .NET 8.0 | 63.92 us | 0.491 us | 0.459 us |  1.00 | 3.9063 |  32.37 KB |        1.00 |
| GetAllInts | .NET 9.0 | .NET 9.0 | 49.28 us | 0.408 us | 0.362 us |  0.77 | 3.9063 |  32.36 KB |        1.00 |

A speedy iprovement of 1.3x. The gains are coming from the point that the dotnet team special treat certain types. Why is this important or better: What can you learn from this? Well - if you have a critical path in your code that needs to be faster, you can specialize your code for certain types or scenarios. And that is what happened here. The problem with such optimizations is that they involve (way) more code and therefore can become less readable and maintainable. So it is always a trade-off between performance and readability. Therefore don't blindly optimize your code. Always measure first and then optimize.

The PR in question: https://github.com/dotnet/runtime/pull/97956

4. Any improvements

The Any method is used to check if a sequence contains any elements. This one becomes an improvement:

private static readonly Random Random = new(42);
private static readonly List<int> _list = Enumerable.Range(0, 10_000).Select(s => Random.Next(100)).ToList();
private static readonly IEnumerable<int> _distinct = _list.Distinct();
private static readonly IEnumerable<int> _where = _list.Where(s => s % 2 == 0);

[Benchmark]
public List<int> DistinctWithoutAny() => _distinct.ToList();

[Benchmark]
public List<int> WhereWithoutAny() => _where.ToList();

[Benchmark]
public bool HasAnyDistinct() => _distinct.Any();

[Benchmark]
public bool HasAnyWhere() => _where.Any();

I added the DistinctWithoutAny and WhereWithoutAny to show that Distinct and Where are already heavily optimized in .NET 9.

| Method             | Job      | Runtime  | Mean          | Error       | StdDev        | Ratio | RatioSD | Gen0     | Gen1     | Gen2    | Allocated | Alloc Ratio |
|------------------- |--------- |--------- |--------------:|------------:|--------------:|------:|--------:|---------:|---------:|--------:|----------:|------------:|
| DistinctWithoutAny | .NET 8.0 | .NET 8.0 | 52,996.767 ns | 490.1662 ns |   434.5196 ns |  1.00 |    0.00 | 193.3594 | 193.3594 | 32.2266 |  164054 B |        1.00 |
| DistinctWithoutAny | .NET 9.0 | .NET 9.0 | 44,453.924 ns | 588.1875 ns |   521.4129 ns |  0.84 |    0.01 |  38.4521 |  38.4521 | 38.4521 |  164058 B |        1.00 |
|                    |          |          |               |             |               |       |         |          |          |         |           |             |
| WhereWithoutAny    | .NET 8.0 | .NET 8.0 | 38,072.681 ns | 736.6029 ns |   957.7919 ns |  1.00 |    0.00 |   7.8125 |   0.1831 |  0.0610 |   65840 B |        1.00 |
| WhereWithoutAny    | .NET 9.0 | .NET 9.0 | 15,785.917 ns | 381.5808 ns | 1,119.1103 ns |  0.43 |    0.03 |   7.8430 |   0.2289 |       - |   65840 B |        1.00 |
|                    |          |          |               |             |               |       |         |          |          |         |           |             |
| HasAnyDistinct     | .NET 8.0 | .NET 8.0 |     43.804 ns |   0.1348 ns |     0.1261 ns |  1.00 |    0.00 |   0.0401 |        - |       - |     336 B |        1.00 |
| HasAnyDistinct     | .NET 9.0 | .NET 9.0 |      5.190 ns |   0.0045 ns |     0.0037 ns |  0.12 |    0.00 |        - |        - |       - |         - |        0.00 |

Where itself is already twice as fast in net9.0. The Any method is now 8.5x faster than in .NET 8.0. To understand what is going on, let's have a look at TryGetNonEnumeratedCount. If you need a refresher: "How does TryGetNonEnumeratedCount work?". Basically Any tries to call TryGetNonEnumeratedCount and if it succeeds, it returns whether or not the count is 0. This is a very fast operation. If it can't get the count cheap, it has to start enumerating. In case of Any that means start the enumeration and try to get the first element - if there is one, it returns true, otherwise false.

In .NET 9 many of the internal types derive from an Iterator<T> class which offers a TryGetFirst. So if it can't get the enumeration count cheap, it tries to get the first element cheap. Like TryGetNonEnumeratedCount TryGetFirst is a specialized call on top of iterator types (like Where and Distinct).

You want to know more? Check out the PR: https://github.com/dotnet/runtime/pull/99218

11
An error has occurred. This application may no longer respond until reloaded. Reload x