As with almost every edition of .NET, the team has been working on improving performance. In this blog post, we will see some improvements to the related tickets and benchmarks.
Disclaimer: I am running those benchmarks on a MacBook M2 Pro (arm64) with MacOS 14.4. That said, run those tests always against your real target environment. As seen in this issue: https://github.com/dotnet/runtime/issues/101437, the performance can vary a lot between different environments and platforms.
1. OrderBy.ToList
enhancements
Let's have a look at the Benchmark:
private static readonly Random Random = new(42);
private static readonly List<int> _list = Enumerable.Range(0, 10_000).Select(_ => Random.Next(1000)).ToList();
[Benchmark]
public List<int> OrderBy() => _list.OrderBy(s => s).ToList();
We create a random list (but stable and deterministic overmultiple runs) and call OrderBy
followed by ToList
. The benchmark is run with 10,000 elements.
Results:
| Method | Job | Runtime | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|----------- |--------- |--------- |----------:|----------:|----------:|------:|--------:|--------:|-------:|----------:|------------:|
| OrderBy | .NET 8.0 | .NET 8.0 | 779.89 us | 7.987 us | 7.471 us | 1.00 | 0.00 | 18.5547 | 0.9766 | 156.55 KB | 1.00 |
| OrderBy | .NET 9.0 | .NET 9.0 | 747.02 us | 11.413 us | 10.675 us | 0.96 | 0.02 | 18.5547 | - | 156.58 KB | 1.00 |
Why is this faster with net9? Let's have a look at the diff:
-for (int i = 0; i < map.Length; i++)
-{
- map[i] = i;
-}
+int[] map = new int[count];
+FillIncrementing(map, 0);
Source: https://github.com/dotnet/runtime/pull/99538
This method is called when you call ToList
on an OrderBy
result. The important part here is that FillIncrementing
utilizes Vector
s if the hardware is capable off.
Vector
is used for SIMD - Single Instruction, Multiple Data. This means that the CPU can execute the same instruction on multiple data points at the same time. This is especially useful for operations like FillIncrementing
where we just fill an array with incrementing values. So instead of writing one value at a time, we can write multiple values at once saving a lot of time.
If you are interested in SIMD and how it works, I recommend my blog post: "LINQ on Steroids"
2. Chunk
improvements for Array<T>
The next on is a specialization for Array<T>
in the Chunk
method. Let's have a look at the benchmark:
private static readonly int[] _array = Enumerable.Range(0, 10_000).Select(_ => Random.Next(1000)).ToArray();
[Benchmark]
public int Chunk() => _array.Chunk(10).Count();
Results:
| Method | Job | Runtime | Mean | Error | StdDev | Ratio | Gen0 | Allocated | Alloc Ratio |
|------- |--------- |--------- |----------:|----------:|----------:|------:|-------:|----------:|------------:|
| Chunk | .NET 8.0 | .NET 8.0 | 26.685 us | 0.1933 us | 0.1713 us | 1.00 | 7.6599 | 62.7 KB | 1.00 |
| Chunk | .NET 9.0 | .NET 9.0 | 7.319 us | 0.0595 us | 0.0556 us | 0.27 | 7.6523 | 62.56 KB | 1.00 |
That is impressive! The Chunk
method is now 3.6x faster than in .NET 8.0. By the way - that does not translate to List<T>
. The Chunk
method is specialized for Array<T>
only (as of preivew 3 of .net9).
The performance improvement stems from ReadOnlySpan
. Have a look here: https://source.dot.net/#System.Linq/System/Linq/Chunk.cs,61 vs https://source.dot.net/#System.Linq/System/Linq/Chunk.cs,72
The latter one is used for all the other types (if not empty). Since the introduction of Span
and ReadOnlySpan
the team is using it more and more - for a good reason: It can improve the performance a lot. But be warned as there are also many caveats. Here a small overview if you need a refresher: Create a low allocation and faster StringBuilder - Span in Action.
The PR in question can be found here: https://github.com/dotnet/runtime/pull/99538
3. OfType
and Cast
improvements
The last benchmark is about OfType
and Cast
. Let's have a look at the benchmark:
private static readonly Random Random = new(42);
private readonly List<object> _list = [];
[GlobalSetup]
public void Setup()
{
for (var i = 0; i < 10_000; i++)
{
var n = Random.Next(1000) % 4;
object objectToAdd = n switch
{
0 => 2,
1 => "Hello World",
2 => 3f,
3 => 100d
};
_list.Add(objectToAdd);
}
}
[Benchmark]
public List<int> GetAllInts() => _list.OfType<int>().ToList();
Basically we are creating a random set of different types and then filter for int
values. The benchmark is run with 10,000 elements.
| Method | Job | Runtime | Mean | Error | StdDev | Ratio | Gen0 | Allocated | Alloc Ratio |
|----------- |--------- |--------- |---------:|---------:|---------:|------:|-------:|----------:|------------:|
| GetAllInts | .NET 8.0 | .NET 8.0 | 63.92 us | 0.491 us | 0.459 us | 1.00 | 3.9063 | 32.37 KB | 1.00 |
| GetAllInts | .NET 9.0 | .NET 9.0 | 49.28 us | 0.408 us | 0.362 us | 0.77 | 3.9063 | 32.36 KB | 1.00 |
A speedy iprovement of 1.3x. The gains are coming from the point that the dotnet team special treat certain types. Why is this important or better: What can you learn from this? Well - if you have a critical path in your code that needs to be faster, you can specialize your code for certain types or scenarios. And that is what happened here. The problem with such optimizations is that they involve (way) more code and therefore can become less readable and maintainable. So it is always a trade-off between performance and readability. Therefore don't blindly optimize your code. Always measure first and then optimize.
The PR in question: https://github.com/dotnet/runtime/pull/97956
4. Any
improvements
The Any
method is used to check if a sequence contains any elements. This one gets an improvement:
private static readonly Random Random = new(42);
private static readonly List<int> _list = Enumerable.Range(0, 10_000).Select(s => Random.Next(100)).ToList();
private static readonly IEnumerable<int> _distinct = _list.Distinct();
private static readonly IEnumerable<int> _where = _list.Where(s => s % 2 == 0);
[Benchmark]
public List<int> DistinctWithoutAny() => _distinct.ToList();
[Benchmark]
public List<int> WhereWithoutAny() => _where.ToList();
[Benchmark]
public bool HasAnyDistinct() => _distinct.Any();
[Benchmark]
public bool HasAnyWhere() => _where.Any();
I added the DistinctWithoutAny
and WhereWithoutAny
to show that Distinct
and Where
are already heavily optimized in .NET 9.
| Method | Job | Runtime | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|------------------- |--------- |--------- |--------------:|------------:|--------------:|------:|--------:|---------:|---------:|--------:|----------:|------------:|
| DistinctWithoutAny | .NET 8.0 | .NET 8.0 | 52,996.767 ns | 490.1662 ns | 434.5196 ns | 1.00 | 0.00 | 193.3594 | 193.3594 | 32.2266 | 164054 B | 1.00 |
| DistinctWithoutAny | .NET 9.0 | .NET 9.0 | 44,453.924 ns | 588.1875 ns | 521.4129 ns | 0.84 | 0.01 | 38.4521 | 38.4521 | 38.4521 | 164058 B | 1.00 |
| | | | | | | | | | | | | |
| WhereWithoutAny | .NET 8.0 | .NET 8.0 | 38,072.681 ns | 736.6029 ns | 957.7919 ns | 1.00 | 0.00 | 7.8125 | 0.1831 | 0.0610 | 65840 B | 1.00 |
| WhereWithoutAny | .NET 9.0 | .NET 9.0 | 15,785.917 ns | 381.5808 ns | 1,119.1103 ns | 0.43 | 0.03 | 7.8430 | 0.2289 | - | 65840 B | 1.00 |
| | | | | | | | | | | | | |
| HasAnyDistinct | .NET 8.0 | .NET 8.0 | 43.804 ns | 0.1348 ns | 0.1261 ns | 1.00 | 0.00 | 0.0401 | - | - | 336 B | 1.00 |
| HasAnyDistinct | .NET 9.0 | .NET 9.0 | 5.190 ns | 0.0045 ns | 0.0037 ns | 0.12 | 0.00 | - | - | - | - | 0.00 |
Where itself is already twice as fast in net9.0
. The Any
method is now 8.5x faster than in .NET 8.0.
To understand what is going on, let's have a look at TryGetNonEnumeratedCount
. If you need a refresher: "How does TryGetNonEnumeratedCount work?". Basically Any
tries to call TryGetNonEnumeratedCount
and if it succeeds, it returns whether or not the count is 0. This is a very fast operation. If it can't get the count cheap, it has to start enumerating. In case of Any
that means start the enumeration and try to get the first element - if there is one, it returns true
, otherwise false
.
In .NET 9 many of the internal types derive from an Iterator<T>
class which offers a TryGetFirst
. So if it can't get the enumeration count cheap, it tries to get the first element cheap.
Like TryGetNonEnumeratedCount
TryGetFirst
is a specialized call on top of iterator types (like Where
and Distinct
).
You want to know more? Check out the PR: https://github.com/dotnet/runtime/pull/99218