LINQ: Select.Where or Where.Select?

29/06/2023
LINQ, .NET, C#

LINQ is a very powerful tool for querying data. Most of its functions are built on top of IEnumerable<T> and, in most cases, return an IEnumerable<T> as well, so it is very easy to chain multiple functions together. That leaves you with a question: which one should you use, Select.Where or Where.Select?

Does the order matter?

Before we go into detail: I will only talk about the "in memory" case, so I leave things like IQueryable<T> out of the picture. With IQueryable<T>, the effect of the order depends mainly on the underlying provider, so things like Entity Framework or, more generally, LINQ to SQL can behave differently.

Let's look at the following code example:

var myList = GetAllUsers();

// Variation 1
myList.Where(u => u.IsStudent)
      .Select(u => new { u.FirstName, u.LastName, u.IsStudent });

// Variation 2
myList.Select(u => new { u.FirstName, u.LastName, u.IsStudent })
      .Where(u => u.IsStudent);

The result of both queries will be the same. But that doesn't necessarily mean that they do the same work to get there, and indeed there is a difference. The first variation first filters out all the users that are not students and then creates the anonymous type. The second variation first creates the anonymous type for all users and then filters out the ones that are not students. So which query do you think is faster?

The answer is: most probably the first variation. The second variation creates an anonymous type for all users, even the ones that are not students, so it has to do more work than the first variation. As a general rule of thumb, creating new objects is more expensive than filtering objects out. For small lists the difference is negligible, but for large lists it can matter. So you should always try to filter out as much as possible before you start creating new objects.

Here is a small visualization of the fact: [image]
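To make the difference a bit more tangible, here is a minimal sketch that counts how often the projection actually runs in both variations. The sample data and the counter names are made up for illustration, and tuples stand in for the User type, so treat it as a rough demonstration rather than a measurement.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: tuples stand in for the User type from the article.
var users = new[]
{
    (FirstName: "Ada", IsStudent: true),
    (FirstName: "Bob", IsStudent: false),
    (FirstName: "Cy", IsStudent: false),
    (FirstName: "Dee", IsStudent: true),
};

int projectionsFilterFirst = 0;
int projectionsSelectFirst = 0;

// Variation 1: filter first, then project. Only students reach the Select.
var filterFirst = users
    .Where(u => u.IsStudent)
    .Select(u => { projectionsFilterFirst++; return new { u.FirstName, u.IsStudent }; })
    .ToList();

// Variation 2: project first, then filter. Every user gets a new object.
var selectFirst = users
    .Select(u => { projectionsSelectFirst++; return new { u.FirstName, u.IsStudent }; })
    .Where(u => u.IsStudent)
    .ToList();

Console.WriteLine($"Where.Select created {projectionsFilterFirst} objects");
Console.WriteLine($"Select.Where created {projectionsSelectFirst} objects");
```

With this sample data, the first variation only creates objects for the two students, while the second one creates an object for all four users and throws two of them away again.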

And Select is just an example. You could imagine the same with operators like OrderBy. If the list isn't filtered beforehand, OrderBy has to sort way more entries than it would if the list were filtered in the first place. So whenever possible, I put my filter, aka Where, at the beginning of the LINQ query. That gives you a clean way to read your queries, as they are always structured the same way, as well as better performance (fewer allocations, as you don't throw away lots of elements). Again, for smaller lists this is not as important, but this "performance tip" of ordering your LINQ queries doesn't come with a penalty in maintainability or readability. On the contrary, for me it improves both.
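The same idea can be sketched for OrderBy by counting how many comparisons the sort performs. The counting comparer and the variable names below are my own illustration (not something LINQ provides out of the box), and the exact numbers depend on the runtime's sort implementation, so I only compare them relative to each other.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int comparisons = 0;

// Illustrative comparer that counts how often the sort compares two elements.
var countingComparer = Comparer<int>.Create((a, b) =>
{
    comparisons++;
    return a.CompareTo(b);
});

var numbers = Enumerable.Range(0, 10_000).Reverse().ToArray();

// Sort everything first, then filter: all 10,000 entries have to be sorted.
comparisons = 0;
var sortThenFilter = numbers
    .OrderBy(n => n, countingComparer)
    .Where(n => n % 100 == 0)
    .ToArray();
int comparisonsSortFirst = comparisons;

// Filter first, then sort: only the 100 remaining entries have to be sorted.
comparisons = 0;
var filterThenSort = numbers
    .Where(n => n % 100 == 0)
    .OrderBy(n => n, countingComparer)
    .ToArray();
int comparisonsFilterFirst = comparisons;

Console.WriteLine($"Sort first:   {comparisonsSortFirst} comparisons");
Console.WriteLine($"Filter first: {comparisonsFilterFirst} comparisons");
```

Both queries return the same 100 numbers, but the filtered version hands far fewer elements to the sort.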

Does the order matter? (part 2)

But wait, there is more: Let's have a look at the following two versions where we call a merged Where and multiple Where statements:

var myList = GetAllUsers();

// Variation 1
myList.Where(u => u.IsStudent && u.Age > 30);

// Variation 2
myList.Where(u => u.IsStudent)
      .Where(u => u.Age > 30);

Again, semantically they do exactly the same. The result will be the same in both cases, but the way it is achieved is different again. Original statement (incorrect, see the update below): The first variation basically goes through our complete list once and checks for the two conditions. The second variation will go through the list twice. First, it will filter out all the users that are not students, and then it will filter out all the users that are 30 or younger. For smaller lists, this is not a big deal, but for larger lists, this can make a difference.


Update 04/07/2023: Initially, I wrote that the list gets filtered twice. That is not the case (see the comment down below from @sandman633): if you run the example from the comments, you can see from the Console.WriteLine output that each element passes through both Where clauses in a single pass, instead of the list being iterated twice. There is still some cost associated with calling a function multiple times, but that is negligible in the grand scheme of things.

For full transparency, I will leave my original false statement in place.
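If you want to convince yourself of the single pass, here is a small sketch that counts how often the source is enumerated. The Source function and the counters are hypothetical helpers for this demonstration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerations = 0;
int itemsYielded = 0;

// Hypothetical source that records how often it is enumerated
// and how many items it hands out in total.
IEnumerable<int> Source()
{
    enumerations++;
    foreach (var n in new[] { 1, 5, 23, 7, 19 })
    {
        itemsYielded++;
        yield return n;
    }
}

// Building the query does not run anything yet (deferred execution).
var query = Source()
    .Where(n => n > 5)
    .Where(n => n > 10);
int enumerationsBeforeToList = enumerations;

// Materializing the query pulls every element through both filters once.
var result = query.ToList();

Console.WriteLine($"Enumerations before ToList: {enumerationsBeforeToList}");
Console.WriteLine($"Enumerations after ToList:  {enumerations}");
Console.WriteLine($"Items yielded by source:    {itemsYielded}");
Console.WriteLine($"Result: {string.Join(", ", result)}");
```

The source is enumerated exactly once and yields each of its five items once, even though two Where clauses are chained.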



If you have multiple Where clauses, it still makes sense to have the most selective filter at the beginning - or even better, refactor all the filters into one method in a meaningful way. With this approach, you can work around the fact that you may have a lot of Where clauses or that your big Where clause is not readable anymore. In our example:

var myList = GetAllUsers();

myList.Where(IsStudentAndOlderThan30);

private bool IsStudentAndOlderThan30(User user)
{
    return user.IsStudent && user.Age > 30;
}

A small and somewhat extreme benchmark

A small disclaimer as usual: benchmarks don't mean anything if they are not set in the right context. Measure and profile your situation first and then act accordingly. Nevertheless, the stuff I show you here is interesting and I think it is worth sharing.

Here are the tests:

using System.Linq;
using BenchmarkDotNet.Attributes;

public class Benchmarks
{
    private readonly int[] _numbers = Enumerable.Range(0, 100_000).ToArray();

    [Benchmark(Baseline = true)]
    public int[] OrderWhereWhere() =>
        _numbers
            .OrderDescending()
            .Where(n => n % 2 == 0)
            .Where(n => n % 3 == 0)
            .ToArray();

    [Benchmark]
    public int[] WhereOrder() =>
        _numbers
            .Where(n => n % 2 == 0 && n % 3 == 0)
            .OrderDescending()
            .ToArray();
}

Basically, the first test sorts the list descending and then keeps only the numbers that are divisible by 2 and by 3. The second test does the same, but first filters for the numbers that are divisible by 2 and 3 in one go and then sorts the list descending. The result set is the same, but the times vary:

|          Method |      Mean |     Error |    StdDev | Ratio |
|---------------- |----------:|----------:|----------:|------:|
| OrderWhereWhere | 10.133 ms | 0.0684 ms | 0.0640 ms |  1.00 |
|      WhereOrder |  1.491 ms | 0.0093 ms | 0.0087 ms |  0.15 |

The second variation is almost 7 times faster than the first variation.
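As a quick sanity check that both benchmark pipelines really produce the same result, here is a small sketch outside of BenchmarkDotNet. Like the benchmark itself, it assumes .NET 7 or later for OrderDescending; the variable names are mine.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(0, 100_000).ToArray();

// Same pipeline as the OrderWhereWhere benchmark: sort everything, then filter.
var orderWhereWhere = numbers
    .OrderDescending()
    .Where(n => n % 2 == 0)
    .Where(n => n % 3 == 0)
    .ToArray();

// Same pipeline as the WhereOrder benchmark: filter first, then sort the rest.
var whereOrder = numbers
    .Where(n => n % 2 == 0 && n % 3 == 0)
    .OrderDescending()
    .ToArray();

bool sameResult = orderWhereWhere.SequenceEqual(whereOrder);
Console.WriteLine($"Same result: {sameResult}");
Console.WriteLine($"Sorted {numbers.Length} vs. {whereOrder.Length} elements");
```

The second pipeline only has to sort the numbers that survive the filter, roughly a sixth of the input, which is where most of the difference in the table comes from.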

Update 05/07/2023

A small update: @jamescurran added a nice comment down below explaining what is going on, which you should check out. He also provided an example. You can either fiddle around with it on SharpLab or just see it in action here:

List<int> arr = new() { 1, 5, 23, 7, 6, 19, 20, 9, 8 };
var arr2 = arr
.Where(x =>
{
    Console.WriteLine($"Is {x} > then 5");
    return x > 5;
})
.Where(x =>
{
    Console.WriteLine($"Is {x} > then 10");
    return x > 10;
})
.Select(x =>
{
    Console.WriteLine($"Making new object with {x}");
    return new { val1 = x };
}).ToList();

Credit for that code also goes to @jamescurran.

It produces the following result:

Is 1 > then 5
Is 5 > then 5
Is 23 > then 5
Is 23 > then 10
Making new object with 23
Is 7 > then 5
Is 7 > then 10
Is 6 > then 5
Is 6 > then 10
Is 19 > then 5
Is 19 > then 10
Making new object with 19
Is 20 > then 5
Is 20 > then 10
Making new object with 20
Is 9 > then 5
Is 9 > then 10
Is 8 > then 5
Is 8 > then 10

Again for a detailed explanation, just head down to the comment section!

Conclusion

I hope I could give you a small insight into the world of LINQ and how the order of the functions can make a difference. As a general rule of thumb, you should always try to filter out as much as possible before you start creating new objects.
