Be careful with ToListAsync and ToArrayAsync in Entity Framework Core

11/4/2024
4 minute read

Entity Framework has two methods for converting a query to a list or an array: ToListAsync and ToArrayAsync. Of course there is also the sync version of these methods: ToList and ToArray. And sometimes, you should use the latter one!

The bug

In one of my projects, I have the following code:

public static async Task<IPagedList<T>> ToPagedListAsync<T>(this IQueryable<T> source, int page, int pageSize, CancellationToken token = default)
{
    var count = await source.CountAsync(token);
    if (count > 0)
    {
        var items = await source
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToListAsync(token);
        return new PagedList<T>(items, count, page, pageSize);
    }

    return PagedList<T>.Empty;
}

That looks fairly trivial - and of course we want ToListAsync here, or do we? Well: Not necessarily! To give a bigger picture, this call is used in that very blog post you are reading right now. And the datamodel of a blog post has the following column: Content NVARCHAR(MAX) NOT NULL. NVARCHAR(MAX) (as well as VARBINARY(MAX) and friends) cause a known issue with Entity Framework since almost 10 years: Performance degradation. The async version spawns a whole lot of tasks, see the stackoverflow article.

If you want to follow this in more depth: Reading large data (binary, text) asynchronously is extremely slow .

The fix

The fix is fairly simple: Just use the synchronous version:

public static async Task<IPagedList<T>> ToPagedListAsync<T>(this IQueryable<T> source, int page, int pageSize, CancellationToken token = default)
{
    var count = await source.CountAsync(token);
    if (count > 0)
    {
        // I tried ToListAsync and it performed just poorly!
        // Mainly because we have a VARCHAR(max) column
        // See here: https://stackoverflow.com/questions/28543293/entity-framework-async-operation-takes-ten-times-as-long-to-complete/28619983
        var items = source
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToList();
        return new PagedList<T>(items, count, page, pageSize);
    }

    return PagedList<T>.Empty;
}

And yes, adding a comment here is absolutely necessary and okay! Especially because CountAsync is still async and the whole method is marked as async.

Update 10th November

I made a small sample to demonstrate this behavior. And as you can see, this only happens if you really have large binary columns and many rows. In the given example I retrieve 40000 records where each record has a string of length 40000:

using System.Diagnostics;
using Microsoft.EntityFrameworkCore;

await CreateSeedDataAsync();

Console.WriteLine("Done Writing");

await using (var context = new SomeDataContext())
{
    var sw = Stopwatch.StartNew();
    var syncData = context.Data.Select(s => s.Content).ToList();
    sw.Stop();
    Console.WriteLine($"Sync read time: {sw.ElapsedMilliseconds}ms");
    Console.WriteLine($"Retrieved {syncData.Count} records");
}

// Read data asynchronously
await using (var context = new SomeDataContext())
{
    var sw = Stopwatch.StartNew();
    var asyncData = await context.Data.Select(s => s.Content).ToListAsync();
    sw.Stop();
    Console.WriteLine($"Async read time: {sw.ElapsedMilliseconds}ms");
    Console.WriteLine($"Retrieved {asyncData.Count} records");
}

// Read data synchronously
async Task CreateSeedDataAsync()
{
    await using var context = new SomeDataContext();
    await context.Database.OpenConnectionAsync();
    await context.Database.EnsureCreatedAsync();

    var data = Enumerable.Range(1, 40000).Select(i => new SomeData
    {
        Content = new string('X', 40000),
    }).ToList();
    
    await context.Data.AddRangeAsync(data);
    await context.SaveChangesAsync();
}

public class SomeData
{
    public int Id { get; set; }
    public required string Content { get; set; }
}

public sealed class SomeDataContext : DbContext
{
    public DbSet<SomeData> Data { get; set; }

    public SomeDataContext()
    {
        Database.EnsureCreated();
    }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlite("DataSource=file::memory:?cache=shared");
    }
    
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<SomeData>().HasKey(x => x.Id);
        modelBuilder.Entity<SomeData>()
            .Property(x => x.Id)
            .ValueGeneratedOnAdd();
        
        modelBuilder.Entity<SomeData>()
            .Property(x => x.Content)
            .IsRequired();
    }
}

The result?

Sync read time: 2497ms
Retrieved 40000 records
Async read time: 3215ms
Retrieved 40000 records

So while there is a difference of roughly 700ms for this example - you have to decide for yourself if this is a lot (because with the sync version there is no cancellation support anymore).

But what happens if we upgrade to Entity Framework 9.0.0-rc.2? Well the results will look like this:

Sync read time: 1750ms
Retrieved 40000 records
Async read time: 2223ms
Retrieved 40000 records

There is just so much of an overhaul improvement! It is incredible! But still the async version is still slower. Thanks to @ryanmendoza for the discussion on BlueSkyThis link will lead you to my page where you can follow me.

What is the takeaway?

Well, check if you are handling large binary data via ToListAsync in your selector? If so, maybe you are suffering from the issue!

ToArray(Async) vs ToList(Async) in Entity Framework 8

When retrieving data from your database with Entity Framework, there are two major options: ToArray and ToList. Besides the different return type, is there any significant difference in performance between the two? Let's find out!

Entity Framework - Storing complex objects as JSON

From time to time, it is nice to store complex objects or lists as JSON in the database. With Entity Framework 8, this is now easily possible. But this was possible all along with Entity Framework 7.

Entity Framework 8: Raw SQL queries on unmapped types

The next iteration of Entity Framework, namely Entity Framework 8, will have a new and exciting feature:

Support raw SQL queries without defining an entity type for the result

That means less boilerplate code!

An error has occurred. This application may no longer respond until reloaded. Reload x