Entity Framework has two methods for converting a query to a list or an array: ToListAsync
and ToArrayAsync
. Of course there is also the sync version of these methods: ToList
and ToArray
. And sometimes, you should use the latter one!
The bug
In one of my projects, I have the following code:
public static async Task<IPagedList<T>> ToPagedListAsync<T>(this IQueryable<T> source, int page, int pageSize, CancellationToken token = default)
{
var count = await source.CountAsync(token);
if (count > 0)
{
var items = await source
.Skip((page - 1) * pageSize)
.Take(pageSize)
.ToListAsync(token);
return new PagedList<T>(items, count, page, pageSize);
}
return PagedList<T>.Empty;
}
That looks fairly trivial - and of course we want ToListAsync
here, or do we? Well: Not necessarily! To give a bigger picture, this call is used in that very blog post you are reading right now. And the datamodel of a blog post has the following column: Content NVARCHAR(MAX) NOT NULL
. NVARCHAR(MAX)
(as well as VARBINARY(MAX)
and friends) cause a known issue with Entity Framework since almost 10 years: Performance degradation. The async
version spawns a whole lot of tasks, see the stackoverflow article.
If you want to follow this in more depth: Reading large data (binary, text) asynchronously is extremely slow .
The fix
The fix is fairly simple: Just use the synchronous version:
public static async Task<IPagedList<T>> ToPagedListAsync<T>(this IQueryable<T> source, int page, int pageSize, CancellationToken token = default)
{
var count = await source.CountAsync(token);
if (count > 0)
{
// I tried ToListAsync and it performed just poorly!
// Mainly because we have a VARCHAR(max) column
// See here: https://stackoverflow.com/questions/28543293/entity-framework-async-operation-takes-ten-times-as-long-to-complete/28619983
var items = source
.Skip((page - 1) * pageSize)
.Take(pageSize)
.ToList();
return new PagedList<T>(items, count, page, pageSize);
}
return PagedList<T>.Empty;
}
And yes, adding a comment here is absolutely necessary and okay! Especially because CountAsync
is still async and the whole method is marked as async
.
Update 10th November
I made a small sample to demonstrate this behavior. And as you can see, this only happens if you really have large binary columns and many rows. In the given example I retrieve 40000 records where each record has a string of length 40000:
using System.Diagnostics;
using Microsoft.EntityFrameworkCore;
await CreateSeedDataAsync();
Console.WriteLine("Done Writing");
await using (var context = new SomeDataContext())
{
var sw = Stopwatch.StartNew();
var syncData = context.Data.Select(s => s.Content).ToList();
sw.Stop();
Console.WriteLine($"Sync read time: {sw.ElapsedMilliseconds}ms");
Console.WriteLine($"Retrieved {syncData.Count} records");
}
// Read data asynchronously
await using (var context = new SomeDataContext())
{
var sw = Stopwatch.StartNew();
var asyncData = await context.Data.Select(s => s.Content).ToListAsync();
sw.Stop();
Console.WriteLine($"Async read time: {sw.ElapsedMilliseconds}ms");
Console.WriteLine($"Retrieved {asyncData.Count} records");
}
// Read data synchronously
async Task CreateSeedDataAsync()
{
await using var context = new SomeDataContext();
await context.Database.OpenConnectionAsync();
await context.Database.EnsureCreatedAsync();
var data = Enumerable.Range(1, 40000).Select(i => new SomeData
{
Content = new string('X', 40000),
}).ToList();
await context.Data.AddRangeAsync(data);
await context.SaveChangesAsync();
}
public class SomeData
{
public int Id { get; set; }
public required string Content { get; set; }
}
public sealed class SomeDataContext : DbContext
{
public DbSet<SomeData> Data { get; set; }
public SomeDataContext()
{
Database.EnsureCreated();
}
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder.UseSqlite("DataSource=file::memory:?cache=shared");
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<SomeData>().HasKey(x => x.Id);
modelBuilder.Entity<SomeData>()
.Property(x => x.Id)
.ValueGeneratedOnAdd();
modelBuilder.Entity<SomeData>()
.Property(x => x.Content)
.IsRequired();
}
}
The result?
Sync read time: 2497ms
Retrieved 40000 records
Async read time: 3215ms
Retrieved 40000 records
So while there is a difference of roughly 700ms for this example - you have to decide for yourself if this is a lot (because with the sync version there is no cancellation support anymore).
But what happens if we upgrade to Entity Framework 9.0.0-rc.2
? Well the results will look like this:
Sync read time: 1750ms
Retrieved 40000 records
Async read time: 2223ms
Retrieved 40000 records
There is just so much of an overhaul improvement! It is incredible! But still the async
version is still slower. Thanks to @ryanmendoza for the discussion on BlueSkyThis link will lead you to my page where you can follow me.
What is the takeaway?
Well, check if you are handling large binary data via ToListAsync
in your selector? If so, maybe you are suffering from the issue!