EF7 - Bulk updates and Bulk deletes

8/21/2022
6 minute read

Finally the long awaited bulk update and bulk delete feature is heading towards Entity Framework 7. So we will have a look how to use it and what might be some limitations.

Also I will have a quick look at the performance aspect, because this was the main driver for that feature.


Info: The feature I will show is currently not in the 7.0.0-preview7 package. So I took the nightly builds (at this point in time: 7.0.0-rc.2). As always I will share the code at the end of the blogpost. You can see that I use a modified NuGet.config to get the latest nightly builds of the efcore team.


The problem

If you are using Entity Framework 6 or below and you want to update multiple entries of a collection, you are stuck with somehow this or a similiar solution:

var blogPosts = myDbContext.BlogPosts.Where(s => s.SomeProperty == 2);
foreach (var blogPost in blogPosts)
{
    blogPost.AnotherProperty = 3;
}

await myDbContext.BlogPosts.SaveChangesAsync();

How does that look for deleting? The simplest case:

myDbContext.BlogPosts.Clear();
await myDbContext.SaveChangesAsync();

The problem is always the same. The translated SQL operates on each row plus in the update case: You have to get all the stuff in memory first. Just imagine you have roughly 50k entries. It is not a great to go 50k times to the database to delete or update a single row. The workaround would involve more or less raw sql. Raw SQL has the downside that it is not necessarily refactoring friendly (sure you can work with nameof but if you have a mismatch between property name and column name, that will not help you). That is why we will have from now on ExecuteDeleteAsync and ExecuteUpdateAsync (and of course their synchronous counterpart).

The trivial use case

The cool thing is that you can use those operations either directly on your DbContext or on any IQueryable:

var context = new MyDbContext();
// Deletes all entry in one transaction
var deletedColumns = await context
    .People
    .ExecuteDeleteAsync();

// Deletes all people, which are under 30 in one transaction
var deletedColumns =  await context
    .People
    .Where(p => p.Age < 30).ExecuteDeleteAsync();

// We can do the same with update
// Update the age to 30 of everyone
var updatedColumns = await context
    .People
    .ExecuteUpdateAsync(p => p.SetProperty(s => s.Age, age => 30));

What is very important to see here, I don't use any SaveChanges or SaveChangesAsync. And that is something very important. The operations are getting executed on the spot. If you mix those operations with SaveChanges then you can have a bad time because for a reader, who is not aware of that, the order can be different than what really happens at runtime. So this approach does not work well with the Unit of Work pattern. There are some other limitations like not being able to use anonymous types for the update case. Remember you can use join beforehands making it impossible for EF to derive what should happen next. If you want to know more about this, I would recommend checking the issue tracker on GitHub.

Performace

Let's have a quick look at performance. Let's begin with the update case, what happens if we want to update ALL entries of a collection. Our baseline is the old approach. For that we will use a InMemory Sqlite database to minimize side effects.

Setup:

[MemoryDiagnoser]
public class UpdateBenchmark
{
    private PeopleContext _peopleContext;

    [Params(100, 1_000)] public int RowsToUpdate { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _peopleContext = CreateContext();
        CreateContext();
        FillWithData();
    }

    [Benchmark(Baseline = true)]
    public async Task OneByOneUpdate()
    {
        var entries = await _peopleContext.People.ToListAsync();
        foreach (var entry in entries)
        {
            entry.Age = 21;
        }

        await _peopleContext.SaveChangesAsync();
    }

    [Benchmark]
    public async Task BulkUpdate()
    {
        await _peopleContext.People
            .ExecuteUpdateAsync(s => s.SetProperty(p => p.Age, p => 21));
    }

    private void FillWithData()
    {
        using var context = CreateContext();
        context.Database.EnsureCreated();
        for (var i = 1; i <= RowsToUpdate; i++)
            context.People.Add(new Person(i, "Steven", 31));

        context.SaveChanges();
    }

    private static PeopleContext CreateContext()
    {
        var connection = new SqliteConnection("DataSource=myshareddb;mode=memory;cache=shared");
        connection.Open();
        var options = new DbContextOptionsBuilder()
            .UseSqlite(connection)
            .Options;
        return new PeopleContext(options);
    }
}

The results are in:

|         Method | RowsToUpdate |      Mean |    Error |   StdDev | Ratio |    Gen 0 |  Gen 1 | Allocated |
|--------------- |------------- |----------:|---------:|---------:|------:|---------:|-------:|----------:|
| OneByOneUpdate |          100 |  83.71 us | 0.441 us | 0.412 us |  1.00 |  14.1602 | 0.1221 |     87 KB |
|     BulkUpdate |          100 |  20.70 us | 0.101 us | 0.095 us |  0.25 |   1.3123 |      - |      8 KB |
|                |              |           |          |          |       |          |        |           |
| OneByOneUpdate |         1000 | 753.26 us | 1.122 us | 0.937 us |  1.00 | 136.7188 |      - |    840 KB |
|     BulkUpdate |         1000 |  97.64 us | 1.761 us | 1.647 us |  0.13 |   1.2207 |      - |      8 KB |

Keep in mind that I use a in-memory db. The overhead on a real database is way more in favour of the bulk update. So this is almost the worst case for our BulkUpdate mechanism and still it is way way way better.

Conclusion

Both new additions to the Entity Framework family are more than welcome. They can drastically increase your performance and scalability. If you used the workaround with raw sql until now, you can also omit this with this new approach! Great job!

Resources

  • The source code for the benchmark can be found here.
  • Majority of code samples can be found here.

Entity Framework and ordered indexes

In Entity Framework 7, the team has added support for ordered indexes to the fluent API. In this blog post we will look at how to use this feature and what it means for your database.

Delete a record without prior loading in Entity Framework

Sometimes you have an Id of an object and want to delete the underlying thing from the database. But it doesn't make sense to load the whole object from the database to memory first. So how can we achieve this quickly?

Entity Framework 8: Raw SQL queries on unmapped types

The next iteration of Entity Framework, namely Entity Framework 8, will have a new and exciting feature:

Support raw SQL queries without defining an entity type for the result

That means less boilerplate code!

An error has occurred. This application may no longer respond until reloaded. Reload x