In this blog post I will discuss the pros and cons of soft deleting records in a database versus other approaches you can take. Heat up your pop corn and get your salt ready!
Soft Delete
Soft delete is a technique used to mark a record as deleted without actually deleting it. In many cases, you will something like a IsDeleted
column in your table that will be set to true
when the record is deleted. This way, the record is not actually deleted from the database, but it is marked as deleted.
public class User
{
public int Id { get; set; }
public string Name { get; set; }
public bool IsDeleted { get; set; }
}
As you might imagine in many cases you don't care about "deleted" records in your Web API so often times people will filter out the deleted records likes this:
public async Task<IReadOnlyCollection<User>> GetUsers()
{
return await _context.Users.Where(u => !u.IsDeleted).ToArrayAsync();
}
Now I choose Entity Framework deliberately because it is a popular ORM and it is easy to work with. As you need to add this Where(u => !u.IsDeleted)
to every query you write, it can be a bit cumbersome, so folks resort to Global Query Filters in Entity Framework. This way you can apply the filter globally and you don't have to worry about it in your queries.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<User>().HasQueryFilter(u => !u.IsDeleted);
}
So everytimt you retrieve something from the User
table, it will automatically filter out the deleted records. Of course you can override that filter if you want to:
public async Task<IReadOnlyCollection<User>> GetUsersWithDeleted()
{
return await _context.Users.IgnoreQueryFilters().ToArrayAsync();
}
Popcorn, Grains of Salt and my personal opinion
Now that we have seen how soft delete works, let's give you my opinion. For starters, do I like soft deletes? Often times: No. For many reasons. Let's start why and let's check some alternatives.
Do we need deleted records?
In many cases, you don't need the deleted records. If you don't need them, why keep them in the database? They are just taking up space and making your queries slower. What's not needed, can be deleted. Of course, if you have to keep them, then we have to find different ways of doing so! Often times reporting might want to see deleted entries, but how often are reports generated? Are they part of your daily business? Even if, see "Alternatives" below.
Global Query Filters
Global Query Filters are a blessing and a curse. If you see a specific query, this query will not tell you the whole truth! You have to know that there is a global query filter applied to the query. This can be confusing and can lead to bugs.
Performance
Soft deletes can have a performance impact. If you have a lot of records in your table, the query will have to filter out the deleted records every time you query the table. You might counter this with additional indexes, but this can also have a negative impact on your insert and update performance.
Alternatives
Archive Table or schema
One alternative is to move the deleted records to an archive table or schema. This way you can keep the deleted records, but they are not in the same table as the active records. This way you can keep your active records table clean and fast. You can also create a view that combines the active records with the archived records if you need to see them together in specific circumstances.
Archive Database
Basically like the last approach, but you move the deleted records to a different database. This way you can keep your active records database clean and fast. This can be a bit more cumbersome to set up, but it can be a good solution if you have a lot of deleted records. Especially if you have lots of data, this can be a money saver if your using managed instances on Azure or AWS.
For example Azure offers you an Archive Tier, where even terabytes of data cost way way less than an active SQL database (given that you don't read much data and don't need your archive in seconds available!).
Conclusion
Soft deletes can be a good solution in some cases, but in many cases they are not my preffered way of handling data. Especially if there is no need and/or you are handling a lot of data, you might want to consider other approaches.