Divide and Conquer the Database

9/1/2025
7 minute read

For lack of a better title, this blog post shows a database optimization that helped me quite a bit, and what it takes to make it work.

## The Problem

For a project we want to visualize many data points (which would correspond to over 25k rows in the database). The problem here is not only the size, but also the fact that the data itself is binary (BLOB). So the database might have to store the data out of row, which is not very efficient (more I/O lookups).

For us, the user comes with a date range and some other parameters (that are not so important here) and gets all the datapoints for a 3D chart:

public async Task<ThreeDData> Get3DDate(
    Filters filters,
    CancellationToken cancellationToken
)
{
    return await _repository.Get3DDate(
        filters,
        cancellationToken
    );
}

With the simplified filter model:

public sealed class Filters
{
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

Now if we ask for roughly 3,000 rows (which translates to a certain start and end date), the database might need 1 second. Asking for 5,000 rows (an increase of only about 66%) already takes 2 seconds. And if we ask for 25k rows, it takes 10 seconds or more. So the retrieval time scales superlinearly.

Now I should put a disclaimer here: I am not a database expert, and yes, putting large binary data inside a database is not the best idea. But it is what it is, and I just wanted to share my journey; people are welcome to correct all my mistakes. 😄

## Some solutions

Of course we had indexes and so on for the retrieval, so here I will just highlight some other things that I hope are worth sharing:

### Compress and Decompress

Instead of storing the BLOB data as is, we compressed it before writing it to the database. This reduced the size of the data significantly, which in turn reduced the I/O needed to retrieve it. The target was to keep each row below 8,060 bytes. Why? We are using SQL Server, and it has an 8,060-byte limit for in-row data; everything above that gets pushed out of row. Staying below the limit helps a lot when working with BLOBs.
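To make that concrete, here is a minimal sketch of such a compress/decompress pair using `GZipStream` (the `BlobCompression` name is my illustration; the exact scheme does not matter much, as long as the stored payload shrinks enough to stay in row):

```csharp
using System.IO;
using System.IO.Compression;

public static class BlobCompression
{
    // Compress the raw payload before storing it in the BLOB column.
    public static byte[] Compress(byte[] data)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Fastest))
        {
            gzip.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }

    // Decompress after reading the row back.
    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }
}
```

How well this works depends entirely on the data; repetitive measurement payloads compress very well, already-compressed media does not.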

### Divide and Conquer

The second technique I used for this specific case: divide and conquer the data. Instead of making one request that retrieves everything, I split the request into smaller chunks. In particular: every day in the requested range becomes a separate request. That works in my particular case. As folks might ask for a week or more, I did not want to overload the database, so I also limited it to 4 concurrent requests at a time:

public class DateChunkingService
{
    private readonly SemaphoreSlim _semaphore = new(4, 4);

    /// <summary>
    /// Executes a function for each daily chunk of the date range with controlled concurrency
    /// </summary>
    /// <typeparam name="TResult">The result type</typeparam>
    /// <param name="filterParameters">The filter parameters containing StartDate and EndDate</param>
    /// <param name="func">The function to execute for each chunk</param>
    /// <param name="token">Cancellation token</param>
    /// <param name="preserveOrder">If true, results will be returned in chronological order of chunks</param>
    /// <returns>Combined results from all chunks</returns>
    /// <remarks>Repositories have to use the <c>IDbContextFactory</c> otherwise exceptions will occur.</remarks>
    public async Task<IReadOnlyCollection<TResult>> ProcessInDailyChunks<TResult>(
        FilterParametersModel filterParameters,
        Func<FilterParametersModel, CancellationToken, Task<IReadOnlyCollection<TResult>>> func,
        CancellationToken token,
        bool preserveOrder = false)
    {
        var dateChunks = GetDateChunks(filterParameters.StartDate, filterParameters.EndDate);

        return preserveOrder
            ? await ProcessChunksInOrder(filterParameters, dateChunks, func, token)
            : await ProcessChunksConcurrently(filterParameters, dateChunks, func, token);
    }

    private async Task<IReadOnlyCollection<TResult>> ProcessChunksConcurrently<TResult>(
        FilterParametersModel filterParameters,
        List<(DateTime startDate, DateTime endDate)> dateChunks,
        Func<FilterParametersModel, CancellationToken, Task<IReadOnlyCollection<TResult>>> func,
        CancellationToken token)
    {
        var tasks = dateChunks.Select(dateChunk => ProcessDateChunk(
            filterParameters,
            dateChunk.startDate,
            dateChunk.endDate,
            func,
            token));

        var allResults = await Task.WhenAll(tasks);
        var combinedResults = allResults.SelectMany(results => results).ToArray();

        return combinedResults;
    }

    private async Task<IReadOnlyCollection<TResult>> ProcessChunksInOrder<TResult>(
        FilterParametersModel filterParameters,
        List<(DateTime startDate, DateTime endDate)> dateChunks,
        Func<FilterParametersModel, CancellationToken, Task<IReadOnlyCollection<TResult>>> func,
        CancellationToken token)
    {
        var tasks = dateChunks.Select(async (dateChunk, index) =>
        {
            var results = await ProcessDateChunk(
                filterParameters,
                dateChunk.startDate,
                dateChunk.endDate,
                func,
                token);
            return new { Index = index, Results = results };
        });

        var completedTasks = await Task.WhenAll(tasks);

        return completedTasks
            .OrderBy(x => x.Index)
            .SelectMany(x => x.Results)
            .ToArray();
    }

    private static List<(DateTime startDate, DateTime endDate)> GetDateChunks(DateTime startDate, DateTime endDate)
    {
        var chunks = new List<(DateTime startDate, DateTime endDate)>();
        var currentDate = startDate.Date;
        var endDateOnly = endDate.Date;

        while (currentDate <= endDateOnly)
        {
            var chunkStart = currentDate == startDate.Date ? startDate : currentDate;
            var chunkEnd = currentDate == endDateOnly ? endDate : currentDate.AddDays(1).AddTicks(-1);

            chunks.Add((chunkStart, chunkEnd));
            currentDate = currentDate.AddDays(1);
        }

        return chunks;
    }

    private async Task<IReadOnlyCollection<TResult>> ProcessDateChunk<TResult>(
        FilterParametersModel originalFilterParameters,
        DateTime chunkStartDate,
        DateTime chunkEndDate,
        Func<FilterParametersModel, CancellationToken, Task<IReadOnlyCollection<TResult>>> func,
        CancellationToken token)
    {
        await _semaphore.WaitAsync(token);
        try
        {
            var chunkFilterParameters = originalFilterParameters with { StartDate = chunkStartDate, EndDate = chunkEndDate };

            return await func(chunkFilterParameters, token);
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

The method can also distinguish whether the results should be returned in chronological order of the chunks or not. Sometimes useful, sometimes not. One thing to watch out for: `FilterParametersModel` has to be a record, because `ProcessDateChunk` clones it with a `with` expression.
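Putting it together, the service from the beginning of the post might then look roughly like this (the `ThreeDService` wiring is my illustration, not a fixed API, and it assumes the repository takes the same `FilterParametersModel`):

```csharp
// Hypothetical wiring: the query from the start of the post, now fanned out per day.
public class ThreeDService(DateChunkingService chunkingService, MyRepository repository)
{
    public Task<IReadOnlyCollection<SomeResult>> Get3DDate(
        FilterParametersModel filters,
        CancellationToken cancellationToken)
        => chunkingService.ProcessInDailyChunks(
            filters,
            (chunkFilters, token) => repository.Get3DDate(chunkFilters, token),
            cancellationToken,
            preserveOrder: true);
}
```

The caller does not notice any difference; it still passes one date range and gets one combined result back.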

As I was using ASP.NET Core with Entity Framework Core, this poses a problem: you can't make concurrent requests to the database with the same DbContext. So I had to use a factory to create a new DbContext for each request. So from:

public class MyRepository(MyDbContext dbContext)
{
    public async Task<IReadOnlyCollection<SomeResult>> Get3DDate(
        Filters filters,
        CancellationToken cancellationToken)
    {
        return await dbContext.SomeResults
            .Where(x => x.Date >= filters.StartDate && x.Date <= filters.EndDate)
            // ...
            .ToListAsync(cancellationToken);
    }
}

To:

public class MyRepository(IDbContextFactory<MyDbContext> dbContextFactory)
{
    public async Task<IReadOnlyCollection<SomeResult>> Get3DDate(
        Filters filters,
        CancellationToken cancellationToken)
    {
        await using var dbContext = await dbContextFactory.CreateDbContextAsync(cancellationToken);
        return await dbContext.SomeResults
            .Where(x => x.Date >= filters.StartDate && x.Date <= filters.EndDate)
            // ...
            .ToListAsync(cancellationToken);
    }
}
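For completeness, the factory comes from the standard EF Core registration (the connection string name here is just a placeholder):

```csharp
// Program.cs: register the factory so IDbContextFactory<MyDbContext> can be injected.
builder.Services.AddDbContextFactory<MyDbContext>(options =>
    options.UseSqlServer(builder.Configuration.GetConnectionString("Default")));
```

Note that `AddDbContextFactory` also registers a scoped `MyDbContext`, so existing repositories that inject the context directly keep working.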