I've seen multiple places where people use GitHub runners to do benchmarking. That can be a tricky thing to do.
## A small test
Here is a very simple demo benchmark I used:
```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Exporters.Csv;
using BenchmarkDotNet.Running;

var config = DefaultConfig.Instance
    .AddExporter(MarkdownExporter.GitHub)
    .AddExporter(HtmlExporter.Default)
    .AddExporter(CsvExporter.Default);

BenchmarkRunner.Run<PiBenchmarks>(config);

[MemoryDiagnoser]
public class PiBenchmarks
{
    private const int Iterations = 100_000;

    [Benchmark]
    public double Pi_Leibniz()
    {
        double sum = 0;
        for (var i = 0; i < Iterations; i++)
        {
            var term = ((i & 1) == 0 ? 1.0 : -1.0) / (2 * i + 1);
            sum += term;
        }

        return sum * 4;
    }

    [Benchmark]
    public double Pi_MonteCarlo()
    {
        var rng = new Random(42);
        var inside = 0;
        for (var i = 0; i < Iterations; i++)
        {
            var x = rng.NextDouble();
            var y = rng.NextDouble();
            if (x * x + y * y <= 1.0)
                inside++;
        }

        return 4.0 * inside / Iterations;
    }
}
```
It doesn't really matter what the code does; all that matters is that there are two benchmarks running one after the other. The workflow yml file looks simply like this:
```yaml
name: Run Benchmarks

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Restore dependencies
        run: dotnet restore BenchmarksPi/BenchmarksPi.csproj

      - name: Build in Release mode
        run: dotnet build BenchmarksPi/BenchmarksPi.csproj --configuration Release --no-restore

      - name: Run benchmarks
        run: dotnet run --project BenchmarksPi/BenchmarksPi.csproj --configuration Release --no-build
```
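Since the exporters in the code above write Markdown, HTML, and CSV results to disk, it can help to keep those files per run so you can compare them later. A sketch of one extra step (the artifact name is made up, and the path assumes BenchmarkDotNet's default `BenchmarkDotNet.Artifacts` output folder; adjust it to your project layout):

```yaml
      # Sketch: persist the benchmark results of every run as a build
      # artifact, so runs on different hardware can be compared afterwards.
      - name: Upload benchmark results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ matrix.os }}
          path: '**/BenchmarkDotNet.Artifacts/results/*'
```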
So we run against Windows and Ubuntu. If you are using a public shared runner, the obvious issue is that two consecutive runs might not be comparable: the CPU on the physical machine can spike under load you have no control over. You might even land on a different VM, or a different physical machine entirely. Basically, you get virtual CPUs and virtual memory, and that is a problem. The issue seems to be more pronounced on Windows runners; on Linux it appears to be more stable (but my sample size is 10, so don't take that as a strong argument).
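Since you can't control which machine you get, it's at least worth recording which one you got. A hedged sketch of an extra workflow step (Linux only; the commands are standard Linux tools, the step name is made up):

```yaml
      # Log the CPU the job actually landed on, so two runs can be
      # compared fairly - or discarded if the hardware differs.
      - name: Log runner CPU (Linux)
        if: runner.os == 'Linux'
        run: |
          grep -m1 "model name" /proc/cpuinfo
          nproc
```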
Here are two runs on Windows:
```
AMD EPYC 7763 2.44GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.100
  [Host]     : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3
```
| Method | Mean | Error | StdDev | Allocated |
|-------------- |-----------:|--------:|--------:|----------:|
| Pi_Leibniz | 140.7 us | 0.12 us | 0.11 us | - |
| Pi_MonteCarlo | 1,719.0 us | 3.23 us | 3.02 us | 304 B |
And run 2:
```
Intel Xeon Platinum 8370C CPU 2.80GHz (Max: 2.79GHz), 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.100
  [Host]     : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v4
  DefaultJob : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v4
```
| Method | Mean | Error | StdDev | Allocated |
|-------------- |-----------:|--------:|--------:|----------:|
| Pi_Leibniz | 116.1 us | 0.13 us | 0.12 us | - |
| Pi_MonteCarlo | 1,773.0 us | 4.09 us | 3.63 us | 304 B |
We can see: two different CPUs with different characteristics. Of course the same can happen on a Linux runner as well. And not only that: even within a single run you might hit an issue. What if benchmark 1 (in this case Pi_Leibniz) runs while the physical host has no CPU load from other GitHub repositories' jobs, but benchmark 2 (Pi_MonteCarlo) does? Then the second one gets a longer runtime, and in comparison the first one looks better than it really is.
## When it isn't a problem
Of course, if you have dedicated runners where you control the workload and the environment, that is fine. The problem only arises when there are multiple variables you can't control. How can you guarantee that your approach is faster without knowing that the environment did not change between the two runs (or within one run across multiple benchmarks)?
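One partial mitigation within a single run is BenchmarkDotNet's baseline feature: marking one benchmark as the baseline makes the summary report a ratio column, so results are expressed relative to each other instead of as absolute wall-clock times. That absorbs some machine-wide slowdown, though as argued above it cannot help when the load changes *between* the two benchmarks. A sketch of the change to the class from earlier (`Baseline = true` is the real BenchmarkDotNet attribute property; the method bodies are elided here):

```csharp
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class PiBenchmarks
{
    // Marking one method as the baseline adds a Ratio column to the
    // summary: Pi_MonteCarlo is then reported as a multiple of
    // Pi_Leibniz rather than as an absolute time.
    [Benchmark(Baseline = true)]
    public double Pi_Leibniz() { /* same body as above */ return 0; }

    [Benchmark]
    public double Pi_MonteCarlo() { /* same body as above */ return 0; }
}
```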
