I've seen multiple places where people use GitHub runners to do benchmarking. That can be a tricky thing to do.
## A small test
Here is a very simple demo benchmark I used:
```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Exporters.Csv;
using BenchmarkDotNet.Running;

var config = DefaultConfig.Instance
    .AddExporter(MarkdownExporter.GitHub)
    .AddExporter(HtmlExporter.Default)
    .AddExporter(CsvExporter.Default);

BenchmarkRunner.Run<PiBenchmarks>(config);

[MemoryDiagnoser]
public class PiBenchmarks
{
    private const int Iterations = 100_000;

    [Benchmark]
    public double Pi_Leibniz()
    {
        double sum = 0;
        for (var i = 0; i < Iterations; i++)
        {
            var term = ((i & 1) == 0 ? 1.0 : -1.0) / (2 * i + 1);
            sum += term;
        }

        return sum * 4;
    }

    [Benchmark]
    public double Pi_MonteCarlo()
    {
        var rng = new Random(42);
        var inside = 0;
        for (var i = 0; i < Iterations; i++)
        {
            var x = rng.NextDouble();
            var y = rng.NextDouble();
            if (x * x + y * y <= 1.0)
                inside++;
        }

        return 4.0 * inside / Iterations;
    }
}
```
It doesn't really matter what the code does; all that matters is that there are two benchmarks running one after the other. The workflow yml file looks simply like this:
```yaml
name: Run Benchmarks

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Restore dependencies
        run: dotnet restore BenchmarksPi/BenchmarksPi.csproj

      - name: Build in Release mode
        run: dotnet build BenchmarksPi/BenchmarksPi.csproj --configuration Release --no-restore

      - name: Run benchmarks
        run: dotnet run --project BenchmarksPi/BenchmarksPi.csproj --configuration Release --no-build
```
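Since the exporters in the code above write Markdown, HTML, and CSV results to disk, it can help to keep those files per run so you can compare them later. A sketch of one extra step (the artifact name is made up, and the path assumes BenchmarkDotNet's default `BenchmarkDotNet.Artifacts` output folder; adjust it to your project layout):

```yaml
      # Sketch: persist the benchmark results of every run as a build
      # artifact, so runs on different hardware can be compared afterwards.
      - name: Upload benchmark results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ matrix.os }}
          path: '**/BenchmarkDotNet.Artifacts/results/*'
```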
So we run against Windows and Ubuntu. If you are using a public shared runner, the obvious issue is that two consecutive runs might not be comparable: the CPU on the physical machine can spike under load you have no control over. You might even land on a different VM, or a different physical machine entirely. Basically, you get virtual CPUs and virtual memory, and that is a problem. The issue seems to be more pronounced on Windows runners; on Linux it appears to be more stable (but my sample size is 10, so don't take that as a strong argument).
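Since you can't control which machine you get, it's at least worth recording which one you got. A hedged sketch of an extra workflow step (Linux only; the commands are standard Linux tools, the step name is made up):

```yaml
      # Log the CPU the job actually landed on, so two runs can be
      # compared fairly - or discarded if the hardware differs.
      - name: Log runner CPU (Linux)
        if: runner.os == 'Linux'
        run: |
          grep -m1 "model name" /proc/cpuinfo
          nproc
```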
Here are two runs on Windows:
```
AMD EPYC 7763 2.44GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.100
  [Host]     : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3
```
| Method | Mean | Error | StdDev | Allocated |
|-------------- |-----------:|--------:|--------:|----------:|
| Pi_Leibniz | 140.7 us | 0.12 us | 0.11 us | - |
| Pi_MonteCarlo | 1,719.0 us | 3.23 us | 3.02 us | 304 B |
And run 2:
```
Intel Xeon Platinum 8370C CPU 2.80GHz (Max: 2.79GHz), 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.100
  [Host]     : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v4
  DefaultJob : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v4
```
| Method | Mean | Error | StdDev | Allocated |
|-------------- |-----------:|--------:|--------:|----------:|
| Pi_Leibniz | 116.1 us | 0.13 us | 0.12 us | - |
| Pi_MonteCarlo | 1,773.0 us | 4.09 us | 3.63 us | 304 B |
We can see: two different CPUs with different characteristics. Of course the same can happen on a Linux runner as well. And not only that: even within a single run you might hit an issue. What if benchmark 1 (in this case Pi_Leibniz) runs while the physical host has no CPU load from other GitHub repositories' jobs, but benchmark 2 (Pi_MonteCarlo) does? Then the second one gets a longer runtime, and in comparison the first one looks better than it really is.
## When it isn't a problem
Of course, if you have dedicated runners where you control the workload and the environment, that is fine. The problem only arises when there are multiple variables you can't control. How can you guarantee that your approach is faster without knowing that the environment did not change between the two runs (or within one run across multiple benchmarks)?
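One partial mitigation within a single run is BenchmarkDotNet's baseline feature: marking one benchmark as the baseline makes the summary report a ratio column, so results are expressed relative to each other instead of as absolute wall-clock times. That absorbs some machine-wide slowdown, though as argued above it cannot help when the load changes *between* the two benchmarks. A sketch of the change to the class from earlier (`Baseline = true` is the real BenchmarkDotNet attribute property; the method bodies are elided here):

```csharp
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class PiBenchmarks
{
    // Marking one method as the baseline adds a Ratio column to the
    // summary: Pi_MonteCarlo is then reported as a multiple of
    // Pi_Leibniz rather than as an absolute time.
    [Benchmark(Baseline = true)]
    public double Pi_Leibniz() { /* same body as above */ return 0; }

    [Benchmark]
    public double Pi_MonteCarlo() { /* same body as above */ return 0; }
}
```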
