In this (somewhat technical and barely usable) blog post, we will have a look at inlining and `struct`s in C#, and how together they can improve performance in some interesting ways.
Inlining
Inlining is a compiler optimization that replaces a method call with the method's body. So if you have the following code:
```csharp
public int Add(int a, int b) => a + b;

public int CalculateSum(int x, int y)
{
    return Add(x, y);
}
```
The JIT compiler might effectively turn it into:
```csharp
public int CalculateSum(int x, int y)
{
    return x + y;
}
```
The obvious advantage here is that we avoid the overhead of a method call. The downside is that inlining increases code size, because the body of the method is copied to every place it is called, and larger code has its own performance costs (for example, more pressure on the instruction cache).
There is an attribute, `[MethodImpl(MethodImplOptions.AggressiveInlining)]`, that can be used to suggest to the JIT that a method should be inlined, even if it would not inline it by default. It is just a hint, and the JIT can still choose to ignore it.
We can also use `[MethodImpl(MethodImplOptions.NoInlining)]` to prevent a method from being inlined. Unlike `AggressiveInlining`, this is not merely a hint: the JIT will not inline a method marked this way.
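A minimal sketch of both attributes in use, assuming a .NET project with top-level statements (C# 9 or later). The class and method names (`Calculator`, `AddHinted`, `AddNotInlined`) are made up for illustration; the attributes themselves live in `System.Runtime.CompilerServices`.

```csharp
using System;
using System.Runtime.CompilerServices; // MethodImplAttribute, MethodImplOptions

Console.WriteLine(Calculator.AddHinted(2, 3));     // 5
Console.WriteLine(Calculator.AddNotInlined(2, 3)); // 5

// Hypothetical helper class, only here to carry the two attributes.
public static class Calculator
{
    // A hint only: the JIT may inline calls to this method, but is free not to.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int AddHinted(int a, int b) => a + b;

    // The JIT will not inline this method; every call stays a real call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int AddNotInlined(int a, int b) => a + b;
}
```

Both methods return the same result. The attributes only influence how the JIT compiles the calls, never what the code computes.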
`struct`s
An integral part of `struct`s is that they are passed by value (unless you explicitly pass them by reference with `ref`, `in`, or `out`). This means that when you pass a `struct` to a method, the method works on a copy of it. So:
```csharp
public struct Point
{
    public int X;
    public int Y;
}

public void MovePoint(Point p)
{
    p.X += 10; // modifies the method's own copy only
    p.Y += 10;
}

Point myPoint = new Point { X = 0, Y = 0 };
MovePoint(myPoint);
// myPoint is still { X = 0, Y = 0 }
```
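If the method is supposed to change the caller's point, the `struct` has to be passed by reference instead. A minimal sketch using a `ref` parameter, assuming top-level statements (C# 9 or later); `MovePointByRef` is a hypothetical name, not from the original code:

```csharp
using System;

var myPoint = new Point { X = 0, Y = 0 };

MovePoint(myPoint);                             // the callee gets a copy
Console.WriteLine($"{myPoint.X}, {myPoint.Y}"); // 0, 0

MovePointByRef(ref myPoint);                    // the callee gets a reference to myPoint itself
Console.WriteLine($"{myPoint.X}, {myPoint.Y}"); // 10, 10

// Receives its own copy of the struct: the changes are lost when the method returns.
static void MovePoint(Point p)
{
    p.X += 10;
    p.Y += 10;
}

// Receives a reference to the caller's variable: no copy, changes are visible to the caller.
static void MovePointByRef(ref Point p)
{
    p.X += 10;
    p.Y += 10;
}

public struct Point
{
    public int X;
    public int Y;
}
```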
Ideally, you keep `struct`s immutable for exactly that reason. I even wrote a blog post about that: "Mutable value types are evil! Sort of...". You also try to keep them small to avoid the overhead of copying large amounts of data. And this is where inlining and `struct`s meet:
Inlining `struct` methods
And here is the "beauty": if a method is inlined, there is no new stack frame, so there is no need to copy the `struct` onto one. The method call is replaced with the body of the method, which can then work on the caller's `struct` directly. So inlining can actually make passing `struct`s cheaper.
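To make that concrete, here is a hand-written sketch of what the JIT effectively does when it inlines a method that takes a `struct` by value. `GetF` mirrors the getters from the benchmark, and the shortened `SomeBigStruct` (14 `int` fields instead of 26 properties) is an assumption to keep the example small:

```csharp
using System;

var big = new SomeBigStruct { F = 21 };

// Before inlining: each call copies all of `big` into GetF's parameter `s`.
int viaCalls = GetF(big) + GetF(big);

// After inlining (written by hand here; the JIT may do this on its own): the body `s.F`
// is substituted at the call site, `s` simply becomes `big`, and no copy is needed.
int inlined = big.F + big.F;

Console.WriteLine(viaCalls); // 42
Console.WriteLine(inlined);  // 42

// Same shape as GetFInline/GetFNonInline in the benchmark.
static int GetF(SomeBigStruct s) => s.F;

// Shortened stand-in for SomeBigStruct: 14 ints (A to N).
public struct SomeBigStruct
{
    public int A, B, C, D, E, F, G, H, I, J, K, L, M, N;
}
```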
Let's have a look at the following benchmark:
```csharp
using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions
using BenchmarkDotNet.Attributes;      // [Benchmark]
using BenchmarkDotNet.Running;         // BenchmarkRunner

BenchmarkRunner.Run<InlineVsNonInlineBenchmark>();

public class InlineVsNonInlineBenchmark
{
    private SomeBigStruct _someBigStruct;

    [Benchmark]
    public int NonInline() => GetFNonInline(_someBigStruct) + GetFNonInline(_someBigStruct);

    [Benchmark]
    public int Inline() => GetFInline(_someBigStruct) + GetFInline(_someBigStruct);

    // Never inlined: the caller has to copy the whole struct for every call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private int GetFNonInline(SomeBigStruct s) => s.F;

    // Inlining hint: once inlined, the copy can be skipped.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private int GetFInline(SomeBigStruct s) => s.F;
}

public struct SomeBigStruct
{
    public int A { get; set; }
    public int B { get; set; }
    public int C { get; set; }
    public int D { get; set; }
    public int E { get; set; }
    public int F { get; set; }
    public int G { get; set; }
    public int H { get; set; }
    public int I { get; set; }
    public int J { get; set; }
    public int K { get; set; }
    public int L { get; set; }
    public int M { get; set; }
    public int N { get; set; }
    public int O { get; set; }
    public int P { get; set; }
    public int Q { get; set; }
    public int R { get; set; }
    public int S { get; set; }
    public int T { get; set; }
    public int U { get; set; }
    public int V { get; set; }
    public int W { get; set; }
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
}
```
Results:
| Method | Mean | Error | StdDev |
|---------- |----------:|----------:|----------:|
| NonInline | 3.3646 ns | 0.0130 ns | 0.0109 ns |
| Inline | 0.0000 ns | 0.0000 ns | 0.0000 ns |
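Of course, each benchmark does only a single trivial operation, so the results are not ideal and should be taken with a grain of salt (especially because the amount of work is way too small to measure reliably). A mean of 0.0000 ns does not mean the code takes no time; it means BenchmarkDotNet could not distinguish it from an empty method.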
Impressive anyway! Now, how do we know the speedup comes from avoiding the copy, rather than just from skipping the call itself? Well, we can increase or decrease the number of properties in `SomeBigStruct` and see how that affects the results. If we remove some of the properties (so we only have `A` to `N`), we get:
| Method | Mean | Error | StdDev |
|---------- |----------:|----------:|----------:|
| NonInline | 2.2966 ns | 0.0575 ns | 0.0449 ns |
| Inline | 0.0000 ns | 0.0000 ns | 0.0000 ns |
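The copying cost should grow with the size of the `struct`. A quick way to see how many bytes each variant occupies, assuming a recent .NET (5 or later), where `Unsafe.SizeOf<T>()` is available in `System.Runtime.CompilerServices`. The two struct names are hypothetical stand-ins for the 26-property and the 14-property versions of `SomeBigStruct`; plain `int` fields take the same space as the auto-properties, which are backed by one field each:

```csharp
using System;
using System.Runtime.CompilerServices; // Unsafe

// Every field is a 4-byte int, so the size is 4 bytes * number of fields.
Console.WriteLine(Unsafe.SizeOf<SomeBigStructAToZ>()); // 104 (26 fields)
Console.WriteLine(Unsafe.SizeOf<SomeBigStructAToN>()); // 56 (14 fields, 0x38 in hex)

// Hypothetical stand-in for the original SomeBigStruct with properties A to Z.
public struct SomeBigStructAToZ
{
    public int A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z;
}

// Hypothetical stand-in for the reduced version with only A to N.
public struct SomeBigStructAToN
{
    public int A, B, C, D, E, F, G, H, I, J, K, L, M, N;
}
```

So the smaller variant copies roughly half as many bytes per call, which fits the lower `NonInline` time.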
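`NonInline` gets faster as the `struct` shrinks, while `Inline` stays at zero, which points to the copy as the main cost. We can also check this without a benchmark by looking at the (32-bit x86) JIT output on sharplab.io: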
```asm
InlineVsNonInlineBenchmark.NonInline()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: push ebx
    L0006: mov esi, ecx
    L0008: lea edi, [esi+4]
    L000b: sub esp, 0x38
    L000e: vmovdqu xmm0, [edi]
    L0012: vmovdqu [esp], xmm0
    L0017: vmovdqu xmm0, [edi+0x10]
    L001c: vmovdqu [esp+0x10], xmm0
    L0022: vmovdqu xmm0, [edi+0x20]
    L0027: vmovdqu [esp+0x20], xmm0
    L002d: vmovq xmm0, [edi+0x30]
    L0032: vmovq [esp+0x30], xmm0
    L0038: mov ecx, esi
    L003a: call 0x2b1b0018
    L003f: mov ebx, eax
    L0041: sub esp, 0x38
    L0044: vmovdqu xmm0, [edi]
    L0048: vmovdqu [esp], xmm0
    L004d: vmovdqu xmm0, [edi+0x10]
    L0052: vmovdqu [esp+0x10], xmm0
    L0058: vmovdqu xmm0, [edi+0x20]
    L005d: vmovdqu [esp+0x20], xmm0
    L0063: vmovq xmm0, [edi+0x30]
    L0068: vmovq [esp+0x30], xmm0
    L006e: mov ecx, esi
    L0070: call 0x2b1b0018
    L0075: add eax, ebx
    L0077: pop ebx
    L0078: pop esi
    L0079: pop edi
    L007a: pop ebp
    L007b: ret

InlineVsNonInlineBenchmark.Inline()
    L0000: mov eax, [ecx+0x18]
    L0003: add eax, eax
    L0005: ret
```
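You don't have to understand the JIT assembly; the amount of code alone shows that `Inline` is probably faster. Before each of its two calls, `NonInline` copies the whole `struct` onto the stack: the `vmovdqu` and `vmovq` instructions move 3 × 16 + 8 = 56 bytes (the `0x38` in `sub esp, 0x38`), which is exactly the size of the `A`-to-`N` version of `SomeBigStruct`. `Inline`, on the other hand, just reads `F` once, adds it to itself (`add eax, eax`), and returns.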
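Inlining is not the only way to skip the copy. If a method is too big to be inlined (or is marked `NoInlining`, as in the benchmark), an `in` parameter passes a read-only reference instead of a copy. A sketch assuming C# 9 or later for top-level statements; `Getters` is a hypothetical class, the shortened struct is a stand-in for `SomeBigStruct`, and this variant was not part of the benchmark above, so no timings are claimed for it:

```csharp
using System;
using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions

var big = new SomeBigStruct { F = 21 };

Console.WriteLine(Getters.ByValue(big)); // 21, the whole struct was copied for the call
Console.WriteLine(Getters.ByIn(in big)); // 21, only a reference to `big` was passed

// Hypothetical helpers, shaped like GetFNonInline from the benchmark.
public static class Getters
{
    // Never inlined and takes the struct by value: the caller copies all 56 bytes.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ByValue(SomeBigStruct s) => s.F;

    // Never inlined either, but `in` passes a read-only reference instead of a copy.
    // Reading a field through an `in` reference does not trigger a defensive copy.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ByIn(in SomeBigStruct s) => s.F;
}

// Shortened stand-in for SomeBigStruct: 14 ints (A to N), 56 bytes.
public struct SomeBigStruct
{
    public int A, B, C, D, E, F, G, H, I, J, K, L, M, N;
}
```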