In this somewhat technical and barely usable blog post, we will have a look at inlining and structs in C#, and at how the two together can improve performance in some interesting ways.
Inlining
Inlining is a compiler optimization that replaces a method call with the body of the called method. In .NET this is done by the JIT compiler at runtime, not by the C# compiler. So if you have the following code:
```csharp
public int Add(int a, int b) => a + b;

public int CalculateSum(int x, int y)
{
    return Add(x, y);
}
```
The JIT might effectively turn it into:
```csharp
public int CalculateSum(int x, int y)
{
    return x + y;
}
```
The obvious advantage is that we avoid the overhead of a method call. The downside is that inlining can increase the size of the code, because the body of the method is copied to every place it is called, and larger code has its own performance implications.
The attribute [MethodImpl(MethodImplOptions.AggressiveInlining)] (from the System.Runtime.CompilerServices namespace) suggests to the JIT that a method should be inlined, even if it would not do so by default. That is just a hint, and the JIT can still choose to ignore it.
Conversely, [MethodImpl(MethodImplOptions.NoInlining)] prevents a method from being inlined. Unlike AggressiveInlining, this one is not a hint: the JIT honors it.
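A minimal sketch of how both attributes are applied. The class and method names are hypothetical; only the attributes and their namespace come from the text above. Whether AddInlined actually gets inlined is still up to the JIT.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical example class; the attributes are the point here.
public static class InliningHints
{
    // A hint: the JIT may inline this even where its heuristics would normally decline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int AddInlined(int a, int b) => a + b;

    // A directive: the JIT will not inline this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int AddNotInlined(int a, int b) => a + b;

    public static int CalculateSum(int x, int y) =>
        AddInlined(x, y) + AddNotInlined(x, y);
}
```

Both methods return the same result. The attributes only change how the JIT compiles the calls, so the difference shows up in the generated assembly, not in the values.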
Structs
An integral part of structs is that they are passed by value (unless you explicitly pass them with ref, in or out). This means that when you pass a struct to a method, the method receives a copy of it. So:
```csharp
public struct Point
{
    public int X;
    public int Y;
}

public void MovePoint(Point p)
{
    // Modifies the method's own copy of the struct
    p.X += 10;
    p.Y += 10;
}

Point myPoint = new Point { X = 0, Y = 0 };
MovePoint(myPoint);
// myPoint is still { X = 0, Y = 0 }
```
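For contrast, a minimal sketch of the same method taking the struct by ref (PointMover and MovePointByRef are hypothetical names). No copy is made here, so the caller's variable does change:

```csharp
public struct Point
{
    public int X;
    public int Y;
}

public static class PointMover
{
    // Receives a copy: the changes are lost when the method returns.
    public static void MovePoint(Point p)
    {
        p.X += 10;
        p.Y += 10;
    }

    // Receives a reference to the caller's variable: the changes are visible to the caller.
    public static void MovePointByRef(ref Point p)
    {
        p.X += 10;
        p.Y += 10;
    }
}
```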
Because mutating a copy like this is easy to get wrong, you ideally keep structs immutable. I even wrote a blog post about that: "Mutable value types are evil! Sort of...". You also try to keep them small, to avoid the overhead of copying large amounts of data. And that is where the two topics collide.
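A minimal sketch of a small, immutable variant of Point, assuming C# 7.2 or later for readonly struct (ImmutablePoint and Moved are hypothetical names). Instead of mutating its argument, a "move" returns a new value, so copy semantics can no longer surprise the caller:

```csharp
// readonly struct: the compiler rejects any attempt to mutate the struct after construction.
public readonly struct ImmutablePoint
{
    public ImmutablePoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }

    // Returns a new, moved point; the original value is left untouched.
    public ImmutablePoint Moved(int dx, int dy) => new ImmutablePoint(X + dx, Y + dy);
}
```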
Inlining struct methods
And here is the "beauty": if a method is inlined, there is no call and no new stack frame, so the struct no longer has to be copied into one. In the benchmark below, the JIT can then read the one field it needs straight from the original. So inlining can actually make passing structs cheaper.
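To make that concrete, a sketch of what the inlined version of the benchmark below conceptually boils down to, written out by hand. InliningByHand, SomeStruct, ViaCalls and HandInlined are hypothetical names, and a three-field struct stands in for the big one. Both methods compute the same value; the difference is that ViaCalls passes a full copy of the struct into each call unless the JIT inlines GetF:

```csharp
public struct SomeStruct
{
    public int A;
    public int B;
    public int F;
}

// Hypothetical class mirroring the shape of the benchmark.
public class InliningByHand
{
    private SomeStruct _s = new SomeStruct { A = 1, B = 2, F = 21 };

    // Passed by value: every call gets its own copy of _s (unless the JIT inlines it).
    private static int GetF(SomeStruct s) => s.F;

    public int ViaCalls() => GetF(_s) + GetF(_s);

    // Roughly what ViaCalls turns into once GetF is inlined:
    // the field is read straight from _s and no copy is needed.
    public int HandInlined() => _s.F + _s.F;
}
```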
Let's have a look at the following benchmark:
```csharp
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes; // from the BenchmarkDotNet NuGet package

public class InlineVsNonInlineBenchmark
{
    private SomeBigStruct _someBigStruct;

    [Benchmark]
    public int NonInline() => GetFNonInline(_someBigStruct) + GetFNonInline(_someBigStruct);

    [Benchmark]
    public int Inline() => GetFInline(_someBigStruct) + GetFInline(_someBigStruct);

    // Never inlined: every call has to copy the whole struct as its argument
    [MethodImpl(MethodImplOptions.NoInlining)]
    private int GetFNonInline(SomeBigStruct s) => s.F;

    // Inlining hint: ideally the call disappears and F is read directly
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private int GetFInline(SomeBigStruct s) => s.F;
}

public struct SomeBigStruct
{
    public int A { get; set; }
    public int B { get; set; }
    public int C { get; set; }
    public int D { get; set; }
    public int E { get; set; }
    public int F { get; set; }
    public int G { get; set; }
    public int H { get; set; }
    public int I { get; set; }
    public int J { get; set; }
    public int K { get; set; }
    public int L { get; set; }
    public int M { get; set; }
    public int N { get; set; }
    public int O { get; set; }
    public int P { get; set; }
    public int Q { get; set; }
    public int R { get; set; }
    public int S { get; set; }
    public int T { get; set; }
    public int U { get; set; }
    public int V { get; set; }
    public int W { get; set; }
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
}
```
Results with all 26 properties:
| Method | Mean | Error | StdDev |
|---------- |----------:|----------:|----------:|
| NonInline | 3.3646 ns | 0.0130 ns | 0.0109 ns |
| Inline | 0.0000 ns | 0.0000 ns | 0.0000 ns |
Of course, each benchmark does only a trivial amount of work, so the numbers should be taken with a grain of salt. A mean of 0.0000 ns simply means that the inlined version is too fast to be told apart from the benchmark's own overhead.
Impressive anyway! But how do we know that the speed-up comes from avoiding the copy, rather than just from skipping the call? We can change the number of properties in SomeBigStruct and see how that affects the results. If we remove some of them (so we only have A to N), we get:
| Method | Mean | Error | StdDev |
|---------- |----------:|----------:|----------:|
| NonInline | 2.2966 ns | 0.0575 ns | 0.0449 ns |
| Inline | 0.0000 ns | 0.0000 ns | 0.0000 ns |
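A quick sketch to check the sizes involved, assuming a current .NET where System.Runtime.CompilerServices.Unsafe is part of the framework. Big26 and Small14 are hypothetical stand-ins for the two versions of SomeBigStruct; plain int fields replace the auto-properties, which are each backed by one int field:

```csharp
using System.Runtime.CompilerServices;

// Hypothetical stand-ins: 26 int fields (A to Z) and 14 int fields (A to N).
public struct Big26
{
    public int A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z;
}

public struct Small14
{
    public int A, B, C, D, E, F, G, H, I, J, K, L, M, N;
}

public static class StructSizes
{
    // Bytes that have to be copied for every by-value call.
    public static int BigSize => Unsafe.SizeOf<Big26>();     // 26 * 4 = 104
    public static int SmallSize => Unsafe.SizeOf<Small14>(); // 14 * 4 = 56 = 0x38
}
```

Shrinking the struct roughly halves the bytes copied per call, and 56 bytes is exactly the 0x38 that the non-inlined code reserves per call in the assembly listing below.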
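The non-inlined version gets cheaper as the struct shrinks (3.3646 ns with 26 properties, 2.2966 ns with 14), while the inlined version stays at zero, which suggests that the cost really is the copy. Even without a benchmark, we can see this on sharplab.io. The listing below is 32-bit x86 code for the A to N version of the struct: before each call, NonInline reserves 0x38 = 56 bytes on the stack (exactly 14 ints) and copies the struct into it.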
```asm
InlineVsNonInlineBenchmark.NonInline()
L0000: push ebp
L0001: mov ebp, esp
L0003: push edi
L0004: push esi
L0005: push ebx
L0006: mov esi, ecx
L0008: lea edi, [esi+4]
L000b: sub esp, 0x38
L000e: vmovdqu xmm0, [edi]
L0012: vmovdqu [esp], xmm0
L0017: vmovdqu xmm0, [edi+0x10]
L001c: vmovdqu [esp+0x10], xmm0
L0022: vmovdqu xmm0, [edi+0x20]
L0027: vmovdqu [esp+0x20], xmm0
L002d: vmovq xmm0, [edi+0x30]
L0032: vmovq [esp+0x30], xmm0
L0038: mov ecx, esi
L003a: call 0x2b1b0018
L003f: mov ebx, eax
L0041: sub esp, 0x38
L0044: vmovdqu xmm0, [edi]
L0048: vmovdqu [esp], xmm0
L004d: vmovdqu xmm0, [edi+0x10]
L0052: vmovdqu [esp+0x10], xmm0
L0058: vmovdqu xmm0, [edi+0x20]
L005d: vmovdqu [esp+0x20], xmm0
L0063: vmovq xmm0, [edi+0x30]
L0068: vmovq [esp+0x30], xmm0
L006e: mov ecx, esi
L0070: call 0x2b1b0018
L0075: add eax, ebx
L0077: pop ebx
L0078: pop esi
L0079: pop edi
L007a: pop ebp
L007b: ret

InlineVsNonInlineBenchmark.Inline()
L0000: mov eax, [ecx+0x18]
L0003: add eax, eax
L0005: ret
```
You don't have to understand the JIT's assembly output; the sheer amount of code already suggests that Inline is faster. NonInline copies the whole struct onto the stack before each of its two calls (the vmovdqu and vmovq instructions), while Inline reads F once from [ecx+0x18], adds it to itself and returns. The JIT even noticed that both calls read the same value.
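Where does 0x18 come from? A sketch, assuming 32-bit x86 as in the listing, where an object's fields start 4 bytes after the object reference (the first 4 bytes hold the method table pointer, which matches the lea edi, [esi+4] in NonInline). Marshal.OffsetOf gives F's offset inside the struct; Small14 and FieldOffsets are hypothetical names, with plain fields standing in for the A to N version of SomeBigStruct:

```csharp
using System.Runtime.InteropServices;

// Hypothetical field-based stand-in for the A to N version of SomeBigStruct.
public struct Small14
{
    public int A, B, C, D, E, F, G, H, I, J, K, L, M, N;
}

public static class FieldOffsets
{
    // Assumption: size of the method table pointer at the start of an object on 32-bit x86.
    private const int MethodTablePointerSizeOnX86 = 4;

    // Offset of F inside the struct: A to E come first, 5 * 4 = 20 bytes.
    public static int OffsetOfFInStruct => (int)Marshal.OffsetOf<Small14>(nameof(Small14.F));

    // Offset of F relative to the object reference in ecx: 4 + 20 = 24 = 0x18.
    public static int OffsetOfFInObject => MethodTablePointerSizeOnX86 + OffsetOfFInStruct;
}
```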


