The .NET framework is a general-purpose framework. It offers classes and structs for everyday use. One of those examples is the StringBuilder
. The idea behind is simple: Every time we need to concatenate a lot of strings, we can leverage the StringBuilder
, which is faster than concatenating every string with each other.
But we can do better. We can create a StringBuilder
which can perform faster for smaller strings and does use fewer allocations! We will do so with the help of Span
a type introduced with .NET Core 2.1.
First let's answer: Why concatenating strings is slow and takes a lot of allocations. The sole reason is immutability. string
s are immutable. That means if you want to change a string or append another string to our string you have to create a new one. This is a conscious design decision of the C# team. Other languages like Swift or C/C++ have mutable strings! Java has immutable strings as well! Immutable strings or constraints in general have the advantage that you reduce the usage and can better predict what is going on. For strings we have string interning. Basically if two strings have the same value, generally they share the same memory address.
var text1 = "Hello";
var text2 = "Hello";
var sameAddress = text1.ReferenceEquals(text2); // is true
This is where we are coming from. Now quick excurse to the StringBuilder
. The StringBuilder
doesn't use string
s internal. It uses an array and resizes it if we need more space. But exactly that resizing will allocate new memory. And here is where want to tackle the problem!
We want to have a low allocation StringBuilder
which should at least perform the same and all in safe
context. And we can do so. We will do this with the help of Span<T>
.
Span<T> to the rescue
Span<T>
was introduced in .net core 2.1 and the idea is very very simple:
Span
represents a slice of contiguous memory. You can oversimplified think of Span<T>
as the following:
public readonly ref struct Span<T>
{
private readonly ref T _pointer;
private readonly int _length;
...
}
There are a few key things we have to tackle to understand why this is helping us:
Span
is areadonly ref struct
that means it lives on theStack
, always. Here my blog post aboutStack
andHeap
. I would advise you to read that part if you need a refresher on that topic. As it always lives on theStack
we can't use it everywhere where we please. Basically, every time when we would have to put a value type on theHeap
it will not work out! A simple example would be you can't have aref struct
as a field on yourclass
as this would mean thestruct
has to have a reference on the heap. The compiler enforces that constraint, but we gain also performance benefits from that! As always a trade-off.- The
Span<T>
object holds a pointer to our object plus its length. With that, we can model a memory slice where we can iterate over. - Also that means that our
ref T
is on theStack
. We see that is a very specializedstruct
meant for performance!
If you want to know more about Span<T>
here is a link to the official Microsoft documentation.
ValueStringBuilder
Now with this information, we want to build a StringBuilder
that uses way fewer allocations and should at least perform the same or better and certain scenarios. Another requirement: No unsafe code!. I don't want to handle pointers myself, otherwise, I would have written that thing directly in C/C++. I know that there are implementations with unsafe
out in the wild, but we can do better!better is very subjective here, but I hope you know what I mean 😉
You can find the source code in the Resources section down below if you want to directly jump into the code.
We will use the same principles as the StringBuilder
which ships with .NET itself. But we will adopt it:
- We will use an internal array for holding the characters which get appended
- If the current internal buffer is too small we will double the size and copy the old content into the new buffer
ToString()
will give us the whole string- We will offer an indexer which gives us the character at an arbitrary position
public ref struct ValueStringBuilder
{
private int _bufferPosition;
private Span<char> _buffer;
private char[]? arrayFromPool;
public ValueStringBuilder()
{
_bufferPosition = 0;
_buffer = new char[32];
arrayFromPool = null;
}
public ref char this[int index] => ref _buffer[index];
That is our general outline! Huh? we have a the Span<char>
as field eventhough I talked about that is not working. Well it is not working for a class
but if your holding scope is a ref struct
itself then it is fine. Of course this comes with trade-offs in usage I'll show you later. We basically have a small state consisting out of our internal buffer _buffer
and the current position in our buffer _bufferPosition
. This is important because our buffer is statically sized and might be bigger than the string it really represents.
The indexer is also very easy to model: Just give back the char
at the specific position. Done!
The next easy thing is the ToString()
method:
public override string ToString() => new(_buffer[.._bufferPosition]);
We are slicing our _buffer
from the beginning until _bufferPosition
. That is it. Now we only have to fill the internal buffer 😄.
Also here we can start with the easy one, appending a single char:
public void Append(char c)
{
if (_bufferPosition == _buffer.Length - 1)
{
Grow();
}
_buffer[_bufferPosition++] = c;
}
We have a basic check if our buffer still has enough space for another character, if not we have to Grow
our array. How this happens we will show later. If the element (then) fits in we just set the new character at the end of the _bufferPosition
.
Before we come to the point of how Grow
works, let's have a look at appending a whole string:
public void Append(ReadOnlySpan<char> str)
{
var newSize = str.Length + _bufferPosition;
if (newSize > _buffer.Length)
Grow(newSize * 2);
str.CopyTo(_buffer[_bufferPosition..]);
_bufferPosition += str.Length;
}
public void AppendLine(ReadOnlySpan<char> str)
{
Append(str);
Append(Environment.NewLine);
}
Now first: ReadOnlySpan<char>
has an implicit cast from string
so the caller can just pass in "normal" strings. But Span
s are faster to operate on. The difference between Span<T>
and ReadOnlySpan<T>
is, as the name suggests, does not allow mutating the memory slice represented by ReadOnlySpan<T>
. We could also use the in
keyword, but I did not find any performance gains but sometimes it was even slower.
Now the magic, besides Grow()
here is this:
str.CopyTo(_buffer[_bufferPosition..]); // Copy the str into _buffer beginning from _bufferPosition
_bufferPosition += str.Length; // Our internal buffer is now str.Length longer
Now the final bit: Grow()
private void Grow(int capacity = 0)
{
var currentSize = _buffer.Length;
var newSize = capacity > 0 ? capacity : currentSize * 2;
// This should be returned at some point (like a Dispose method)
// This part will not be shown in the code. See resources for a real-world implementation
var rented = ArrayPool<char>.Shared.Rent(newSize);
_buffer.CopyTo(rented);
_buffer = arrayFromPool = rented;
if (oldBuffer is not null)
{
ArrayPool<char>.Shared.Return(oldBuffer);
}
}
If we don't specific the capacity we just double the current internal buffer size. I am currently playing around which growth-size would be good. We have to find a nice balance between not reserving too much space but also not resizing our array too often. For now *2
is good enough.
The next part is a bit tricky. We will use a shared ArrayPool
. I will not go too much into detail. Here is a very nice blog post that explains it better than I could do 😉. The basic idea is that we pool the buffer. So we share that buffer with others if we don't need it anymore. Like car-pooling! The benefit is that we don't have to allocate new objects all the time, which a) costs time and b) allocates stuff that we don't want! So we take that array from the pool with the new size, copy and assign our old content into it and return the rented array. Especially for bigger arrays. I might build a library out of that which looks a bit more in detail about how to fine-tune that. Anyway, we are good for now!
That's it. The usage is pretty trivial:
var stringBuilder = new StringBuilder();
stringBuilder.AppendLine("Hello");
stringBuilder.AppendLine("World");
Console.Write(stringBuilder.ToString();
prints:
Hello
World
Two constraints we have in contrast to the "normal" StringBuilder
:
- We can't use it as a field on a class! That might be a no-go in your use case.
- We don't have the fluent notation as the
StringBuilder
has.
With the .net version you can do that: new StringBuilder().AppendLine("").AppendLine("").ToString();
that does not work with our version. The reason is that we have a struct
. Sure we can make it fluent, but that means we have a lot of allocation going on and potential bugs depending on which version you take. Therefore I decided not to do that.
Performance and allocations
The following setup:
[MemoryDiagnoser]
public class Benchi
{
[Benchmark(Baseline = true)]
public string DotNet()
{
var reference = new StringBuilder();
return reference
.AppendLine("Hello World")
.AppendLine("Here some other text")
.AppendLine("And again some other text as well for good measure")
.AppendLine("You are still here?")
.AppendLine("Hmmm.")
.AppendLine("I wish you a very nice day and all the best.")
.AppendLine("Sincerly Steven")
.ToString();
}
[Benchmark]
public string Fast()
{
var fast = new ValueStringBuilder();
fast.AppendLine("Hello World");
fast.AppendLine("Here some other text");
fast.AppendLine("And again some other text as well for good measure");
fast.AppendLine("You are still here?");
fast.AppendLine("Hmmm.");
fast.AppendLine("I wish you a very nice day and all the best.");
fast.AppendLine("Sincerly Steven");
return fast.ToString();
}
}
Now finally, how well did our new cool implementation?
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1586 (21H1/May2021Update)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.201
[Host] : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
DefaultJob : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Allocated |
|------- |---------:|--------:|---------:|------:|--------:|-------:|----------:|
| DotNet | 474.1 ns | 9.49 ns | 25.67 ns | 1.00 | 0.00 | 0.3076 | 1,288 B |
| Fast | 265.3 ns | 8.30 ns | 23.80 ns | 0.56 | 0.06 | 0.1090 | 456 B |
Wow almost 50% faster by only 33% of the allocations! That is massive! Now we can see why the .NET team is using those types more often inside the framework itself to boost the performance.
Conclusion
That was very nice. We used some nice advanced C# features to build our own StringBuilder
. This version is very simplistic and could need a lot of polishing. So: Let me know if I should make a library out of it on nuget. That means further optimization and a bit more convenient functions added, plus, of course, tests! Hit me with a message here, give a like or hit me up on LinkedIn for that.