Create a low allocation and faster StringBuilder - Span in Action

The .NET framework is a general-purpose framework. It offers classes and structs for everyday use. One of those examples is the StringBuilder. The idea behind is simple: Every time we need to concatenate a lot of strings, we can leverage the StringBuilder, which is faster than concatenating every string with each other. But we can do better. We can create a StringBuilder which can perform faster for smaller strings and does use fewer allocations! We will do so with the help of Span a type introduced with .NET Core 2.1.

First let's answer: Why concatenating strings is slow and takes a lot of allocations. The sole reason is immutability. strings are immutable. That means if you want to change a string or append another string to our string you have to create a new one. This is a conscious design decision of the C# team. Other languages like Swift or C/C++ have mutable strings! Java has immutable strings as well! Immutable strings or constraints in general have the advantage that you reduce the usage and can better predict what is going on. For strings we have string interning. Basically if two strings have the same value, generally they share the same memory address.

var text1 = "Hello";
var text2 = "Hello";
var sameAddress = text1.ReferenceEquals(text2); // is true

This is where we are coming from. Now quick excurse to the StringBuilder. The StringBuilder doesn't use strings internal. It uses an array and resizes it if we need more space. But exactly that resizing will allocate new memory. And here is where want to tackle the problem! We want to have a low allocation StringBuilder which should at least perform the same and all in safe context. And we can do so. We will do this with the help of Span<T>.

Span<T> to the rescue

Span<T> was introduced in .net core 2.1 and the idea is very very simple:

Span

Span represents a slice of contiguous memory. You can oversimplified think of Span<T> as the following:

public readonly ref struct Span<T>
{
  private readonly ref T _pointer;
  private readonly int _length;
  ...
}

There are a few key things we have to tackle to understand why this is helping us:

  1. Span is a readonly ref struct that means it lives on the Stack, always. Here my blog post about Stack and Heap. I would advise you to read that part if you need a refresher on that topic. As it always lives on the Stack we can't use it everywhere where we please. Basically, every time when we would have to put a value type on the Heap it will not work out! A simple example would be you can't have a ref struct as a field on your class as this would mean the struct has to have a reference on the heap. The compiler enforces that constraint, but we gain also performance benefits from that! As always a trade-off.
  2. The Span<T> object holds a pointer to our object plus its length. With that, we can model a memory slice where we can iterate over.
  3. Also that means that our ref T is on the Stack. We see that is a very specialized struct meant for performance!

If you want to know more about Span<T> here is a link to the official Microsoft documentation.

ValueStringBuilder

Now with this information, we want to build a StringBuilder that uses way fewer allocations and should at least perform the same or better and certain scenarios. Another requirement: No unsafe code!. I don't want to handle pointers myself, otherwise, I would have written that thing directly in C/C++. I know that there are implementations with unsafe out in the wild, but we can do better!better is very subjective here, but I hope you know what I mean 😉

You can find the source code in the Resources section down below if you want to directly jump into the code.

We will use the same principles as the StringBuilder which ships with .NET itself. But we will adopt it:

  1. We will use an internal array for holding the characters which get appended
  2. If the current internal buffer is too small we will double the size and copy the old content into the new buffer
  3. ToString() will give us the whole string
  4. We will offer an indexer which gives us the character at an arbitrary position
public ref struct ValueStringBuilder
{
    private int _bufferPosition;
    private Span<char> _buffer;

    public ValueStringBuilder()
    {
        _bufferPosition = 0;
        _buffer = new char[32];
    }

    public ref char this[int index] => ref _buffer[index];

That is our general outline! Huh? we have a the Span<char> as field eventhough I talked about that is not working. Well it is not working for a class but if your holding scope is a ref struct itself then it is fine. Of course this comes with trade-offs in usage I'll show you later. We basically have a small state consisting out of our internal buffer _buffer and the current position in our buffer _bufferPosition. This is important because our buffer is statically sized and might be bigger than the string it really represents. The indexer is also very easy to model: Just give back the char at the specific position. Done!

The next easy thing is the ToString() method:

public override string ToString() => new(_buffer[.._bufferPosition]);

We are slicing our _buffer from the beginning until _bufferPosition. That is it. Now we only have to fill the internal buffer 😄. Also here we can start with the easy one, appending a single char:

public void Append(char c)
{
    if (_bufferPosition == _buffer.Length - 1)
    {
        Grow();
    }

    _buffer[_bufferPosition++] = c;
}

We have a basic check if our buffer still has enough space for another character, if not we have to Grow our array. How this happens we will show later. If the element (then) fits in we just set the new character at the end of the _bufferPosition.

Before we come to the point of how Grow works, let's have a look at appending a whole string:

public void Append(ReadOnlySpan<char> str)
{
    var newSize = str.Length + _bufferPosition;
    if (newSize > _buffer.Length)
        Grow(newSize * 2);

    str.CopyTo(_buffer[_bufferPosition..]);
    _bufferPosition += str.Length;
}

public void AppendLine(ReadOnlySpan<char> str)
{
    Append(str);
    Append(Environment.NewLine);
}

Now first: ReadOnlySpan<char> has an implicit cast from string so the caller can just pass in "normal" strings. But Spans are faster to operate on. The difference between Span<T> and ReadOnlySpan<T> is, as the name suggests, does not allow mutating the memory slice represented by ReadOnlySpan<T>. We could also use the in keyword, but I did not find any performance gains but sometimes it was even slower. Now the magic, besides Grow() here is this:

str.CopyTo(_buffer[_bufferPosition..]); // Copy the str into _buffer beginning from _bufferPosition
_bufferPosition += str.Length; // Our internal buffer is now str.Length longer

Now the final bit: Grow()

private void Grow(int capacity = 0)
{
    var currentSize = _buffer.Length;
    var newSize = capacity > 0 ? capacity : currentSize * 2;
    // This should be returned at some point (like a Dispose method)
    // This part will not be shown in the code. See resources for a real-world implementation
    var rented = ArrayPool<char>.Shared.Rent(newSize);
    _buffer.CopyTo(rented);
    _buffer = rented;
}

If we don't specific the capacity we just double the current internal buffer size. I am currently playing around which growth-size would be good. We have to find a nice balance between not reserving too much space but also not resizing our array too often. For now *2 is good enough.

The next part is a bit tricky. We will use a shared ArrayPool. I will not go too much into detail. Here is a very nice blog post that explains it better than I could do 😉. The basic idea is that we pool the buffer. So we share that buffer with others if we don't need it anymore. Like car-pooling! The benefit is that we don't have to allocate new objects all the time, which a) costs time and b) allocates stuff that we don't want! So we take that array from the pool with the new size, copy and assign our old content into it and return the rented array. Especially for bigger arrays. I might build a library out of that which looks a bit more in detail about how to fine-tune that. Anyway, we are good for now!

That's it. The usage is pretty trivial:

var stringBuilder = new StringBuilder();
stringBuilder.AppendLine("Hello");
stringBuilder.AppendLine("World");

Console.Write(stringBuilder.ToString();

prints:

Hello
World

Two constraints we have in contrast to the "normal" StringBuilder:

  1. We can't use it as a field on a class! That might be a no-go in your use case.
  2. We don't have the fluent notation as the StringBuilder has.

With the .net version you can do that: new StringBuilder().AppendLine("").AppendLine("").ToString(); that does not work with our version. The reason is that we have a struct. Sure we can make it fluent, but that means we have a lot of allocation going on and potential bugs depending on which version you take. Therefore I decided not to do that.

Performance and allocations

The following setup:

[MemoryDiagnoser]
public class Benchi
{
    [Benchmark(Baseline = true)]
    public string DotNet()
    {
        var reference = new StringBuilder();
        return reference
            .AppendLine("Hello World")
            .AppendLine("Here some other text")
            .AppendLine("And again some other text as well for good measure")
            .AppendLine("You are still here?")
            .AppendLine("Hmmm.")
            .AppendLine("I wish you a very nice day and all the best.")
            .AppendLine("Sincerly Steven")
            .ToString();
    }

    [Benchmark]
    public string Fast()
    {
        var fast = new ValueStringBuilder();
        fast.AppendLine("Hello World");
        fast.AppendLine("Here some other text");
        fast.AppendLine("And again some other text as well for good measure");
        fast.AppendLine("You are still here?");
        fast.AppendLine("Hmmm.");
        fast.AppendLine("I wish you a very nice day and all the best.");
        fast.AppendLine("Sincerly Steven");
        return fast.ToString();
    }
}

Now finally, how well did our new cool implementation?

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1586 (21H1/May2021Update)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.201
  [Host]     : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  DefaultJob : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT


| Method |     Mean |   Error |   StdDev | Ratio | RatioSD |  Gen 0 | Allocated |
|------- |---------:|--------:|---------:|------:|--------:|-------:|----------:|
| DotNet | 474.1 ns | 9.49 ns | 25.67 ns |  1.00 |    0.00 | 0.3076 |   1,288 B |
|   Fast | 265.3 ns | 8.30 ns | 23.80 ns |  0.56 |    0.06 | 0.1090 |     456 B |

Wow almost 50% faster by only 33% of the allocations! That is massive! Now we can see why the .NET team is using those types more often inside the framework itself to boost the performance.

Conclusion

That was very nice. We used some nice advanced C# features to build our own StringBuilder. This version is very simplistic and could need a lot of polishing. So: Let me know if I should make a library out of it on nuget. That means further optimization and a bit more convenient functions added, plus, of course, tests! Hit me with a message here, give a like or hit me up on LinkedIn for that.

Resources

  • The code for this blog post on GitHub
  • All my examples for this blog found here
  • Full fledge ValueStringBuilder can be found here.
10
Buy Me a Coffee at ko-fi.com
An error has occurred. This application may no longer respond until reloaded. Reload x