The StringBuilder
class is used to create mutable sequences of characters. Strings are immutable, so if you need to perform multiple operations on a string, it is better to use a StringBuilder
instead of a string
. This is especially useful when you need to concatenate a large number of strings. But there is more magic to it, especially when we go BIG!
StringBuilder
Okay, a very short recap of what the StringBuilder
is all about. It is a class that represents a mutable string of characters. string
is immutable and if you perform operations like string.Concat
, +
or +=
, a new string is created every time. This is not the case with StringBuilder
. It is mutable, and you can append, insert, remove, replace, and concatenate strings without creating a new instance of StringBuilder.
The basic idea is to have an internal array (just like List<T>
) that can be resized when needed. Okay, but you probably knew that, and if not, there are many resources out there that explain the basics of StringBuilder
.
BIG strings
But what if you need to create a very large string? Let's say you need to create a string with 100'000 characters. Sure, that isn't a problem, as you might have guessed, and that is not what this blog post is all about. Let's create a StringBuilder
and a List
with 100'000 characters. Yes, if you need "append-only" functionality, you can use a List<string>
and then create a string from it. That is what the .NET team does internally a few times - for performance reasons.
Before we do that let's create a baseline benchmark with only 10000 characters.
[MemoryDiagnoser]
public class StringBenchmark
{
private readonly StringBuilder stringBuilder = new();
private readonly List<string> list = new();
private const int IterationCount = 1000;
private const string TenCharacters = "0123456789";
public static void Main() => BenchmarkRunner.Run<StringBenchmark>();
[Benchmark]
public string StringBuilder()
{
for (var i = 0; i < IterationCount; i++)
{
stringBuilder.Append(TenCharacters);
}
return stringBuilder.ToString();
}
[Benchmark]
public string List()
{
for (var i = 0; i < IterationCount; i++)
{
list.Add(TenCharacters);
}
return string.Concat(list);
}
}
Before I show you any results, please "ignore" the time benchmark here. That is not so important right now. What is important is the memory consumption. Let's run the benchmark.
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|-------------- |---------:|----------:|----------:|-------:|-------:|-------:|----------:|
| StringBuilder | 2.613 us | 0.0142 us | 0.0126 us | 6.3477 | 0.3967 | - | 51.95 KB |
| List | 6.969 us | 0.0320 us | 0.0283 us | 4.3716 | 0.0229 | 0.0076 | 35.81 KB |
The fascinating thing is that the StringBuilder
uses more memory than the List
. Yes, I know, you can also give the initial capacity to the StringBuilder
and List
and it will use less memory. While this is true, you will see in a second why that "doesn't" matter when we go BIG!
Let's go BIG: Let's take "just" 1'000'000 characters. We will use the same benchmark as before, but with 10000 Iterations and the string is 100 characters long.
[MemoryDiagnoser]
public class StringBenchmark
{
private const int IterationCount = 10000;
private const string OneHundredCharacters =
"0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789";
public static void Main() => BenchmarkRunner.Run<StringBenchmark>();
[Benchmark]
public string StringBuilder()
{
StringBuilder stringBuilder = new();
for (var i = 0; i < IterationCount; i++)
{
stringBuilder.Append(OneHundredCharacters);
}
return stringBuilder.ToString();
}
[Benchmark]
public string List()
{
List<string> list = new();
for (var i = 0; i < IterationCount; i++)
{
list.Add(OneHundredCharacters);
}
return string.Concat(list);
}
}
The results:
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|-------------- |---------:|---------:|---------:|----------:|----------:|---------:|----------:|
| StringBuilder | 553.3 us | 17.36 us | 51.20 us | 696.2891 | 696.2891 | 210.9375 | 3.83 MB |
| List | 652.8 us | 16.35 us | 48.22 us | 1030.2734 | 1018.5547 | 998.0469 | 6.15 MB |
Okay. The StringBuilder
uses way less memory than the List
. And also way less Gen2! What is going on here? To understand why StringBuilder
is optimized for such cases, we have to resort to other means. I will not use the benchmarking but the performance counters of .NET itself. Basically, they provide you a metric of certain things that are going on. For example, in ASP.NET, they can tell you how many requests you are getting per second or how many active requests are currently running.
But they can also tell you how much memory is used, and more importantly, how big the Large Object Heap is. Rider has a nice window for that - so let's use that.
Here the memory consumption of the StringBuilder
:
Look at the LOH (Large Object Heap). We have exactly 0 bytes there! Keep that in mind. Now let's do the same thing, but for the List
:
WOW! We have over 6 Megabytes in the LOH! That is a huge difference.
The Large Object Heap
Before we continue, let's have a small excursion to the Large Object Heap. The LOH is a heap in the .NET runtime that is used to store large objects (hence the name). Large objects are objects that are 85'000 bytes or larger. The LOH is not compacted like the normal heap. To understand what compacting is, have a look at: The garbage collector in .NET - Part 2: Compacting. Like in the old Windows time, where the defragmentation tool was king, the compacting has a similar purpose: Move chunks of memory together to free up space. Not only was this important when the virtual space was very small, but it also could help with caching as data was more local.
The exception to this rule is the LOH. The LOH is not compacted. The .NET team empirically found out that compacting stuff over 85'000 bytes is not worth it. Therefore we have this extra "thing".
What does this have to do with StringBuilder
and List
?
The magic of StringBuilder
A List
is just one big chunk of memory as it backed by one array. So if you have stuff that is over 85'000 bytes, it will end up in the LOH. So for example a list of integers (that is 4 bytes) with more than 21'250 elements will end up in the LOH as it has over 85'000 bytes.
But for example a LinkedList
will not. Why? Well, because it isn't backed by one big array but by many small nodes. So the LOH is not used. And the StringBuilder
is similar.
The StringBuilder
is not one array internally. If you ever looked into the StringBuilder
you find the following:
public sealed class StringBuilder : ISerializable
{
internal char[] m_ChunkChars;
internal StringBuilder m_ChunkPrevious;
The StringBuilder
is basically a linked list of other StringBuilder
instances. Every chunk can get up-to 8'000 characters. If you append more, a new chunk is created and the link between those two instances is created!
Very very clever! Technically that also brings some downsides - not only is the code more complex, but certain operations are not O(1)
. Imagine the indexer: myStringBuilder[10_000]
. This is not O(1)
, as you have to traverse the linked list to get to the right chunk. However, for memory consumption, it is a very clever solution.
Why does the List
use so much memory?
In the first example with "few" characters, we saw that List
used less memory than StringBuilder
, which exploded when we went BIG. The reason is the differences in grow strategies. How does a List
grow:
int newCapacity = _items.Length == 0 ? DefaultCapacity : 2 * _items.Length;
Source: List.cs
At least until the current preview (.NET 9 preview 4), but also likely that it stays like that, a List
has an exponential growth. So imagine the worst case here: Imagine you have 100'000 characters in your List
and your List
is full to the brim. You add one more character. The List
will create a new array with 200'000 characters. The old 100'000 (after the content is copied into the new array) will be thrown away!
The StringBuider
is different. It basically has two mechanism. The first is the m_ChunkChars
array. But that is capped at 8'000 characters. If you append more, a new StringBuilder
with a new m_ChunkChars
array is created. SO that is the second mechanism. But there is no need to throw away stuff by default (don't get me wrong there are scenarios where this happens, but for this discussion it is not relevant).