How to enumerate through a StringBuilder

12/3/2022
10 minute read

Did you ever wonder how we can iterate through a StringBuilder? I mean, of course, we can just call ToString and use the returned string, but that means we materialize the whole thing without good reason.

We can also use a normal for-loop. But we can also find a completely different and probably dumber way! And if you wonder: No, this is not something you do in your daily life, but by doing so, I can show some cool stuff C# and .NET offer.

Act 1: The "problem"

What if we want to have the following code:

var builder = new StringBuilder();
// ... here we fill that thing
foreach (var character in builder)
{
  // We do more stuff here
}

Well, then you are out of look with the following error: "Type 'System.Text.StringBuilder' cannot be used in 'foreach' statement because it neither implements 'IEnumerable' or 'IEnumerable', nor has suitable 'GetEnumerator' method which return type has 'Current' property and 'MoveNext' method". And yes I understand, that is nothing you would do or use on a daily base ... or weekly base... or ever! 😉

Here is where we can misuse some magic of the compiler.

Act 2: foreach

Now over 1.5 years ago I already wrote about that topic: "Shorts: foreach on type without IEnumerable". Anyway, I will do it here in short again. First let us see what the compiler does with a foreach loop - then yes foreach is nothing the compiler "knows" when he tries to convert your C# code to MSIL code. The compiler lowers your statement.

foreach(var item in new List<int>())
    _ = item;

Becomes:

List<int>.Enumerator enumerator = new List<int>().GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        int current = enumerator.Current;
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}

We're just getting an enumerator from our List and calling MoveNext until it returns false. If it doesn't return false we can use enumerator.Current to get our object. Of course, that happens "behind the scenes", but more remarkably you don't need to implement an interface to make it work and this is explicitly stated in the C# language specification:

  1. If the type X of expression is an array type then there is an implicit reference conversion from X to the System.Collections.IEnumerable interface (since System.Array implements this interface). The collection type is the System.Collections.IEnumerable interface, the enumerator type is the System.Collections.IEnumerator interface and the element type is the element type of the array type X.
  2. Otherwise, determine whether the type X has an appropriate GetEnumerator method

Now it gets interesting. StringBuilder does not inherit from IEnumerable nor does it have an IEnumerator interface (yes this interface is purely for convenience - again: the compiler duck types this stuff for you). So let's check point 2: "the type X has an appropriate GetEnumerator method". Also no. Well until now!

Act 3: Extensions

The specification does not state where this GetEnumerator function has to come from and this is where we can hook ourselves in! The most naive approach would be the following:

var sb = new StringBuilder();

foreach (var character in sb)
{
    _ = character;
}

public static class Extensions
{
    public static IEnumerator<char> GetEnumerator(this StringBuilder s) => s.ToString().GetEnumerator();
}

And now we can compile and run the code! Yeah, party time. Well not so fast. That is a horrible solution as we will "materialize" the string and therefore allocate a lot of memory. So let's do it better. After all, we could have done this without the extension method.

Act 4: Custom Enumerator

So let's create a custom enumerator. An enumerator as already said above only needs a MoveNext and Current object. The whole try-finally dispose logic you saw earlier is basically to reset the internal state machine of the enumerator.

public static class Extensions
{
    public static StringEnumerator GetEnumerator(this StringBuilder s) => new(s);
}

Now the enumerator itself:

/// <summary>
/// ref structs can't escape to the heap
/// We do this to further optimize the runtime and allocations
/// I know that mutable value types are evil, but normally this type
/// is not handled by user code, so it is fine
/// </summary>
public ref struct StringEnumerator
{
    private readonly StringBuilder _s;
    private int index = -1;
    public StringEnumerator(StringBuilder s) => _s = s;

    public bool MoveNext()
    {
        index++;
        return index < _s.Length;
    }

    /// <summary>
    /// This is potentially dangerous as this is not always O(1)
    /// The StringBuilder has chunks, which are at most 8000 bytes long
    /// So having a string longer than 8000 bytes might have to access
    /// the second chunk.
    /// </summary>
    public char Current => _s[index];

    public void Dispose() => index = -1;
}

Act 5: Let's compete!

Let us check the performance of our beautifully crafted solution:

[MemoryDiagnoser]
public class Benchmark
{
    private StringBuilder _sb = default!;

    [Params(10, 100, 1000)]
    public int Chars { get; set; }

    [GlobalSetup]
    public void Setup() => _sb = new StringBuilder(new string('c', Chars));

    [Benchmark(Baseline = true)]
    public void EnumeratorViaToString()
    {
        foreach (var c in _sb.ToString()) _ = c;
    }

    [Benchmark]
    public void EnumeratorViaExtension()
    {
        foreach (var c in _sb) _ = c;
    }

    [Benchmark]
    public void IterateViaForLoop()
    {
        for (var i = 0; i < _sb.Length; i++)
            _ = _sb[i];
    }

    [Benchmark]
    public void ViaChunkEnumerator()
    {
        foreach (var chunk in _sb.GetChunks())
        {
            foreach (var c in chunk.Span)
                _ = c;
        }
    }
}

Results:

BenchmarkDotNet=v0.13.2, OS=macOS Monterey 12.6.1 (21G217) [Darwin 21.6.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 7.0.0 (7.0.22.51805), Arm64 RyuJIT AdvSIMD


|                 Method | Chars |         Mean |      Error |     StdDev | Ratio | RatioSD |   Gen0 | Allocated | Alloc Ratio |
|----------------------- |------ |-------------:|-----------:|-----------:|------:|--------:|-------:|----------:|------------:|
|  EnumeratorViaToString |    10 |    10.040 ns |  0.1115 ns |  0.1043 ns |  1.00 |    0.00 | 0.0076 |      48 B |        1.00 |
| EnumeratorViaExtension |    10 |    17.128 ns |  0.2386 ns |  0.2115 ns |  1.71 |    0.03 |      - |         - |        0.00 |
|      IterateViaForLoop |    10 |    19.068 ns |  0.1264 ns |  0.1120 ns |  1.90 |    0.02 |      - |         - |        0.00 |
|     ViaChunkEnumerator |    10 |     6.186 ns |  0.0196 ns |  0.0184 ns |  0.62 |    0.01 |      - |         - |        0.00 |
|                        |       |              |            |            |       |         |        |           |             |
|  EnumeratorViaToString |   100 |    50.039 ns |  0.5743 ns |  0.5372 ns |  1.00 |    0.00 | 0.0357 |     224 B |        1.00 |
| EnumeratorViaExtension |   100 |   191.118 ns |  1.7916 ns |  1.5882 ns |  3.82 |    0.06 |      - |         - |        0.00 |
|      IterateViaForLoop |   100 |   194.726 ns |  0.5682 ns |  0.5037 ns |  3.89 |    0.04 |      - |         - |        0.00 |
|     ViaChunkEnumerator |   100 |    40.039 ns |  0.5103 ns |  0.4773 ns |  0.80 |    0.01 |      - |         - |        0.00 |
|                        |       |              |            |            |       |         |        |           |             |
|  EnumeratorViaToString |  1000 |   418.318 ns |  5.0539 ns |  4.7274 ns |  1.00 |    0.00 | 0.3223 |    2024 B |        1.00 |
| EnumeratorViaExtension |  1000 | 1,828.645 ns | 20.9355 ns | 19.5831 ns |  4.37 |    0.07 |      - |         - |        0.00 |
|      IterateViaForLoop |  1000 | 1,820.386 ns |  4.1339 ns |  3.6646 ns |  4.35 |    0.05 |      - |         - |        0.00 |
|     ViaChunkEnumerator |  1000 |   328.201 ns |  3.3440 ns |  3.1280 ns |  0.78 |    0.01 |      - |         - |        0.00 |

Benchmark

Pretty obvious the solution isn't great, but that is fine as it never was intended to be the high performer. The goal was to enumerate through an object, which didn't offer such a thing in the first place.

Conclusion

Extension methods are a beast in .NET. You can achieve a lot of things with them. Even enumerating through a StringBuilder.

Resources

  • Source code to this blog post: here
  • All my sample code is hosted in this repository: here

"Use always a StringBuilder" - Internet myths

Use always a StringBuilder

That is what you can read from time to time. The basic idea is, that a StringBuilder is "better" to be defined.

Why are people telling that lie? Let's discuss this and see what it isn't true.

Create a low allocation and faster StringBuilder - Span in Action

.NET ships a nice StringBuilder since the dawn of time or at least since the beginning of the framework itself. The intention is simple: If we need to concatenate a lot of strings we can't rely on the + operator as we introduce a lot of unnecessary allocations plus it is slow! That is where the StringBuilder jumps into the picture.

Why is that? Why is the StringBuilder better? And can we do better? Spoiler: Yes!

StringBuilders magic for very large strings

The StringBuilder class is used to create mutable sequences of characters. Strings are immutable, so if you need to perform multiple operations on a string, it is better to use a StringBuilder instead of a string. This is especially useful when you need to concatenate a large number of strings. But there is more magic to it, especially when we go BIG!

An error has occurred. This application may no longer respond until reloaded. Reload x