Did you ever wonder how we can iterate through a StringBuilder
? I mean, of course, we can just call ToString
and use the returned string, but that means we materialize the whole thing without good reason.
We can also use a normal for-loop. But we can also find a completely different and probably dumber way! And if you wonder: No, this is not something you do in your daily life, but by doing so, I can show some cool stuff C# and .NET offer.
Act 1: The "problem"
What if we want to have the following code:
var builder = new StringBuilder();
// ... here we fill that thing
foreach (var character in builder)
{
// We do more stuff here
}
Well, then you are out of look with the following error: "Type 'System.Text.StringBuilder' cannot be used in 'foreach' statement because it neither implements 'IEnumerable' or 'IEnumerable
Here is where we can misuse some magic of the compiler.
Act 2: foreach
Now over 1.5 years ago I already wrote about that topic: "Shorts: foreach on type without IEnumerable". Anyway, I will do it here in short again. First let us see what the compiler does with a foreach loop - then yes foreach is nothing the compiler "knows" when he tries to convert your C# code to MSIL code. The compiler lowers your statement.
foreach(var item in new List<int>())
_ = item;
Becomes:
List<int>.Enumerator enumerator = new List<int>().GetEnumerator();
try
{
while (enumerator.MoveNext())
{
int current = enumerator.Current;
}
}
finally
{
((IDisposable)enumerator).Dispose();
}
We're just getting an enumerator from our List and calling MoveNext
until it returns false. If it doesn't return false we can use enumerator.Current
to get our object. Of course, that happens "behind the scenes", but more remarkably you don't need to implement an interface to make it work and this is explicitly stated in the C# language specification:
- If the type X of expression is an array type then there is an implicit reference conversion from X to the System.Collections.IEnumerable interface (since System.Array implements this interface). The collection type is the System.Collections.IEnumerable interface, the enumerator type is the System.Collections.IEnumerator interface and the element type is the element type of the array type X.
- Otherwise, determine whether the type X has an appropriate GetEnumerator method
Now it gets interesting. StringBuilder
does not inherit from IEnumerable
nor does it have an IEnumerator
interface (yes this interface is purely for convenience - again: the compiler duck types this stuff for you). So let's check point 2: "the type X has an appropriate GetEnumerator method". Also no. Well until now!
Act 3: Extensions
The specification does not state where this GetEnumerator
function has to come from and this is where we can hook ourselves in! The most naive approach would be the following:
var sb = new StringBuilder();
foreach (var character in sb)
{
_ = character;
}
public static class Extensions
{
public static IEnumerator<char> GetEnumerator(this StringBuilder s) => s.ToString().GetEnumerator();
}
And now we can compile and run the code! Yeah, party time. Well not so fast. That is a horrible solution as we will "materialize" the string and therefore allocate a lot of memory. So let's do it better. After all, we could have done this without the extension method.
Act 4: Custom Enumerator
So let's create a custom enumerator. An enumerator as already said above only needs a MoveNext
and Current
object. The whole try-finally dispose logic you saw earlier is basically to reset the internal state machine of the enumerator.
public static class Extensions
{
public static StringEnumerator GetEnumerator(this StringBuilder s) => new(s);
}
Now the enumerator itself:
/// <summary>
/// ref structs can't escape to the heap
/// We do this to further optimize the runtime and allocations
/// I know that mutable value types are evil, but normally this type
/// is not handled by user code, so it is fine
/// </summary>
public ref struct StringEnumerator
{
private readonly StringBuilder _s;
private int index = -1;
public StringEnumerator(StringBuilder s) => _s = s;
public bool MoveNext()
{
index++;
return index < _s.Length;
}
/// <summary>
/// This is potentially dangerous as this is not always O(1)
/// The StringBuilder has chunks, which are at most 8000 bytes long
/// So having a string longer than 8000 bytes might have to access
/// the second chunk.
/// </summary>
public char Current => _s[index];
public void Dispose() => index = -1;
}
Act 5: Let's compete!
Let us check the performance of our beautifully crafted solution:
[MemoryDiagnoser]
public class Benchmark
{
private StringBuilder _sb = default!;
[Params(10, 100, 1000)]
public int Chars { get; set; }
[GlobalSetup]
public void Setup() => _sb = new StringBuilder(new string('c', Chars));
[Benchmark(Baseline = true)]
public void EnumeratorViaToString()
{
foreach (var c in _sb.ToString()) _ = c;
}
[Benchmark]
public void EnumeratorViaExtension()
{
foreach (var c in _sb) _ = c;
}
[Benchmark]
public void IterateViaForLoop()
{
for (var i = 0; i < _sb.Length; i++)
_ = _sb[i];
}
[Benchmark]
public void ViaChunkEnumerator()
{
foreach (var chunk in _sb.GetChunks())
{
foreach (var c in chunk.Span)
_ = c;
}
}
}
Results:
BenchmarkDotNet=v0.13.2, OS=macOS Monterey 12.6.1 (21G217) [Darwin 21.6.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), Arm64 RyuJIT AdvSIMD
DefaultJob : .NET 7.0.0 (7.0.22.51805), Arm64 RyuJIT AdvSIMD
| Method | Chars | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|----------------------- |------ |-------------:|-----------:|-----------:|------:|--------:|-------:|----------:|------------:|
| EnumeratorViaToString | 10 | 10.040 ns | 0.1115 ns | 0.1043 ns | 1.00 | 0.00 | 0.0076 | 48 B | 1.00 |
| EnumeratorViaExtension | 10 | 17.128 ns | 0.2386 ns | 0.2115 ns | 1.71 | 0.03 | - | - | 0.00 |
| IterateViaForLoop | 10 | 19.068 ns | 0.1264 ns | 0.1120 ns | 1.90 | 0.02 | - | - | 0.00 |
| ViaChunkEnumerator | 10 | 6.186 ns | 0.0196 ns | 0.0184 ns | 0.62 | 0.01 | - | - | 0.00 |
| | | | | | | | | | |
| EnumeratorViaToString | 100 | 50.039 ns | 0.5743 ns | 0.5372 ns | 1.00 | 0.00 | 0.0357 | 224 B | 1.00 |
| EnumeratorViaExtension | 100 | 191.118 ns | 1.7916 ns | 1.5882 ns | 3.82 | 0.06 | - | - | 0.00 |
| IterateViaForLoop | 100 | 194.726 ns | 0.5682 ns | 0.5037 ns | 3.89 | 0.04 | - | - | 0.00 |
| ViaChunkEnumerator | 100 | 40.039 ns | 0.5103 ns | 0.4773 ns | 0.80 | 0.01 | - | - | 0.00 |
| | | | | | | | | | |
| EnumeratorViaToString | 1000 | 418.318 ns | 5.0539 ns | 4.7274 ns | 1.00 | 0.00 | 0.3223 | 2024 B | 1.00 |
| EnumeratorViaExtension | 1000 | 1,828.645 ns | 20.9355 ns | 19.5831 ns | 4.37 | 0.07 | - | - | 0.00 |
| IterateViaForLoop | 1000 | 1,820.386 ns | 4.1339 ns | 3.6646 ns | 4.35 | 0.05 | - | - | 0.00 |
| ViaChunkEnumerator | 1000 | 328.201 ns | 3.3440 ns | 3.1280 ns | 0.78 | 0.01 | - | - | 0.00 |
Pretty obvious the solution isn't great, but that is fine as it never was intended to be the high performer. The goal was to enumerate through an object, which didn't offer such a thing in the first place.
Conclusion
Extension methods are a beast in .NET. You can achieve a lot of things with them. Even enumerating through a StringBuilder
.