ReadOnlySpan<char> and strings - How not to compare them

12/29/2022
3 minute read

Many developers know that you can use ReadOnlySpan<char> when working with strings. It gives you a direct view of the underlying memory. Oftentimes the two can be used interchangeably, but there are scenarios where you really have to watch what is going on.

This blog post looks at a major pitfall when ReadOnlySpan is used like a "regular" string.

ReadOnlySpan<char>

I discussed Span and ReadOnlySpan in more detail here: "Create a low allocation and faster StringBuilder - Span in Action". A Span is just a representation of a contiguous slice of memory. It has a starting point (a pointer) and a length. That is basically all. Keep those two pieces of information in mind; we will need them again in a second.

ReadOnlySpan is the same as Span but, as the name suggests, you can't modify the memory it points to through it.
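
To make the "starting point plus length" idea concrete, here is a minimal sketch. The variable names are only for illustration; the point is that slicing a span copies no characters and only produces a new start and length over the same memory.

```csharp
using System;

var text = "Hello World";

// A span over the whole string: starts at 'H', length 11.
ReadOnlySpan<char> whole = text.AsSpan();

// Slicing copies nothing; it only yields a new (start, length) pair.
ReadOnlySpan<char> hello = whole[..5];   // starts at 'H', length 5
ReadOnlySpan<char> world = whole[6..];   // starts at 'W', length 5

Console.WriteLine(hello.Length);      // 5
Console.WriteLine(hello.ToString());  // Hello
Console.WriteLine(world.ToString());  // World
```

Only ToString() allocates here, because it has to create a new string from the slice.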

string interning

As we know, strings are immutable. That is a deliberate design choice by the .NET team: once created, a string cannot be changed. Operations like Concat create a new object instead. But there is a good side to that decision: string interning.

In .NET, string interning is a way to optimize the usage of strings by storing a single instance of each unique string value in a table called the intern pool. This can be useful in situations where the same string value is used multiple times in a program, as it allows the program to reference a single copy of the string rather than creating a new instance of the string each time it is used.

Simply put:

var hello1 = "Hello";
var hello2 = "Hello";

Console.WriteLine(ReferenceEquals(hello1, hello2));

This will print true to your console. Even though you declared two different variables, they both share literally the same address: they are one and the same object. You will not find this behavior with, for example, integers or floats:

var a = 1;
var b = 1;

Console.WriteLine(ReferenceEquals(a, b));

This will print false, because each value is boxed into its own object before the comparison. So far everything is nice and easy, so let's make it a bit more complicated. We can also create the "Hello" string by using functions like string.Concat or a StringBuilder. So what is the output of the following code?

Console.Write(ReferenceEquals("Hello", string.Concat("He", "l", "lo")));

Both arguments have the content "Hello", but here we get false. The reason is that .NET does not automatically intern strings that are created at runtime. As a side note: if you write "He" + "l" + "lo" in your code, the compiler folds the constants at compile time and replaces the expression with "Hello".
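
The three cases can be put side by side in a small sketch. string.Intern is not used in the text above; it is added here as an assumption about how you would opt in to the intern pool for a string built at runtime.

```csharp
using System;

var literal = "Hello";

// Built at runtime: a new string object with the same content.
var built = string.Concat("He", "l", "lo");

// Constant expression: folded by the compiler into the literal "Hello".
var folded = "He" + "l" + "lo";

// Explicitly ask the runtime for the pooled instance with this content.
var interned = string.Intern(built);

Console.WriteLine(ReferenceEquals(literal, built));   // False
Console.WriteLine(ReferenceEquals(literal, folded));  // True

// True when the literal lives in the intern pool, which is the usual case.
Console.WriteLine(ReferenceEquals(literal, interned));
```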

ReadOnlySpan comparison

Now to the funny bits. We start easy:

var helloWorld1 = "Hello World";
var helloWorld2 = "Hello World";

Console.WriteLine(helloWorld1 == helloWorld2);
Console.WriteLine(helloWorld1 == "Hello World");

As discussed earlier, this one is straightforward: in both cases we get true. We can do the same if we convert them into Spans:

Console.WriteLine(helloWorld1.AsSpan() == helloWorld2);
Console.WriteLine(helloWorld1.AsSpan() == helloWorld2.AsSpan());
Console.WriteLine(helloWorld1.AsSpan() == "Hello World");

All of them are true, so Span seems to behave exactly like string. But you guessed it, that will not stay like that for long. Let's see what happens when we compare with a substring:

Console.WriteLine("Hello" == "Hello World"[..5]);

This one yields true as well. Just for the sake of completeness: the second expression "Hello World"[..5] is just a fancy way of getting the substring starting at index 0 with a length of 5. Let's try this out with all the different combinations of Spans:

Console.WriteLine("Hello".AsSpan() == "Hello World"[..5]);
Console.WriteLine("Hello".AsSpan() == "Hello World"[..5].AsSpan());
Console.WriteLine("Hello" == "Hello World"[..5].AsSpan());

They all yield false. Huh? Why do they behave differently from the string version? string == string does not check for reference equality: the operator is overloaded and checks whether the content is the same. ReadOnlySpan does not do this. As I said earlier, a Span is defined by a starting point and a length. Only if both are the same are the Span objects considered equal. In our case the lengths are the same, but the starting points are different. The two "Hello"s live at different memory addresses, and therefore the operator yields false. The earlier Span comparisons only returned true because both sides pointed to the same interned "Hello World" literal.
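
A small sketch that isolates the two ingredients of the == operator, start and length. The copy variable is an illustrative helper: it is a second string with identical content but its own memory.

```csharp
using System;

var source = "Hello World";

// Same content as source, but a different object in memory.
var copy = new string(source.AsSpan());

ReadOnlySpan<char> a = source.AsSpan(0, 5);
ReadOnlySpan<char> b = source.AsSpan(0, 5); // same start, same length
ReadOnlySpan<char> c = copy.AsSpan(0, 5);   // same content, different start
ReadOnlySpan<char> d = source.AsSpan(0, 4); // same start, different length

Console.WriteLine(a == b); // True
Console.WriteLine(a == c); // False
Console.WriteLine(a == d); // False
```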

How to fix that?

If you want to compare the contents of a Span, use functions like SequenceEqual:

Console.WriteLine("Hello".AsSpan().SequenceEqual("Hello World"[..5]));

This one will return true.
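
SequenceEqual is not the only content-based option. As a hedged sketch, the Equals extension method from MemoryExtensions takes a StringComparison, which helps when you need, for example, a case-insensitive comparison. The variable names are illustrative.

```csharp
using System;

ReadOnlySpan<char> left = "Hello".AsSpan();
ReadOnlySpan<char> right = "Hello World".AsSpan(0, 5);

// Reference-style comparison: different start addresses.
Console.WriteLine(left == right);                                           // False

// Content comparisons.
Console.WriteLine(left.SequenceEqual(right));                               // True
Console.WriteLine(left.Equals(right, StringComparison.Ordinal));            // True
Console.WriteLine(left.Equals("HELLO", StringComparison.OrdinalIgnoreCase)); // True
```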

Conclusion

Be aware that even though ReadOnlySpan and string can often be used almost interchangeably, they can behave differently in important ways, and equality is one of them!

