Many know that you can use ReadOnlySpan<char> objects when dealing with strings. They give you a direct way of operating on the underlying memory. Oftentimes you can use them interchangeably, but there are scenarios where you really have to watch what is going on.
This blog post will have a look at a major problem with ReadOnlySpan when used like a "regular" string.
ReadOnlySpan<char>
I discussed Span and ReadOnlySpan in a bit more detail here: "Create a low allocation and faster StringBuilder - Span in Action". A Span is just a representation of a contiguous slice of memory. It has a starting point (a pointer) and a length. That is basically all. Keep those two pieces of information in mind; we will need them again in a second.
ReadOnlySpan is the same as Span but, as the name suggests, you can't modify the memory it points to through the span.
string interning
As we know, strings are immutable. That is a deliberate design choice by the .NET team. It means that a string, once created, cannot be changed. If we use operations like Concat, a new object is created. But there is a good side to that decision: string interning.
In .NET, string interning is a way to optimize the usage of strings by storing a single instance of each unique string value in a table called the intern pool. This can be useful in situations where the same string value is used multiple times in a program, as it allows the program to reference a single copy of the string rather than creating a new instance of the string each time it is used.
Simply put:
var hello1 = "Hello";
var hello2 = "Hello";
Console.WriteLine(ReferenceEquals(hello1, hello2));
This will print true to your console. Even though you created two different variables, they both share literally the same address. They are one and the same object. You will not find this behavior with, for example, integers or floats:
var a = 1;
var b = 1;
Console.WriteLine(ReferenceEquals(a, b));
This will print false, because each value is boxed into its own object before the references are compared. So far everything is nice and easy, so let's make it a bit more complicated. We can also create the "Hello" string by using functions like string.Concat or a StringBuilder. So what is the output of the following code:
Console.Write(ReferenceEquals("Hello", string.Concat("He", "l", "lo")));
The result in both cases is "Hello", but here we get false. The reason is that .NET does not automatically intern strings that are created at runtime. Side note: if you write "He" + "l" + "lo" in your code, the compiler folds the constants at compile time and replaces the expression with "Hello".
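If you do want a string built at runtime to share the pooled instance, string.Intern looks it up in the intern pool. The following is only a minimal sketch to illustrate the difference; the variable names are made up for this example:

```csharp
using System;

// Built at runtime, so it is a separate object from the literal "Hello".
var built = string.Concat("He", "l", "lo");
var sameBeforeIntern = ReferenceEquals("Hello", built);
Console.WriteLine(sameBeforeIntern); // False

// string.Intern returns the pooled instance that has the same content.
var interned = string.Intern(built);
var sameAfterIntern = ReferenceEquals("Hello", interned);
Console.WriteLine(sameAfterIntern); // True
```

Interning has a cost of its own (the pooled string lives for the lifetime of the process), so this is something to reach for deliberately rather than by default.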
ReadOnlySpan comparison
Now to the funny bits. We start easy:
var helloWorld1 = "Hello World";
var helloWorld2 = "Hello World";
Console.WriteLine(helloWorld1 == helloWorld2);
Console.WriteLine(helloWorld1 == "Hello World");
As discussed earlier, this one is straightforward: in both cases we get true. We can do the same if we convert them into Spans:
Console.WriteLine(helloWorld1.AsSpan() == helloWorld2);
Console.WriteLine(helloWorld1.AsSpan() == helloWorld2.AsSpan());
Console.WriteLine(helloWorld1.AsSpan() == "Hello World");
All of them are true. So Span seems to behave exactly like string. But you guessed it, it will not stay like that for long. Let's see what happens when we compare with a substring:
Console.WriteLine("Hello" == "Hello World"[..5]);
This one yields true as well. Just for the sake of completeness: the second expression, "Hello World"[..5], is just a fancy way of getting the substring from index 0 with a length of 5. Let's try this out with all the different combinations of Spans:
Console.WriteLine("Hello".AsSpan() == "Hello World"[..5]);
Console.WriteLine("Hello".AsSpan() == "Hello World"[..5].AsSpan());
Console.WriteLine("Hello" == "Hello World"[..5].AsSpan());
They all yield false. Huh? Why do they differ from the string version? string == string does not compare references. The operator is overloaded and checks whether the content is the same. ReadOnlySpan does not do this. As I said earlier, a Span is defined by a starting point and a length. If both are the same, the Span objects are equal. In our case the lengths are the same, but the starting points are different. The two "Hello"s have different memory addresses and therefore the operator yields false.
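To make the "starting point plus length" rule tangible, here is a small sketch that slices one and the same string in different ways. The variable names are only for illustration:

```csharp
using System;

var text = "Hello World";

// Same starting point and same length: == reports true.
ReadOnlySpan<char> first = text.AsSpan(0, 5);
ReadOnlySpan<char> again = text.AsSpan()[..5];
var sameSlice = first == again;
Console.WriteLine(sameSlice); // True

// Same length but a different starting point: == reports false.
ReadOnlySpan<char> shifted = text.AsSpan(1, 5);
var shiftedSlice = first == shifted;
Console.WriteLine(shiftedSlice); // False
```

No characters are inspected here at all; the operator only looks at where each span starts and how long it is.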
How to fix that?
If you want to compare the contents of a Span, use functions like SequenceEqual:
Console.WriteLine("Hello".AsSpan().SequenceEqual("Hello World"[..5]));
This one will return true.
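As a variation, you can slice the span instead of the string, which avoids allocating the substring in the first place. The sketch below also uses the Equals extension method from MemoryExtensions, which takes a StringComparison and can be handy if you need a case-insensitive check. The variable names are made up for this example:

```csharp
using System;

ReadOnlySpan<char> hello = "Hello".AsSpan();
// Slicing the span, rather than the string, does not create a new string.
ReadOnlySpan<char> slice = "Hello World".AsSpan(0, 5);

// Element-by-element content comparison.
var sequenceEqual = hello.SequenceEqual(slice);
Console.WriteLine(sequenceEqual); // True

// Content comparison with an explicit comparison mode.
var ordinalEqual = hello.Equals(slice, StringComparison.Ordinal);
Console.WriteLine(ordinalEqual); // True

var ignoreCaseEqual = "HELLO".AsSpan().Equals(slice, StringComparison.OrdinalIgnoreCase);
Console.WriteLine(ignoreCaseEqual); // True
```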
Conclusion
Be aware that even though ReadOnlySpan and string oftentimes feel like almost the same thing, they can behave differently in lots of aspects!