Array, List, Collection, Set, ReadOnlyList - what? A comprehensive and exhaustive list of collection-like types

12/15/2022
14 minute read

.NET has a long list of collection-like types: IEnumerable, IQueryable, IList, ICollection, Array, ISet, ImmutableArray, ReadOnlyCollection, ReadOnlyList, and many more.

This blog post will give you an exhaustive list of types in .NET and when to use what.

Interfaces vs Implementation

In this article, as in the real world, you often see both a concrete implementation and its interface type (List<T> vs IList<T>). I will discuss this in more detail later, but first a general word. Interfaces are contracts and implementations are details. So as a rule of thumb: if you have a public-facing API (normally interfaces or methods that are either public or protected), the general advice is to use the interface type rather than the implementation. Only use concrete implementations if you really have to. List<T> is not the same as IList<T>. At first sight they seem interchangeable (mainly because List<T> implements IList<T>), but there are differences. Don't constrain your users artificially.

Interfaces define just an operational contract, which in the majority of cases is good enough. They abstract away certain details for you (which is a very good thing!).

IEnumerable

IEnumerable is an interface that defines a method for retrieving elements from a collection one at a time. It is often used as the return type of methods that return a sequence of elements. This interface allows a collection to be used with the foreach loop and other methods that expect a sequence of elements. It is also the base type for almost all LINQ operations (Count(), Where(), Select(), and friends).

The nature of IEnumerable is to tell the user that we can enumerate an object. It does that by moving to the next item one by one until we are at the end of our enumeration. An IEnumerable can be lazily evaluated (although already materialized collections such as arrays and lists implement it too).

Lazy evaluation is a technique for delaying the computation of a value until it is actually needed. In the context of IEnumerable, it means that the elements of the sequence are not computed until they are actually accessed by the code that is using the sequence. This can be useful for optimizing performance, particularly when working with large sequences because it allows you to avoid computing elements of the sequence that you never end up using.

Here is a small example where we generate an exhaustive list of prime numbers until int.MaxValue. But if the user only wants 100 instead of all numbers then we don't continue after the 100th prime number.

var first100Primes = GetPrimes().Take(100);

IEnumerable<int> GetPrimes()
{
    int i = 2;
    while (i < int.MaxValue)
    {
        if (IsPrime(i))
        {
            yield return i;
        }

        i++;
    }
}
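To make the lazy behavior visible, here is a minimal runnable sketch of the snippet above. The IsPrime helper is a naive trial-division check that I made up for illustration (the snippet above does not define it), and the candidatesTested counter exists only to show how much work was actually done.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int candidatesTested = 0;

// Nothing runs yet: this only builds the iterator pipeline.
IEnumerable<int> query = GetPrimes().Take(5);
Console.WriteLine(candidatesTested); // 0

// Enumeration happens here and stops right after the fifth prime.
List<int> firstFive = query.ToList();
Console.WriteLine(string.Join(", ", firstFive)); // 2, 3, 5, 7, 11
Console.WriteLine(candidatesTested); // 10 (the numbers 2 to 11)

IEnumerable<int> GetPrimes()
{
    for (int i = 2; i < int.MaxValue; i++)
    {
        candidatesTested++;
        if (IsPrime(i))
        {
            yield return i;
        }
    }
}

// Hypothetical helper: naive trial division, good enough for a demo.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
    {
        if (n % d == 0) return false;
    }
    return true;
}
```

Even though GetPrimes could in theory run up to int.MaxValue, only ten candidates are ever checked.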

When to use?

I will keep it very abstract and generic at the beginning and will explain it later on. I want to quote Vladimir Khorikov here:

Return the most specific type, accept the most generic type

If you just want to enumerate things in a foreach loop and you might even abort early, IEnumerable is your candidate of choice. We can also put it the other way around: If you don't need any of the other choices I will present later on, use IEnumerable.
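Here is a small sketch of how I read that quote. The EvenNumbers method is a made-up example, not something from a library: it accepts the most generic type it can work with and hands back a concrete, materialized list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Accepts the most generic type: arrays, lists, sets and iterators all work.
// Returns a specific, materialized type so callers get Count and an indexer.
static List<int> EvenNumbers(IEnumerable<int> source)
{
    return source.Where(n => n % 2 == 0).ToList();
}

List<int> fromArray = EvenNumbers(new[] { 1, 2, 3, 4 });
List<int> fromSet = EvenNumbers(new HashSet<int> { 6, 7, 8 });

Console.WriteLine(string.Join(", ", fromArray)); // 2, 4
Console.WriteLine(fromSet.Count);                // 2
```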

IQueryable

I will keep this section short, as I already covered that in greater detail in "IEnumerable vs IQueryable - What's the difference". IQueryable extends IEnumerable and is often used with objects or collections that are not held in memory. The most prominent example is Entity Framework, where the DbContext (or better, the DbSet) offers you an IQueryable object to gather your data from the underlying storage provider. The same exists for LINQ to SQL and friends.

As the name suggests, IQueryable offers methods for expressing queries against a collection of elements. It behaves similarly to IEnumerable in the sense that it is also lazily evaluated. If you want to know the exact differences, please have a look at the article mentioned above.

When to use?

You want to use the IQueryable interface if you query a data source that lives outside of your process memory (a database, for example). In contrast, you would use IEnumerable if your data source is in memory (RAM).
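The difference becomes visible even without a database. The following sketch uses AsQueryable() on a plain array as a stand-in for a real provider such as Entity Framework, so nothing here is translated to SQL; it only shows that an IQueryable carries the query around as an expression tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5 };

// IEnumerable: the lambda is compiled code that runs in memory.
IEnumerable<int> inMemory = numbers.Where(n => n > 3);

// IQueryable: the same lambda is captured as an expression tree that a
// provider could inspect and translate (for example into SQL).
IQueryable<int> queryable = numbers.AsQueryable().Where(n => n > 3);

Console.WriteLine(queryable.Expression);          // prints the query as an expression tree
Console.WriteLine(string.Join(", ", inMemory));   // 4, 5
Console.WriteLine(string.Join(", ", queryable));  // 4, 5
```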

ICollection

Whereas IEnumerable and IQueryable are lazily evaluated, ICollection is not. We are now in the realm where your collection is already materialized in one way or another. ICollection adds a few capabilities on top of enumeration: Add, Remove, and Clear (as well as some others). The core idea is that we can now mutate the collection. That was not possible with the two interfaces I showed earlier. ICollection also inherits from IEnumerable, so you can see that they build on top of each other: whenever you have an ICollection, you can also use it as an IEnumerable. That is why LINQ works on basically all collections: almost all of them implement IEnumerable.

When to use?

Use ICollection when you need an already materialized object (for example as a result of a LINQ query) or you want to mutate the collection itself. So you want to add or remove certain entries.

IList

IList inherits from ICollection so whatever you can do with ICollection you can do with IList as well. So what is the difference then? What does IList bring more to the table? And the simple answer is, it has an indexer. So we have some kind of well-defined and fixed order in our enumeration.

List<int> list = new List<int> {1, 2, 3, 4 };
Console.WriteLine(list[1]); // Prints 2

When to use?

So the use case here is clear. If you want to get an element via its index as well as add or remove items, IList might be your candidate.
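Here is a short sketch of what you get through the interface, and of one of the differences between IList<T> and List<T> I hinted at in the beginning: an array also implements IList<T>, but it cannot grow.

```csharp
using System;
using System.Collections.Generic;

IList<int> list = new List<int> { 1, 2, 3 };
list.Add(4);        // comes from ICollection<T>
list[0] = 10;       // indexer from IList<T>
list.RemoveAt(1);   // index-based removal from IList<T>
Console.WriteLine(string.Join(", ", list)); // 10, 3, 4

// An array also implements IList<int>, but it has a fixed size.
IList<int> array = new[] { 1, 2, 3 };
array[0] = 10;      // overwriting an element works
bool addFailed = false;
try
{
    array.Add(4);   // growing the array does not
}
catch (NotSupportedException)
{
    addFailed = true;
}
Console.WriteLine(addFailed); // True
```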

IReadOnlyCollection and IReadOnlyList

This brings me to a special place: the read-only collection types. They are not really different from their siblings. As the name suggests, you can only read items, but you are not allowed to add or remove something from the collection. Doesn't IReadOnlyCollection sound like your regular array? Well, partially, but there are differences. First, arrays are very special (more on that later). But you can model things with IReadOnlyCollection or IReadOnlyList that you can't with an array. I discussed this in "ReadOnlyCollection is not an immutable collection". A ReadOnlyCollection is like a VIEW in SQL. If the original collection is updated, so is your read-only one:

var numbers = new List<int> { 1, 2 };
var readOnlyNumbersViaExtension = numbers.AsReadOnly();
var readOnlyNumbers = new ReadOnlyCollection<int>(numbers);

numbers.Add(3);

// All of them will print 3
Console.WriteLine($"List count: {numbers.Count}");
Console.WriteLine($"ReadOnlyCollection via extension count: {readOnlyNumbersViaExtension.Count}");
Console.WriteLine($"ReadOnlyCollection via new count: {readOnlyNumbers.Count}");

When to use?

Almost every time you have a materialized collection where you don't want to mutate the state of the collection (Add, Remove, Clear) you want to consider these types. Helper functions such as Contains are still available via LINQ. If you need the indexer, take IReadOnlyList instead of IReadOnlyCollection.

ISet

A set is a collection of unique elements. ISet defines handy methods to interact with such objects. If you have an object of type ISet you know that there are no duplicates inside (at least the implementation of that interface should guarantee that). The interface provides methods that are commonly known from set theory, like creating a union or an intersection of sets.

// Create two sets
HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 2, 3, 4 };

// Start with a copy of the first set
HashSet<int> union = new HashSet<int>(set1);

// Merge the second set in; union now contains 1, 2, 3, and 4
union.UnionWith(set2);

When to use?

You might use ISet in your code when you want to store a collection of items and ensure that there are no duplicates. For operations such as UnionWith or lookups, sets can outperform "regular" collections, as they are specialized for them. On the negative side, creating a set is normally more expensive than creating your regular list.
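A few more of the ISet operations in action. This is only a small sketch with HashSet<int> as the implementation; the variable names are made up.

```csharp
using System;
using System.Collections.Generic;

ISet<int> evens = new HashSet<int> { 2, 4, 6, 8 };
int[] small = { 1, 2, 3, 4 };

// Add reports whether the element was actually new.
bool added = evens.Add(4);
Console.WriteLine(added); // False

// Keep only the elements that also appear in the other sequence.
evens.IntersectWith(small);
Console.WriteLine(evens.Count);                     // 2
Console.WriteLine(evens.SetEquals(new[] { 4, 2 })); // True
Console.WriteLine(evens.IsSubsetOf(small));         // True
```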

IImmutableList and its implementations

Now we are in the realm of immutable objects. I will greatly simplify here and I will also put all the interfaces and implementations into one big bucket. Why? Because it is unlikely that you will use them very often and if so, you should invest time into the specifics of each collection depending on your concrete use case.

We saw earlier that read-only collections are just a wrapper around a collection, that gets changed when the underlying collection changes. But sometimes you don't want this at all. For example, you want to enumerate through your IReadOnlyCollection while the underlying collection changes. Well you get greeted by an InvalidOperationException. This might happen if you do multithreading.

An immutable collection, in contrast, is a new and completely disconnected collection that is based on the origin at the exact point it was created, and it will never ever update again. That is perfect for thread safety. There are still ways to add or remove items from that immutable collection, but this will result in a new object rather than changing the original object - hence the name immutable.

The good thing is that those collections are specialized. For example, if you add an item to an ImmutableList, it doesn't allocate a completely new copy but rather shares parts of the "old" one. That is possible because we know that the internal structure of the original immutable list can't change any more.

When to use?

Use those collections and interfaces when you want true immutability. This is often handy when dealing with multiple threads at a time.
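A minimal sketch of both properties with ImmutableList<T> from System.Collections.Immutable: "mutating" calls return a new instance, and a snapshot stays disconnected from the list it was created from.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

ImmutableList<int> original = ImmutableList.Create(1, 2, 3);

// Add does not change 'original'; it returns a new list.
ImmutableList<int> extended = original.Add(4);
Console.WriteLine(original.Count); // 3
Console.WriteLine(extended.Count); // 4

// Unlike a ReadOnlyCollection, the snapshot is not a view on the source.
var source = new List<int> { 1, 2 };
ImmutableList<int> snapshot = source.ToImmutableList();
source.Add(3);
Console.WriteLine(snapshot.Count); // 2
```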

FrozenSet

A special form of that immutable structure is coming with .NET 8: the FrozenSet and the FrozenDictionary. I'll quote a member of the .NET team (@geeknoid) who commented on my post ("Frozen collections in .NET 8") regarding said topic:

ImmutableSet and ImmutableDictionary are designed to be immutable and to make it easy to create slightly modified copies of a given instance at a reasonable cost. So imagine you create an immutable set with a million entries in it. You cannot mutate this set since it is immutable. Now, you need a new set with 1 extra entry. You could recreate the set from the ground up, which would be very expensive. Instead, you can start from your original immutable set and apply a delta to create a new distinct set instance which under the covers shares most of the state from the original set. You use less memory this way, and it takes much less time to create the new instance.

The problem is that in order to allow this mode of operation, ImmutableDictionary and ImmutableSet are complex implementations which introduce substantial compromises in overall read performance as a trade-off for this ability to make cheap delta clones.

FrozenSet and FrozenDictionary do not provide the delta clone ability, they are optimized strictly for fast read performance. You pay more for creation, you pay more for making a clone with modifications, as a trade-off for getting faster steady state read performance.

When to use?

The new behavior indicates that you use such a structure when you create a collection at the beginning of your application lifetime and never update its contents. As those structures are optimized for lookups, they come in handy if you do lookups often. It is a trade-off between a bit more startup time and faster run time in your method.
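As a sketch of that usage pattern (this needs .NET 8 or later): build the set once, then only read from it. The HTTP verbs are just made-up sample data.

```csharp
using System;
using System.Collections.Frozen;

// Created once, for example at application startup.
FrozenSet<string> allowedVerbs = new[] { "GET", "POST", "PUT" }
    .ToFrozenSet(StringComparer.OrdinalIgnoreCase);

// From here on we only do lookups.
Console.WriteLine(allowedVerbs.Contains("get"));    // True
Console.WriteLine(allowedVerbs.Contains("DELETE")); // False
```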

Array

I talked earlier about arrays, and they seem to overlap with IReadOnlyList, as both have a fixed number of elements and an indexer (although the elements of an array can still be overwritten). From my personal experience: if in doubt between those two, take IReadOnlyList and friends. The API is friendlier in my opinion. An array is a fixed-size and contiguous block of memory. So it guarantees O(1) access time, but so does a List<T> (now that we are in the implementation realm, that is a fair comparison).

When to use?

There are 3 primary use cases in my opinion.

  1. Performance. If you are on a very low level where every allocation and nanosecond counts, you might consider an array. They are faster for sure, but with the trade-off of being less usable than the other types.

  2. Multidimensional arrays. If you need a multidimensional array, you often don't have any other choice. You could work with List<List<T>> but that seems rather ugly.

int[][] jaggedArray = { new int[] {1,2,3,4},
                        new int[] {5,6,7},
                        new int[] {8},
                        new int[] {9}
                      };

int[,] multiDimArray = {{1,2,3,4},
                          {5,6,7,0},
                          {8,0,0,0},
                          {9,0,0,0}
                        };

  3. params keyword. If you want a variable number of arguments, like string.Format has, you have to use an array (at least before C# 13, which added params collections).

Show(2);
Show(2, 3);
Show(new[] {1, 2, 3});

public void Show(params int[] val)
{  
    for (int i=0; i<val.Length; i++)  
    {  
        Console.WriteLine(val[i]);  
    }  
}  

List

List is your allrounder when it comes down to API friendliness and ease of use. It has a lot of useful operations, which don't exist on IList, for example, BinarySearch or Sort. Furthermore List gives you a guarantee that the indexer call (myList[0]) is O(1). That guarantee does not exist on IList. We will see later where this comes into play. The reason it can guarantee that is that the underlying storage is a normal one-dimensional array.

When to use?

For internal use cases (private, internal modifier) this is your 80% case. List has a nice balance between performance and usability. It is a safe bet as long as none of the other types I showcase fits better.

Collection

This one is a bit special. Collection is not really used directly as a return value or as an argument. To understand why this type exists, let's have a look at List once more. If you want to derive from List and try to extend Add or Remove you are out of luck. Those methods are not virtual and therefore you can't easily extend the behavior. Sure you could use the new keyword, but that falls short in such circumstances:

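public class MyList<T> : List<T>
{
    // Hides List<T>.Add instead of overriding it
    public new void Add(T item)
    {
        // custom logic would go here
        base.Add(item);
    }
}

List<int> list = new MyList<int>();
list.Add(1); // This will call List<T>.Add and not MyList<T>.Add

And exactly here Collection comes into play. It has virtual methods (InsertItem, SetItem, RemoveItem, and ClearItems) that you can override easily.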

When to use?

If you want an already good enough implementation of a collection that you can build on top of.
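Here is a minimal sketch of such an extension. NonEmptyStringCollection is a made-up type for illustration; the point is that Add ends up in the virtual InsertItem, so the check also runs when the collection is used through its base type.

```csharp
using System;
using System.Collections.ObjectModel;

var names = new NonEmptyStringCollection();
names.Add("Alice"); // Add calls InsertItem internally

// Even via the base type, our override is used.
Collection<string> asBase = names;
bool rejected = false;
try
{
    asBase.Add("");
}
catch (ArgumentException)
{
    rejected = true;
}

Console.WriteLine(rejected);    // True
Console.WriteLine(names.Count); // 1

// Hypothetical collection that refuses empty entries.
public class NonEmptyStringCollection : Collection<string>
{
    protected override void InsertItem(int index, string item)
    {
        if (string.IsNullOrWhiteSpace(item))
        {
            throw new ArgumentException("Item must not be empty.", nameof(item));
        }

        base.InsertItem(index, item);
    }
}
```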

LinkedList

Even though the name might suggest that it implements IList, it does not. LinkedList does not offer an indexer like List does. The reason is that accessing a member by position in a LinkedList isn't O(1): you have to go from node to node until you reach your index. So if you find yourself using the indexer often, LinkedList is not a viable candidate. The strengths of LinkedList are manifold:

  • As it is not a contiguous block of memory removing an item is fairly cheap (you don't have to move elements around after the position where you deleted something). The same applies to random insertions.
  • Also, if the backing array of a List gets very big (85,000 bytes or more), it goes onto the large object heap. A LinkedList wouldn't, as it consists of many small nodes instead of one large array.
  • They can help with fragmentation

When to use?

See the advantages above. If they outweigh what your everyday List offers, then LinkedList can be a candidate. As you can see, it is a bit special.
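A short sketch of the node-based API: finding a node is still O(n), but once you hold a node, inserting next to it does not move any other elements.

```csharp
using System;
using System.Collections.Generic;

var numbers = new LinkedList<int>(new[] { 1, 2, 4 });

// Find walks the list node by node (O(n)).
var two = numbers.Find(2);
if (two != null)
{
    // Inserting after a known node is O(1).
    numbers.AddAfter(two, 3);
}

numbers.AddFirst(0);
numbers.RemoveLast(); // removes the 4

Console.WriteLine(string.Join(", ", numbers)); // 0, 1, 2, 3
```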

ObservableCollection

ObservableCollections have one extra purpose: you can observe them, as the name suggests. They offer events which get triggered when someone adds or removes items or completely clears the collection.

var numbers = new ObservableCollection<int>() { 1, 2, 3, 4, 5 };
numbers.CollectionChanged += (sender, e) => Console.WriteLine("Something happened");

numbers.Add(2); // Will trigger the CollectionChanged event

When to use?

The use case is pretty clear I guess. You want to observe and react to changes in the collection. If you ever used WPF you know what I am talking about.

HashSet

Since HashSet only has unique elements, its internal structure is optimized for faster searches. It also doesn't make much sense to use foreach over a HashSet, even though that is valid syntax. A set is defined by having no order, and using foreach imposes order to some extent.

When to use?

As said above. If you have an internal API that greatly benefits from that data structure, you can use that. To some extent, the same applies to ISet.

SortedSet

A SortedSet is also an ISet, with the difference of having a specified order. It kind of behaves like a List without duplicates and without the ability to access a random index via mySet[1]. But it provides members like Min and Max that run in O(log n).
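A small sketch of that behavior with some made-up numbers: duplicates are dropped, enumeration is sorted, and you can ask for a range.

```csharp
using System;
using System.Collections.Generic;

// The second 7 is silently ignored.
var scores = new SortedSet<int> { 42, 7, 19, 7 };

Console.WriteLine(string.Join(", ", scores)); // 7, 19, 42
Console.WriteLine(scores.Min);                // 7
Console.WriteLine(scores.Max);                // 42

// A view on all elements between 10 and 50 (inclusive).
SortedSet<int> midRange = scores.GetViewBetween(10, 50);
Console.WriteLine(string.Join(", ", midRange)); // 19, 42
```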

When to use?

Never ever came across that in my career 😄.

ConcurrentBag

The ConcurrentBag, as well as the other concurrent types, is used in multi-threaded scenarios where multiple threads might read from or write to the collection. A bag is a collection that can have duplicated items (in contrast to a set, for example) but the order is not defined. You might wonder why there is no ConcurrentList. The answer is simple: a list has an order. What is the order if two or more threads simultaneously add an item to that list? Therefore we have a bag.

When to use?

You are in a scenario where multiple threads can read or write to the collection and you need a thread-safe way to handle that.
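As a sketch of such a scenario: many iterations of Parallel.For add to the same bag without any explicit locking. The count is reliable, the order of the items is not.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var bag = new ConcurrentBag<int>();

// The loop body may run on several threads at the same time.
Parallel.For(0, 1000, i => bag.Add(i));

Console.WriteLine(bag.Count);                    // 1000
Console.WriteLine(bag.Distinct().Count());       // 1000, nothing got lost or duplicated
Console.WriteLine(bag.OrderBy(x => x).First());  // 0, after sorting; the bag itself has no defined order
```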

Stack

A stack gives you an easy way of modeling "Last in First out" behavior. So the last element you put into the Stack is the first one coming out.

var stack = new Stack<int>();
stack.Push(2);
stack.Push(3);

Console.WriteLine(stack.Pop()); // Prints 3

When to use?

Every time you need this "Last in First out" behavior.

Queue

Like a real queue in the real world, the first thing put into the queue is also the first thing coming out again.

var queue = new Queue<int>();
queue.Enqueue(2);
queue.Enqueue(3);

Console.WriteLine(queue.Dequeue()); // Prints 2

When to use?

Every time you need this "First in First out" behavior.

Span and ReadOnlySpan

Now we are in a very special place. Span and ReadOnlySpan are not really collections (like an array). They just represent a contiguous region of memory, which can be part of a managed object (like an array or a string) or unmanaged memory; the span itself lives on the stack and doesn't cause heap allocations. They offer functions so that you can enumerate through them. I have an article that goes deeper into that topic: "Create a low allocation and faster StringBuilder - Span in Action".

Here is also a big difference between, let's say, List and IList. I said earlier that List is backed by a contiguous block of memory, so you can easily create an object which spans that memory block. .NET has a helper method for that:

CollectionsMarshal.AsSpan(myList);

CollectionsMarshal.AsSpan() only accepts a List as argument, not an IList.

When to use?

Every time you are in a high-performance scenario (paired with the least amount of allocations possible) Span and ReadOnlySpan are your friends.
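A minimal sketch of what working with spans looks like. DoubleInPlace and ParseYear are made-up helpers; the point is that a span is only a window onto existing memory, so no copies or substrings are allocated.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Doubles every element of the given window in place.
static void DoubleInPlace(Span<int> window)
{
    for (int i = 0; i < window.Length; i++)
    {
        window[i] *= 2;
    }
}

// Parses the year out of an ISO date without allocating a substring.
static int ParseYear(ReadOnlySpan<char> isoDate) => int.Parse(isoDate.Slice(0, 4));

int[] numbers = { 1, 2, 3, 4, 5, 6 };
DoubleInPlace(numbers.AsSpan(2, 3)); // window over the elements 3, 4, 5
Console.WriteLine(string.Join(", ", numbers)); // 1, 2, 6, 8, 10, 6

var list = new List<int> { 1, 2, 3 };
DoubleInPlace(CollectionsMarshal.AsSpan(list)); // works for List<T>, not for IList<T>
Console.WriteLine(string.Join(", ", list)); // 2, 4, 6

int year = ParseYear("2022-12-15");
Console.WriteLine(year); // 2022
```

Be careful with CollectionsMarshal.AsSpan: the span points into the list's backing array, so you should not add or remove items from the list while you are using it.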

Conclusion

There are lots and lots of interfaces and implementations in .NET. My general advice is to take the appropriate interface for public-facing APIs for sure. For internal and private APIs it is a bit up to you. In the end, you want to make your intent clear. Try to use interfaces as much as possible and only rely on concrete implementations if you have to.

