A few days ago, I stumbled on an article by Amit Raz about the SortedList<K,T> on Dev102.com. In the article, which compares and contrasts the SortedList collection class in the .NET BCL to the SortedDictionary class, Amit concludes with, "So what is the SortedList good for? Beats me. I deem it useless." His conclusion seemed to be predicated on the fact that the Add() method would throw an exception if the programmer attempted to insert a duplicately keyed entry into a SortedList. However, this is documented behavior. And both the SortedList and the SortedDictionary exhibit that same behavior. An indexer exists on each of those collections that will allow the insertion of a duplicately keyed item. For both classes, duplicates replace the original values when using the indexer.
I had a difficult time following Amit's logic but he's a bright fellow so I wanted to find out if the SortedList really was useless as he had proclaimed. On this page, Microsoft provides a table that describes the benefits and relative drawbacks of these two seemingly similar classes. Long story short, the key advantages of the SortedList<K,T> are that it uses less memory than the SortedDictionary<K,T> and that some of its members that return keys and values are indexed. OK, so the SortedList uses less memory. But what's that latter claim about?
Well, the SortedList<K,T> has a few members that the SortedDictionary<K,T> does not. Among them are:
- int IndexOfKey( K key )
- int IndexOfValue( T value )
Kicking around in this class in Reflector, one sees that the internal storage for the SortedList<K,T> is a pair of arrays: one for the keys which is kept in sorted order and one for the values which is kept in insertion order. When using the IndexOfKey method noted above, Array.BinarySearch() is used to perform an efficient search for the desired key. However, the IndexOfValue method uses a brute force (O(n)) scan to find the requested value. Another special case advantage of the SortedList is that insertions are O(1) for data inserted in sorted order whereas for the SortedDictionary, the average insertion cost is around O(log n). In fact, for data already in sorted order, the SortedDictionary pays a bit of a penalty on insertion because of the required balancing of the tree structure used to store the information. More on that later. So, if you have small lists and the keys are already sorted before insertion, the SortedList might be a good choice. If your keys are not sorted, insertion into a SortedList could be as bad as O(n). So it may be that you have to know something about your data to use the SortedList in a way that makes it worthwhile. Not understanding your data, could make the SortedList worthless, as Amit claims.
I wrote a small test harness that exercises the SortedList<K,T> and SortedDictionary<K,T>. I've included a link to the source code below. The application runs a battery of tests using a small list of 1,000 items on each type of dictionary and the same battery using a large list of 40,000 items. The main window looks like this:
These results show an average test run on my Windows Server 2008 machine using the .NET Framework 3.5. For the small list of 1,000 items, the SortedList seems to be a bit more efficient than the SortedDictionary with respect to time, despite the fact that the keys are not in sorted order for that test. However, when it comes to the larger list of 40,000 items, the SortedDictionary is the clear winner for both insertions and removals. But what about memory? Remember that Microsoft's MSDN topic said that the SortedList can be more memory efficient? I put 10 seconds of time in between the 4 test groups shown on the screen shot above and ran the .NET memory profiler to see what was happening. It's not all that conclusive, in my opinion. Here's a graphic showing the Gen 0, 1 and 2 heap sizes over the lifetime of the test. Perhaps you can help me analyze what you see:
There is a very large spike during the third test in the Gen 1 heap size when the large data set of 40,000 values in placed into the SortedList. Most of that newly allocated memory seems to move to the Gen 2 heap when the last test kicks off, returning the Gen 1 heap almost back to it's former value. The internal implementation of the SortedDictionary is based on a private, internal class in System.Collections.Generic called TreeSet<T>. In an upcoming blog post, I will be examining the TreeSet<T> class in detail. It uses a special kind of binary tree implementation known as a red-black tree. Why Microsoft didn't expose this incredibly cool class, I don't understand. So I suppose I should do that, right? Until next time...
Source Code for the DictionaryTestHarness Application (6KB)