Dictionary Classes Benchmarked

by kevin 7/13/2008 8:30:00 PM

A few days ago, I stumbled on an article by Amit Raz about the SortedList<K,T> on Dev102.com. In the article, which compares and contrasts the SortedList collection class in the .NET BCL to the SortedDictionary class, Amit concludes with, "So what is the SortedList good for? Beats me. I deem it useless." His conclusion seemed to be predicated on the fact that the Add() method would throw an exception if the programmer attempted to insert a duplicately keyed entry into a SortedList. However, this is documented behavior. And both the SortedList and the SortedDictionary exhibit that same behavior. An indexer exists on each of those collections that will allow the insertion of a duplicately keyed item. For both classes, duplicates replace the original values when using the indexer.

I had a difficult time following Amit's logic but he's a bright fellow so I wanted to find out if the SortedList really was useless as he had proclaimed. On this page, Microsoft provides a table that describes the benefits and relative drawbacks of these two seemingly similar classes. Long story short, the key advantages of the SortedList<K,T> are that it uses less memory than the SortedDictionary<K,T> and that some of its members that return keys and values are indexed. OK, so the SortedList uses less memory. But what's that latter claim about?

Well, the SortedList<K,T> has a few members that the SortedDictionary<K,T> does not. Among them are:

  • int IndexOfKey( K key )
  • int IndexOfValue( T value )

Kicking around in this class in Reflector, one sees that the internal storage for the SortedList<K,T> is a pair of arrays: one for the keys which is kept in sorted order and one for the values which is kept in insertion order. When using the IndexOfKey method noted above, Array.BinarySearch() is used to perform an efficient search for the desired key. However, the IndexOfValue method uses a brute force (O(n)) scan to find the requested value. Another special case advantage of the SortedList is that insertions are O(1) for data inserted in sorted order whereas for the SortedDictionary, the average insertion cost is around O(log n). In fact, for data already in sorted order, the SortedDictionary pays a bit of a penalty on insertion because of the required balancing of the tree structure used to store the information. More on that later. So, if you have small lists and the keys are already sorted before insertion, the SortedList might be a good choice. If your keys are not sorted, insertion into a SortedList could be as bad as O(n). So it may be that you have to know something about your data to use the SortedList in a way that makes it worthwhile. Not understanding your data, could make the SortedList worthless, as Amit claims.

I wrote a small test harness that exercises the SortedList<K,T> and SortedDictionary<K,T>. I've included a link to the source code below. The application runs a battery of tests using a small list of 1,000 items on each type of dictionary and the same battery using a large list of 40,000 items. The main window looks like this:

These results show an average test run on my Windows Server 2008 machine using the .NET Framework 3.5. For the small list of 1,000 items, the SortedList seems to be a bit more efficient than the SortedDictionary with respect to time, despite the fact that the keys are not in sorted order for that test. However, when it comes to the larger list of 40,000 items, the SortedDictionary is the clear winner for both insertions and removals. But what about memory? Remember that Microsoft's MSDN topic said that the SortedList can be more memory efficient? I put 10 seconds of time in between the 4 test groups shown on the screen shot above and ran the .NET memory profiler to see what was happening. It's not all that conclusive, in my opinion. Here's a graphic showing the Gen 0, 1 and 2 heap sizes over the lifetime of the test. Perhaps you can help me analyze what you see:

There is a very large spike during the third test in the Gen 1 heap size when the large data set of 40,000 values in placed into the SortedList. Most of that newly allocated memory seems to move to the Gen 2 heap when the last test kicks off, returning the Gen 1 heap almost back to it's former value. The internal implementation of the SortedDictionary is based on a private, internal class in System.Collections.Generic called TreeSet<T>. In an upcoming blog post, I will be examining the TreeSet<T> class in detail. It uses a special kind of binary tree implementation known as a red-black tree. Why Microsoft didn't expose this incredibly cool class, I don't understand. So I suppose I should do that, right? Until next time...

Source Code for the DictionaryTestHarness Application (6KB)

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: ,

BCL | C# | Debugging | Software Development

Profiling LINQ Queries Presentation

by kevin 5/19/2008 10:45:00 PM

I did a presentation called "Profiling and Tuning LINQ Queries" for the Richmond Sharepoint User Group on May 8th. I had gotten so busy, I forgot to post the slides and sample code. Here it is.

Profiling and Tuning LINQ Queries.pptx (75.58 kb) - these are the PowerPoint 2007 slides

ProfileLINQ20080508.zip (27.96 kb) - this C# project builds a dynamic query against the Northwind database using LINQ to SQL, allowing the use of the SQL Profiler to be used to see the increasing complexity of the queries as criteria is added and data is shaped using increasingly complex anonymous types.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , , ,

Debugging | Richmond | SQL Server | User Group

Strange Behavior for Silverlight's HtmlPage.Window.Navigate Method

by kevin 4/21/2008 2:41:00 PM

I found some odd behavior today with respect to the way that Silverlight's HtmlPage.Window.Navigate method works. If a page is cacheable by the client, calling HtmlPage.Window.Navigate will always choose to load the cached version of the page. With IE7, it does this without attempting a cache update check. Normally, IE7 will emit a cache update check for cached content, expecting an HTTP 304 response code, meaning that the cached version is up-to-date. However, when HtmlPage.Window.Navigate references a cached resource, it honors the cached content without checking for an update. This is how Firefox normally behaves and it threw me for a loop as a result. Check out the following code.

// this one will used a cached version always
HtmlPage.Window.Navigate(new Uri("http://server/targetpage.aspx"));
// this one will ignore the cache
HtmlPage.Window.Eval("window.location.href='http://server/targetpage.aspx'");

In that code, the first method call will cause the browser to use a cached page. This is true even if the cache-control was set to private, which is the default. It also does not check for a cache update as IE7 normally does. The second call injects JavaScript into the hosting page for execution. That one seems to force the browser to ignore the cache where the target page was received with the private cache-control policy. I don't know if this is a bug but I know it's not documented that way. I should mention that this happens on Silverlight 2 Beta 1.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

C# | Debugging | Silverlight

Where have the debuggers gone?

by kevin 4/7/2008 8:39:00 AM

In the TDD frenzy and hype these days, it seems that many developers have forgotten the importance of building good debugging skills. In fact, I know one fellow architect (whom I won't mention by name) who brags about not having used a debugger in years. He claims that his code is so well tested that he doesn't need a debugger. Funny how, when I found a bug in his code recently, I isolated and fixed it using a good, old-fashioned debugger.

Don't get me wrong. I think TDD is great. But there's no substitute for stepping through your code and watching it work. It slows the process down to do it but that's true of TDD in general. I am thinking about hosting a debugging workshop in the Richmond area soon. If you're interested, drop a note to kevin at gotnet dot biz and let me know.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: ,

Richmond | Software Development | Debugging | TDD

Powered by BlogEngine.NET 1.3.1.0
Theme by Mads Kristensen


Kevin's on Twitter / FriendFeed

W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.

View Kevin Hazzard's profile on LinkedIn
Microsoft MVP Award Foolish robot!

Calendar

<<  October 2008  >>
MoTuWeThFrSaSu
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789

View posts in large calendar

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008

Sign in