Making SQL and .NET SHA1 Hashes Match

by kevin 8/7/2008 1:37:00 PM

A friend at SnagAJob.com came to me with an interesting problem today. He said that the HashBytes function in SQL Server was outputting different results from the HashAlgorithm.ComputeHash method in .NET. Here's a T-SQL script that hashes the URL to my blog.

DECLARE @data NVARCHAR(max)
SET @data = N'http://www.gotnet.biz/Blog'
SELECT HashBytes('SHA1', @data)

This script outputs 0x7FC8C5E43E9425C890AB96E660C86FC9CB077F4D as the hash value. The algorithm in C# attempting to do the same thing might look like this:

using System;
using System.Security.Cryptography;
using System.Text;

public class HashTest
{
    static void Main()
    {
        DoHash(new SHA1CryptoServiceProvider());
        Console.ReadLine();
    }

    private static void DoHash(HashAlgorithm algo)
    {
        var bytes = Encoding.UTF8.GetBytes(
            "http://www.gotnet.biz/Blog");
        var hash = algo.ComputeHash(bytes);
        Console.Write("{0} ", algo.GetType().Name);
        foreach (var b in hash)
            Console.Write("{0:X2}", b);
        Console.WriteLine();
    }
}

This code outputs 0x10397796345455fa6332db477972dc360b54ef2, a different hash value. Do you see the problem in the code? I didn't at first but it's simpler than you think.

The encoding that I used in the C# code is UTF8, which stands for the 8-bit Universal Character Set/UNICODE Transformation Format. That's a mouthful, isn't it? In .NET, the UTF8 encoding corresponds to Windows code page 65001, where each source character may map to between one and four bytes in the encoded output. I reached for that encoding out of habit because in working with XML as often as I do, I'm accustomed to using the UTF8 encoding for nearly everything I do. My friend who posed the original question had done the same thing. However, in this case, it's a bad choice.
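To make the "one to four bytes" point concrete, here is a small sketch that asks the UTF8 encoding how many bytes it needs for a few sample characters. The Utf8Widths class and Utf8ByteCount helper are names invented for this illustration, and the sample characters are my own picks, not anything from the original problem.

```csharp
using System;
using System.Text;

public static class Utf8Widths
{
    // Hypothetical helper: number of bytes UTF-8 needs for the given text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8ByteCount("A"));          // 1 byte  (ASCII letter)
        Console.WriteLine(Utf8ByteCount("\u00E9"));     // 2 bytes (e with acute accent)
        Console.WriteLine(Utf8ByteCount("\u20AC"));     // 3 bytes (euro sign)
        Console.WriteLine(Utf8ByteCount("\U0001D11E")); // 4 bytes (musical G clef)
    }
}
```

Every character in the blog URL falls into the first category, so UTF8 turns the 26-character string into exactly 26 bytes.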

Looking at the T-SQL code above, notice that the data type for my string is NVARCHAR. That's UNICODE, and SQL Server stores it as two bytes per character for text like my URL. HashBytes hashes those stored bytes. So although all strings in .NET are stored in UNICODE and the UTF8 encoding is, as its name implies, just transforming the UNICODE to an 8-bit transportable format, the computed SHA1 hash on a UTF-8 encoded string in .NET is clearly not the same as SQL Server's result: the two hash functions are being fed different byte sequences.

Playing around with some other transforms in the System.Text namespace, I discovered that replacing the UTF8 encoding with the so-called Unicode encoding (or switching the SQL data type to VARCHAR) makes the hash computations match between SQL and .NET in my example above. I capitalized Unicode as I did there quite deliberately because I am referring to the type in the System.Text namespace called UnicodeEncoding (which is available as the static Unicode property on the Encoding class), not the UNICODE standard.
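A minimal sketch of that change follows. It is not the exact code from the original program: the HashMatch class and Sha1Hex helper are names made up for this example, and SHA1.Create() stands in for the SHA1CryptoServiceProvider used earlier. The only difference that matters is which Encoding is passed in.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashMatch
{
    // Hypothetical helper: encodes the text with the given encoding, hashes
    // the resulting bytes with SHA1, and formats the digest the way SQL
    // Server displays binary values (0x followed by uppercase hex).
    public static string Sha1Hex(string text, Encoding encoding)
    {
        using (var sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(encoding.GetBytes(text));
            var sb = new StringBuilder("0x");
            foreach (byte b in hash)
                sb.Append(b.ToString("X2"));
            return sb.ToString();
        }
    }

    public static void Main()
    {
        const string url = "http://www.gotnet.biz/Blog";
        // Encoding.Unicode is the one to compare against HashBytes on an NVARCHAR.
        Console.WriteLine("Unicode: {0}", Sha1Hex(url, Encoding.Unicode));
        Console.WriteLine("UTF8:    {0}", Sha1Hex(url, Encoding.UTF8));
    }
}
```

Run it and compare the Unicode line against the value that the T-SQL script reports. The UTF8 line is printed only to show that the two encodings produce different digests for the same string.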

In .NET, the Unicode encoding corresponds to Windows code page 1200 and goes by the familiar alias UTF-16. As that alias may imply, the .NET UnicodeEncoding uses a sequence of one or two 16-bit integers to represent each character in the original text, in little-endian byte order by default. The results are easy to understand visually so I made the graphic shown here.

You can see that the contents of the byte streams from the two encodings are different. For the ASCII characters in my URL, the UTF8 encoding emits a single byte per character, whereas the Unicode encoding emits two bytes per character, the second of which is a zero high-order byte. To sum up, when hashing NVARCHARs in SQL, the equivalent encoding to use in .NET code is the UnicodeEncoding. When hashing VARCHARs in SQL Server, the UTF8Encoding matches as long as the text is plain ASCII; outside that range, a VARCHAR is stored in the code page of its collation, so the bytes can differ from UTF-8.
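The same comparison can be reproduced in a few lines of code. This sketch dumps the bytes that each encoding produces for the last four characters of the URL; the EncodingDump class and Hex helper are illustrative names only.

```csharp
using System;
using System.Text;

public static class EncodingDump
{
    // Hypothetical helper: encodes the text and returns the bytes as
    // dash-separated hex pairs.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine("UTF8:    " + Hex("Blog", Encoding.UTF8));    // 42-6C-6F-67
        Console.WriteLine("Unicode: " + Hex("Blog", Encoding.Unicode)); // 42-00-6C-00-6F-00-67-00
    }
}
```

The extra 00 after each letter is the high-order byte of the 16-bit code unit, written second because the encoding is little-endian. SHA1 sees eight bytes instead of four, so the digests cannot match.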

 


C# | CTS | Security | Software Development | SQL Server | SQL Server 2008

The System.DateTimeOffset Type

by kevin 2/21/2008 10:29:00 PM
I've just been experimenting with the new DateTimeOffset type in .NET 3.5. It's about time. I've been evangelizing for the use of UTC for storage of times since it was called GMT. And it seems that SQL 2008 has a new DATETIMEOFFSET type, too. Now, instead of storing the UTC offset as a separate attribute in every table that contains a DATETIME value, I can store the original date and its original offset at the time of storage in one column. That will be nice. It remains to be seen how this will affect my T-SQL practices and the various ORM technologies I support. It seems that LINQ to SQL will be getting support for DateTimeOffset and related classes, too. I am somewhat disappointed that the new DATE and TIME types from SQL 2008 aren't available natively in .NET. There are good arguments for being able to store date and time references separately. Anyway, one step at a time, right?
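As a quick illustration of what "the original date and its original offset in one value" looks like in code, here is a small sketch. The OffsetDemo class and PostTime helper are invented for the example, and the UTC-05:00 offset is an assumption made purely for illustration.

```csharp
using System;

public static class OffsetDemo
{
    // Hypothetical sample value: 10:29 PM local time on 2/21/2008,
    // with an assumed offset of five hours behind UTC.
    public static DateTimeOffset PostTime()
    {
        return new DateTimeOffset(2008, 2, 21, 22, 29, 0, TimeSpan.FromHours(-5));
    }

    public static void Main()
    {
        DateTimeOffset stamp = PostTime();
        Console.WriteLine(stamp.DateTime.ToString("s"));    // 2008-02-21T22:29:00 (local clock time)
        Console.WriteLine(stamp.Offset);                    // -05:00:00
        Console.WriteLine(stamp.UtcDateTime.ToString("s")); // 2008-02-22T03:29:00
    }
}
```

The local clock time, the offset and the UTC instant all travel together, so no separate offset column is needed to recover any of them.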


LInQ | SQL Server 2008 | CTS | BCL


W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.


Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
