Making SQL and .NET SHA1 Hashes Match

by kevin 8/7/2008 1:37:00 PM

A friend at SnagAJob.com came to me with an interesting problem today. He said that the HashBytes function in SQL Server was outputting different results from the HashAlgorithm.ComputeHash method in .NET. Here's a T-SQL script that hashes the URL to my blog.

DECLARE @data NVARCHAR(max)
SET @data = N'http://www.gotnet.biz/Blog'
SELECT HashBytes('SHA1', @data)

This script outputs 0x7FC8C5E43E9425C890AB96E660C86FC9CB077F4D as the hash value. The algorithm in C# attempting to do the same thing might look like this:

using System;
using System.Security.Cryptography;
using System.Text;

public class HashTest
{
    static void Main()
    {
        DoHash(new SHA1CryptoServiceProvider());
        Console.ReadLine();
    }

    private static void DoHash(HashAlgorithm algo)
    {
        var bytes = Encoding.UTF8.GetBytes(
            "http://www.gotnet.biz/Blog");
        var hash = algo.ComputeHash(bytes);
        Console.Write("{0} ", algo.GetType().Name);
        foreach (var b in hash)
            Console.Write("{0:X2}", b);
        Console.WriteLine();
    }
}

This code outputs 0x10397796345455fa6332db477972dc360b54ef2, a different hash value. Do you see the problem in the code? I didn't at first but it's simpler than you think.

The encoding that I used in the C# code is UTF8 which means the 8-bit Universal Character Set/UNICODE Transformation Format. That's a mouthful, isn't it? In .NET, the UTF8 encoding corresponds to Windows code page 65001 where each source character may map to between one and four characters in the encoded output. I used that encoding implicitly because in working with XML as often as I do, I'm accustomed to using the UTF8 encoding for nearly everything I do. My friend who posed the original question had done the same thing. However, in this case, it's a bad choice.

Looking at the T-SQL code above, notice that the data type for my string is NVARCHAR, that's UNICODE. And although all strings in .NET are stored in UNICODE and the UTF8 encoding is, as its name implies, just transforming the UNICODE to an 8-bit transportable format, the computed SHA1 hash on a UTF-8 encoded string in .NET is clearly not the same as SQL Server's result.

Playing around with some other transforms in the System.Text namespace, I discovered that by replacing the UTF8 encoding with the so-called Unicode encoding (or by switching the SQL data type to VARCHAR) makes the hash computations match between SQL and .NET in my example above. I capitalized Unicode as I did there quite deliberately because I am referring to the type in the System.Text namespace called UnicodeEncoding (which is available as the static Unicode property on the Encoding class) not the UNICODE standard.

In .NET, the Unicode encoding corresponds to Windows code page 1200 and goes by the familiar alias UTF-16. As that alias may imply, the.NET UnicodeEncoding uses a sequence of one or two 16-bit integers to represent each character in the original text. The results are easy to understand visually so I made the graphic shown here.

You can see that the contents of the byte stream from the two encodings is different. The UTF8 encoding strips the high order zero bytes for cultures where they are superfluous whereas the Unicode encoding preserves them. To sum up, when hashing NVARCHARs in SQL, the equivalent encoding to use in .NET code is the UnicodeEncoding. When hashing VARCHARs in SQL server, the matching .NET encoding is the UTF8Encoding.

 

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , , , ,

C# | CTS | Security | Software Development | SQL Server | SQL Server 2008

Efficient Paging in SQL Server via LINQ

by kevin 7/6/2008 12:00:00 PM

UPDATE: I've included a videocast with this blog post. Let me know what you think. 

A few days ago, my buddy Justin Etheredge wrote a blog post about Efficient Paging in SQL Server. I was thinking about how transparent Language Integrated Query (LINQ) makes paging and I thought I'd blog about it. Two of the more interesting extension methods offered by LINQ are Skip() and Take(). You can use these extension methods to skip rows at the beginning of the query result and take only those you want to return. Sounds like paging to me. I wonder if Skip() and Take() used in combination with LINQ to SQL behave as efficiently as Justin's example? Let's take a look. Consider the following LINQ query:

var db = new AdventureWorksDataContext();
var query = from p in db.SalesOrderHeaders
  where p.SalesTerritory.Name.Equals( "Northeast" )
  select new {
    p.Contact.FirstName,
    p.Contact.LastName,
    TotalSales = p.SalesOrderDetails.Sum(
      o => o.OrderQty * o.UnitPrice )
};

This small example uses the AdventureWorks SalesOrderHeaders as the input sequence and shapes the output sequence to include the associated Contact's name parts and the total value of each order. The total value is computed as the OrderQty times the UnitPrice for each associated item in the SalesOrderDetails table. There is a filter placed on the query to restrict the results to orders placed in the 'Northeast' territory. This simple query shows how easy it is to filter, perform arithmetic and use the relationships in a LINQ to SQL data context to traverse table relationships. What does this query look like when it's compiled for execution on SQL Server?

SELECT
  [t2].[FirstName],
  [t2].[LastName],
  (
    SELECT SUM([t4].[value])
    FROM
    (
        SELECT
          (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
          [t3].[SalesOrderID]
        FROM [Sales].[SalesOrderDetail] AS [t3]
    ) AS [t4]
    WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
  ) AS [TotalSales]
FROM [Sales].[SalesOrderHeader] AS [t0]
LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
  ON [t1].[TerritoryID] = [t0].[TerritoryID]
INNER JOIN [Person].[Contact] AS [t2]
  ON [t2].[ContactID] = [t0].[ContactID]
WHERE [t1].[Name] = @p0

You can see the territory filter applied as a WHERE clause. Note that even when a string literal is used in the C# code, LINQ to SQL still passes filtering variables as parameters. In this case, the territory name 'Northeast' is passed as a variable named @p0. This is always a good practice because it helps to thwart the injection of potentially malicious T-SQL into your query. We can see another interesting feature of LINQ to SQL in the T-SQL that is created called projection. Because the C# code shown above shapes the output sequence to only a few required columns, the LINQ to SQL engine is smart enough to T-SQL shape the query to return only what's needed. Projection often improves query performance and always improves transportation speed on the wire.

Finally, notice that the third column projected into the output sequence, i.e. the sum of each order's value, is instatiated as a two-part, nested sub-SELECT operation in the T-SQL statement. The inner SELECT does the math on the order quantity and price. The containing SELECT aggregates the line item totals and them filters them to the rows selected by the outer query. Nicely done, LINQ! Now, we see that this is a long list, returning thousands of rows. If this query is meant for human consumption, we should break it into smaller chunks to make it easier to handle. How do we do that in LINQ? Add this to the C# code shown before.

var _pageNum = 3;
var _pageSize = 20;
query = query.Skip((_pageNum - 1) * _pageSize).Take(_pageSize);

This modification uses the Skip() and Take() extension methods to skip 40 rows and take the next 20 rows. In other words, at 20 results per page, this query now returns the 3rd page. Though the magic of deferred execution, we can add the Skip() and Take() extentions at any time before we begin iterating over the result set. This comes in handy when you want to enable paging for human consumption but to disable it for B2B or ETL scenarios. Is the paged T-SQL query shown here efficient though? You tell me. Here the T-SQL that is produced:

SELECT
  [t6].[FirstName],
  [t6].[LastName],
  [t6].[value] AS [TotalSales]
FROM
(
  SELECT
    ROW_NUMBER() OVER
    (
      ORDER BY
        [t5].[FirstName],
        [t5].[LastName],
        [t5].[value]
    ) AS [ROW_NUMBER],
    [t5].[FirstName],
    [t5].[LastName],
    [t5].[value]
  FROM
  (
    SELECT
      [t2].[FirstName],
      [t2].[LastName],
      (
        SELECT
          SUM([t4].[value])
        FROM
        (
          SELECT
            (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
            [t3].[SalesOrderID]
          FROM [Sales].[SalesOrderDetail] AS [t3]
        ) AS [t4]
        WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
      ) AS [value], [t1].[Name]
      FROM [Sales].[SalesOrderHeader] AS [t0]
      LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
        ON [t1].[TerritoryID] = [t0].[TerritoryID]
      INNER JOIN [Person].[Contact] AS [t2]
        ON [t2].[ContactID] = [t0].[ContactID]
    ) AS [t5]
    WHERE [t5].[Name] = @p0
  ) AS [t6]
WHERE [t6].[ROW_NUMBER] BETWEEN @p1 + 1 AND @p1 + @p2
ORDER BY [t6].[ROW_NUMBER]

If you read the query from the inside out, you'll see that on the inside it's essentially the same query that we saw before we added the paging feature. It has all the original SELECTs named t1 through t4 and is wrapped as a new result called t5. The SQL Server ROW_NUMBER() function is used to inject a row number into t5 ordered by all 3 projected columns. That looks a lot like the query Justin showed us in his blog post. Very efficient! The new result containing the row numbers is named t6.

Finally, the t6 result is filtered by a starting row number and ending row number using two new variables @p1 and @p2. For page 3 paged in 20 row chunks as shown above, these variables would have the values 40 and 59, respectively. LINQ to SQL injects these starting and ending row number parameters whenever you use Skip() and Take() together. Well, it almost always does that. If you happen to specify Skip(0), it reverts to the behavior that Take() uses without Skip() which is to use SQL Server's TOP() function instead. LINQ to SQL sure knows how to sweet talk SQL Server, don't you think?

Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , , , ,

C# | LInQ | ORM | Software Development | SQL Server | SQL Server 2008

An Unfortunate Consequence of History

by kevin 6/25/2008 6:45:00 PM

Databases are just awful. I don't mean the products themselves but the concept of databases. Stop and think about how absurdly we behave when we write modern software. We generate scads of information in the course of our daily lives. (A scad for me is about 2Gb the last time I checked. May be more or less for you.) Much of this information approximates the lives we lead and the obligations we must honor. But rather than putting that information into a system that has the tools necessary to model the real world from which the data originally emanated, we usually choose to keep it in a place that does an efficient job of storage. When we need to put it back into the real life approximation engine, we shuttle the information in and out of our application servers as necessary. It's been estimated that as much as 50% of the time we spend in development is in bridging the gap between data storage and the business logic of our applications. That number may be an extreme, low or high. But even if this kind of work accounts for only 25% of our time, why would we choose to spend our development budget this way? Data is so simple. It should just be there, fully accessible to me all the time.

Some operating systems do a better job of closing the gap between code and data than others do. For example, the Pick System, originally developed by Dick Pick in the late 1960s uses a hash-based file system to create associative arrays that are super-efficient for many query operations. The only data type in the Pick System is the string. And most importantly, the Pick database engine is not relational. It is a multi-valued instead, meaning that any attribute that needs to have multiple values can just declare them. In the Pick mind, there's never a need to create related tables and join them for query or reporting. A platform that implements this type of database also typically ships with a Pick BASIC compiler which allows for direct manipulation of the query engine and the associative arrays it produces. The BASIC code runs right there in the database, not on a foreign system. Embedded Pick BASIC is not like the SQL CLR. The SQL CLR, for lack of a better term, is bolted onto the side of SQL Server. You can't do any real data manipulation in the SQL CLR. However, in Pick BASIC, you can freely manipulate schema and data directly. Forget for a moment that it's BASIC and you've got something great there. Compiled code running in the database that can manipulate database objects natively. Way cool and circa 1965.

IBM and InterSystems, among other vendors, still sell these databases like hotcakes today because they solve very real business problems for which relational databases are not ideally suited. First of all, they're fast. And I mean smokin' fast for many types of operations, especially high-volume transaction processing applications. This is partially due to the fact that because there are no join operations (in the classical sense), there's usually less work to do to obtain the data you're seeking. But even when there is a sub-select operation that is required to get what you're looking for, the efficiency of the underlying hash-based file system pays off handily. In database terms everything in the database is indexed, always.

My students and colleagues often hear me say that, "Databases are an unfortunate consequence of history." I say this (and believe it) because if you could travel back in time to 1948 and give the ENIAC developers at the University of Pittsburgh a handful of 4Gb DIMM chips and the necessary plans to connect them to their invention, relational databases like Oracle and Microsoft SQL Server would simply never have evolved. I think that the development path would have been more like what Dick Pick envisioned and built instead. Given enough memory early in computing history, associative arrays, set operations and in-memory manipulation of large data sets would have been the norm. However, as we know, memory was severely constrained in the early days of computing. In fact, it's only been in the last few years as new technology has allowed for memory prices to drop dramatically that it has been feasible to conceive of a solid-state database at all. Oracle's TimeTen and Microsoft's Project Code Name Velocity are leading-edge concepts in a new market-segment that will, one day, fully realize Dick Pick's vision, in my opinion. I predict that accessing data from distributed, in-memory databases will become the norm within my lifetime.

Many of the current Object/Relational Mapping (O/RM) debates are centered around my database evolution postulate because O/RM tools attempt the inverse of what the Pick OS does to achieve the same effect. O/RM tools essentially pull as many database semantics (sans execution) into the application tier as possible where the logic of the program is codified. Whether we run Pick BASIC in the database or use an O/RM to marshal data close to our C# code, the desired outcome is the same. But pulling data into an external execution engine as O/RM tools do is pretty close to nightmarish, to be frank. In fact, Ted Neward, whom I greatly respect, calls O/RM the Vietnam of Computer Science today, meaning a quagmire from which one cannot possibly be extricated and for which there is no good outcome. Ouch! What a stinging rebuke from a guy who's singularly qualified to make an assessment in this space. Even Ayende Rahien's blog post from earlier today reveals a sense of desparation about the state of O/RM technology. What a mess we've gotten ourselves into! No O/RM suite that I know of addresses the real problem at hand, i.e. making data access so transparent that you don't even know you're doing it.

We use both NHibernate and Language Integrated Query (LINQ) to SQL at SnagAJob.com for O/RM. They make life easier in some ways but so much more difficult in others. I cannot begin to count up the hours we've spent tuning the session management code in NHibernate to deal with authentication and transaction management issues. And you don't burn up welterweight programmer resources on that kind of work. Your heavy hitters need to be deeply involved because there are architectural design issues at every turn. Every minute that your senior developers and architects are distracted with this kind of stuff, they aren't focusing on what you thought you hired them for. LINQ is better than HNibernate in a couple of ways, chiefly because of the expressiveness afforeded by the IEnumerable<T> extension methods and the query comprehension syntax. But deploying LINQ to SQL or LINQ to Entities in a real-world environment is still not as simple as it should be. And the real goal of transparent data access is still far, far way using NHibernate or LINQ.

If you know of an O/RM suite that makes accessing SQL data more Pick-like as I've described, i.e. more transparent, I'd like to hear about it.

<Interesting Related Story> In 1993 while working for Datastorm Technologies, Inc., I attended Comdex in Las Vegas. At lunch one day, two fellows joined me at the table. The older fellow to my right introduced himself as Dick Pick. I asked him what he did for a living and he graciously and eagerly explained the Pick OS, it's simple power and beauty and a smallish version of his life story. I was impressed but didn't really get it at the time partly because the fellow seated across the table introduced himself as Phil Katz, the inventor of the PKZip file compression utility. For me, Phil Katz's fame overshadowed Dick Pick's because I didn't know any better. So, I didn't engage with Dick in conversation to the degree that I really should have. History, it seems, hasn't been all that respectful to Dick Pick either. Phil Katz has a detailed Wikipedia article about him yet Dick Pick doesn't, for example. Googling for Dick Pick yields scads (there's that word again) of Dick's Pick's Grateful Dead references and nearly nil related to the computer science genius of our time. In retrospect, even being seated with a legend like Dick Pick was a real honor. I wish I had known to take advantage of the opportunity that was given to me. Live and learn. </Interesting Related Story>

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

Architecture | LInQ | Software Development | SQL Server | SQL Server 2008

SQL Roundtable Discussion with Andy, Ron and Mark

by kevin 6/12/2008 8:10:00 AM

If you are in central Virginia today (12 June 2008), be sure to join Andy Leonard, Ron Deskins, Mark Hudson and me for a roundtable discussion about SQL server. Andy, Ron and Mark are all great community leaders and they happen to be excellent SQL database architects. This will be a fun and informative meeting. Be sure to register using the link below. Here are the details:

When: Thursday, June 12th, 2008, 6:30 PM - 8:00 PM
Social Time starts at 6:00 PM!

Who: Everyone! In particular Andy, Ron, and Mark.

What: This is going to be round table discussion facilitated by Andy Leonard, Ron Deskins, and Mark Hudson.  The topic is SQL Server, but other than that it is wide open.  Please post your comments to the board so we can get a lively discussion going at the meeting!  

Where:
Location: Markel Plaza
4600 Cox Road
Glen Allen, VA 23060 [map]

Register: Click here to register!

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , , , ,

Richmond | SQL Server | User Group

Profiling LINQ Queries Presentation

by kevin 5/19/2008 10:45:00 PM

I did a presentation called "Profiling and Tuning LINQ Queries" for the Richmond Sharepoint User Group on May 8th. I had gotten so busy, I forgot to post the slides and sample code. Here it is.

Profiling and Tuning LINQ Queries.pptx (75.58 kb) - these are the PowerPoint 2007 slides

ProfileLINQ20080508.zip (27.96 kb) - this C# project builds a dynamic query against the Northwind database using LINQ to SQL, allowing the use of the SQL Profiler to be used to see the increasing complexity of the queries as criteria is added and data is shaped using increasingly complex anonymous types.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , , ,

Debugging | Richmond | SQL Server | User Group

My Spring 2008 Presentations

by kevin 3/24/2008 4:44:00 PM

Here's my speaking schedule for the next few weeks in case anyone's interested in hearing me speak on these topics:

  • March 26, 2008 - Silverlight 101 - Innsbrook .NET User Group
    6:00 to 8:30 p.m. on the ECPI Campus at 4305 Cox Road, Glen Allen, Virginia
  • April 26, 2008 - Richmond Code Camp
    8 a.m. to 5 p.m. on the ECPI Campus at 4305 Cox Road, Glen Allen, Virginia
  • April 26, 2008 - Richmond "Meet and Code" Dinner
    6 p.m. at the Markel Plaza Building, 4600 Cox Road, Glen Allen, Virginia
  • May 1, 2008 - Silverlight and WCF - Roanoke Valley .NET User Group
    6 to 8 p.m. at the NEW Roanoke County Public Safety Building
  • May 5, 2008 - Information Technology Career Symposium - Randolph-Macon College
    6 to 8:30 p.m. in Washington-Franklin Hall

In addition, I'll be attending these upcoming meetings but not presenting:

  • March 27, 2008 - Richmond Sharepoint User Group
  • April 3, 2008 - Richmond .NET User Group
  • April 7, 2008 - Delivering Effective Presentations (Richmond Code Camp Preparation)
  • April 10, 2008 - Richmond SQL Server User Group

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags:

Fun | Richmond | Software Development | SQL Server | User Group | WCF

SQL Server Roadmap Discussion

by kevin 3/19/2008 12:18:00 AM

If you're going to be in Richmond, Virginia on April 10, 2008, consider attending the SQL Server Product Roadmap Discussion at the local Microsoft office. Sign up here. I'll be there and I'll write about what we learn there after it happens. Some details:

Event Overview

Are you planning new database projects in the next 12 months? Do you need to increase the scale of your databases, improve your developers’ productivity, or provide your users with new reporting and analytical capabilities?

Microsoft SQL Server 2005 has brought significant enhancements in performance, availability and security to enterprise data centers and along with its powerful set of DBA, developer, and BI capabilities has become the platform of choice for Line of Business, Enterprise, and Data Warehouse applications.

Join us for an interactive presentation and live product demonstrations as we:

·         Review the SQL Server 2000 support roadmap

·         Walk through the major improvements and enhancements now available in SQL Server 2005 SP2

·         Preview the exciting new features of SQL Server 2008 and discuss how you can get a head start with SQL Server 2008

Who Should Attend:

This discussion is for Database Administrators and Developers who are familiar with MS SQL Server 2000 and 2005 and who are interested in learning more about the features available in SQL Server 2005 SP2 and what’s around the corner with SQL Server 2008.

Prerequisites:

General knowledge of SQL Server 2000 and 2005.

RSVP: This special, no charge event will fill quickly and space is limited, so be sure to register as soon as possible to reserve your seat.

Technorati Profile

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: ,

Richmond | SQL Server | SQL Server 2008

Powered by BlogEngine.NET 1.3.1.0
Theme by Mads Kristensen


Kevin's on Twitter / FriendFeed

W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.

View Kevin Hazzard's profile on LinkedIn
Microsoft MVP Award Foolish robot!

Calendar

<<  October 2008  >>
MoTuWeThFrSaSu
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789

View posts in large calendar

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008

Sign in