An Unfortunate Consequence of History

by kevin 6/25/2008 6:45:00 PM

Databases are just awful. I don't mean the products themselves but the concept of databases. Stop and think about how absurdly we behave when we write modern software. We generate scads of information in the course of our daily lives. (A scad for me is about 2 GB the last time I checked. May be more or less for you.) Much of this information approximates the lives we lead and the obligations we must honor. But rather than putting that information into a system that has the tools necessary to model the real world from which the data originally emanated, we usually choose to keep it in a place that does an efficient job of storage. When we need to put it back into the real-life approximation engine, we shuttle the information in and out of our application servers as necessary. It's been estimated that as much as 50% of the time we spend in development is in bridging the gap between data storage and the business logic of our applications. That number may be an extreme, low or high. But even if this kind of work accounts for only 25% of our time, why would we choose to spend our development budget this way? Data is so simple. It should just be there, fully accessible to me all the time.

Some operating systems do a better job of closing the gap between code and data than others do. For example, the Pick System, originally developed by Dick Pick in the late 1960s, uses a hash-based file system to create associative arrays that are super-efficient for many query operations. The only data type in the Pick System is the string. And most importantly, the Pick database engine is not relational. It is multi-valued instead, meaning that any attribute that needs to have multiple values can just declare them. In the Pick mind, there's never a need to create related tables and join them for query or reporting. A platform that implements this type of database also typically ships with a Pick BASIC compiler, which allows for direct manipulation of the query engine and the associative arrays it produces. The BASIC code runs right there in the database, not on a foreign system. Embedded Pick BASIC is not like the SQL CLR. The SQL CLR, for lack of a better term, is bolted onto the side of SQL Server. You can't do any real data manipulation in the SQL CLR. However, in Pick BASIC, you can freely manipulate schema and data directly. Forget for a moment that it's BASIC and you've got something great there: compiled code running in the database that can manipulate database objects natively. Way cool and circa 1965.
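
To make the multi-valued idea a little more concrete, here is a minimal sketch in Python (not Pick BASIC). It assumes the conventional Pick delimiters, the attribute mark (character 254) and the value mark (character 253). The customer item, its layout and the function names are hypothetical, invented only for illustration.

```python
# Conventional Pick delimiters (an assumption of this sketch).
AM = chr(254)  # attribute mark: separates attributes (fields) within an item
VM = chr(253)  # value mark: separates multiple values within one attribute


def parse_item(raw: str) -> list[list[str]]:
    """Split a raw item string into attributes, each a list of string values."""
    return [attr.split(VM) for attr in raw.split(AM)]


def build_item(attrs: list[list[str]]) -> str:
    """Inverse of parse_item: join values and attributes back into one string."""
    return AM.join(VM.join(values) for values in attrs)


# Hypothetical customer item: name, phone numbers, open order ids.
raw = build_item([
    ["Ada Lovelace"],
    ["555-0100", "555-0101"],
    ["1001", "1002", "1003"],
])
item = parse_item(raw)
name, phones, orders = item
```

The phone numbers live inside the customer item itself, so there is no second table and no join. Everything is still a string, which is exactly the property the Pick model trades on.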

IBM and InterSystems, among other vendors, still sell these databases like hotcakes today because they solve very real business problems for which relational databases are not ideally suited. First of all, they're fast. And I mean smokin' fast for many types of operations, especially high-volume transaction processing applications. This is partially because there are no join operations (in the classical sense), so there's usually less work to do to obtain the data you're seeking. But even when a sub-select operation is required to get what you're looking for, the efficiency of the underlying hash-based file system pays off handily. In database terms, everything in the database is indexed, always.
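
Here is a rough sketch, in Python, of why a hashed file makes lookups by key cheap. The class name, the `modulo` parameter and the byte-sum hash are all assumptions made for illustration. Real multi-valued databases use their own hashing schemes and on-disk group layouts.

```python
class HashedFile:
    """Toy hashed file: an item id hashes straight to one group (bucket)."""

    def __init__(self, modulo: int = 7):
        self.modulo = modulo
        self.groups = [[] for _ in range(modulo)]

    def _group(self, item_id: str) -> list:
        # A simple, deterministic hash chosen for this sketch only.
        return self.groups[sum(item_id.encode()) % self.modulo]

    def write(self, item_id: str, item: str) -> None:
        group = self._group(item_id)
        for i, (key, _) in enumerate(group):
            if key == item_id:
                group[i] = (item_id, item)  # overwrite the existing item
                return
        group.append((item_id, item))

    def read(self, item_id: str):
        # Only one group is scanned, however large the file grows.
        for key, item in self._group(item_id):
            if key == item_id:
                return item
        return None


orders = HashedFile()
orders.write("1001", "widget")
orders.write("1002", "gadget")
orders.write("1001", "sprocket")  # replaces the first write for item 1001
```

A read computes the group from the item id and scans only that group, which is the sense in which the key behaves like an ever-present index.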

My students and colleagues often hear me say that, "Databases are an unfortunate consequence of history." I say this (and believe it) because if you could travel back in time to 1945 and give the ENIAC developers at the University of Pennsylvania a handful of 4 GB DIMMs and the necessary plans to connect them to their invention, relational databases like Oracle and Microsoft SQL Server would simply never have evolved. I think that the development path would have been more like what Dick Pick envisioned and built instead. Given enough memory early in computing history, associative arrays, set operations and in-memory manipulation of large data sets would have been the norm. However, as we know, memory was severely constrained in the early days of computing. In fact, it's only been in the last few years, as new technology has allowed memory prices to drop dramatically, that it has been feasible to conceive of a solid-state database at all. Oracle's TimesTen and Microsoft's Project Code Name Velocity are leading-edge concepts in a new market segment that will, one day, fully realize Dick Pick's vision, in my opinion. I predict that accessing data from distributed, in-memory databases will become the norm within my lifetime.

Many of the current Object/Relational Mapping (O/RM) debates are centered around my database evolution postulate because O/RM tools attempt the inverse of what the Pick OS does to achieve the same effect. O/RM tools essentially pull as many database semantics (sans execution) as possible into the application tier, where the logic of the program is codified. Whether we run Pick BASIC in the database or use an O/RM to marshal data close to our C# code, the desired outcome is the same. But pulling data into an external execution engine as O/RM tools do is pretty close to nightmarish, to be frank. In fact, Ted Neward, whom I greatly respect, calls O/RM the Vietnam of Computer Science today, meaning a quagmire from which one cannot possibly be extricated and for which there is no good outcome. Ouch! What a stinging rebuke from a guy who's singularly qualified to make an assessment in this space. Even Ayende Rahien's blog post from earlier today reveals a sense of desperation about the state of O/RM technology. What a mess we've gotten ourselves into! No O/RM suite that I know of addresses the real problem at hand, i.e., making data access so transparent that you don't even know you're doing it.
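
For a sense of the shuttling involved, here is a small hand-rolled mapping sketched in Python with the standard library's sqlite3 module. It is not how NHibernate or LINQ to SQL work internally. The schema, the `Customer` class and the `load_customer` function are hypothetical and exist only to show the kind of glue an O/RM automates.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Customer:
    id: int
    name: str
    phones: list = field(default_factory=list)


# Hypothetical relational schema: phone numbers need their own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone (customer_id INTEGER, number TEXT);
    INSERT INTO customer VALUES (1, 'Ada Lovelace');
    INSERT INTO phone VALUES (1, '555-0100');
    INSERT INTO phone VALUES (1, '555-0101');
""")


def load_customer(conn, customer_id):
    """Join the tables, then fold the flat rows back into one object."""
    rows = conn.execute(
        "SELECT c.id, c.name, p.number FROM customer c "
        "LEFT JOIN phone p ON p.customer_id = c.id "
        "WHERE c.id = ? ORDER BY p.number",
        (customer_id,),
    ).fetchall()
    if not rows:
        return None
    customer = Customer(id=rows[0][0], name=rows[0][1])
    # A customer with no phones yields one row with a NULL number; skip it.
    customer.phones = [number for _, _, number in rows if number is not None]
    return customer


ada = load_customer(conn, 1)
```

Even in this tiny case the code has to know about the join, the row shape and the NULL handling. That knowledge is the gap between storage and business logic, written out by hand.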

We use both NHibernate and Language Integrated Query (LINQ) to SQL at SnagAJob.com for O/RM. They make life easier in some ways but so much more difficult in others. I cannot begin to count up the hours we've spent tuning the session management code in NHibernate to deal with authentication and transaction management issues. And you don't burn up welterweight programmer resources on that kind of work. Your heavy hitters need to be deeply involved because there are architectural design issues at every turn. Every minute that your senior developers and architects are distracted with this kind of stuff, they aren't focusing on what you thought you hired them for. LINQ is better than NHibernate in a couple of ways, chiefly because of the expressiveness afforded by the IEnumerable<T> extension methods and the query comprehension syntax. But deploying LINQ to SQL or LINQ to Entities in a real-world environment is still not as simple as it should be. And the real goal of transparent data access is still far, far away using NHibernate or LINQ.

If you know of an O/RM suite that makes accessing SQL data more Pick-like as I've described, i.e. more transparent, I'd like to hear about it.

<Interesting Related Story> In 1993, while working for Datastorm Technologies, Inc., I attended Comdex in Las Vegas. At lunch one day, two fellows joined me at the table. The older fellow to my right introduced himself as Dick Pick. I asked him what he did for a living and he graciously and eagerly explained the Pick OS, its simple power and beauty, and a smallish version of his life story. I was impressed but didn't really get it at the time, partly because the fellow seated across the table introduced himself as Phil Katz, the inventor of the PKZip file compression utility. For me, Phil Katz's fame overshadowed Dick Pick's because I didn't know any better. So, I didn't engage with Dick in conversation to the degree that I really should have. History, it seems, hasn't been all that respectful to Dick Pick either. Phil Katz has a detailed Wikipedia article about him yet Dick Pick doesn't, for example. Googling for Dick Pick yields scads (there's that word again) of Dick's Picks Grateful Dead references and nearly nil related to the computer science genius of our time. In retrospect, even being seated with a legend like Dick Pick was a real honor. I wish I had known to take advantage of the opportunity that was given to me. Live and learn. </Interesting Related Story>

Tags:

Architecture | LINQ | Software Development | SQL Server | SQL Server 2008

Comments

6/25/2008 10:02:43 PM

Justin Etheredge

The whole "impedance mismatch", as people are apt to call it these days, between data and logic is certainly one of the biggest problems of our day. It is an interesting idea to think that the problem itself may one day be solved by simply getting rid of the mismatch altogether. The one thing that I would clear up is that even though data itself may be simple, the relationships are complex, and that is where most of the issues with ORM tools come in. How do you map the relationships between objects to relationships between data in a db?

When I consider that though, I begin to think about object databases, and the fact that they never really caught on even though they did an excellent job getting rid of the impedance mismatch. They had other problems, but the biggest fact was that they threatened the status quo and they didn't have a standard query language like SQL databases did. I wonder if we will see any resurgence in that space.

One other thing of note is that Amazon's SimpleDB behaves exactly as you describe Dick Pick's system. It is a non-relational database where each column can hold multiple values. And just as Dick Pick's system worked, it too only holds strings and indexes everything. It is as if Amazon went back in time and just reinvented the Pick system. Oh how interesting history can be!

6/27/2008 10:02:00 AM

Martin Laufer

I'm very impressed by the 2nd paragraph. Is there any information out there about the Pick OS explaining the concepts? I do prefer real books... (having something to show on the bookshelf). Can you please provide some useful hints (if there are no links), or are you able to explain the concept in more detail?

Thanks in advance

Martin Laufer

6/27/2008 12:30:08 PM

W. Kevin Hazzard

Hello Martin,

I would start by downloading the free version of InterSystems Caché database. Make sure you get the multi-valued edition. It has a lot of interesting capabilities like the ones I mentioned in the blog post. You should also check out jBase's mv.NET product.

http://www.intersystems.com/cache/index.html
http://www.jbase.com/products/mvnet.html

As for books on MV databases, I don't know of any great ones I could recommend. There's a lot of material around Caché and jBase to get you started though.

Kevin

7/16/2008 10:24:20 AM

SDC

Of course, the reality of having everything be of type string leads to the potential for abysmal data quality. SQL Server's date functions are fun and handy, but they depend on a column being in date format. It's kind of like the 'help me help you' thing SQL Server requires. The potential for monumental numerical mischief is also incredible. You convert the string to a number, but what representation do you use? Is it an integer, an IEEE-754 float, what the heck is it? Is it really a good idea to slop all the relationships in one file?

Does the speed of a compiled application necessarily help in the case of very large databases, in which case disk I/O is your limiting factor? Google's Sawzall is interpreted, because it just doesn't matter for the problem they deal with.

Possibly things work for small data sets and small projects, but even then, give me Python over PickBASIC any day. Python definitely has a higher mind share, so there's hope other people will be able to pick (sorry) up what I did if necessary.

'Put it all in memory' is great, but it ignores the fact that our ability to record, and the cheapness of storing, vast, ridiculously huge sets of data seem to be growing by leaps and bounds.

I don't delude myself that RDBMS is the one true way, but having had a glimpse into PickBASIC related reality, believe me, things could be a lot worse than having to have the smart kids fiddle with NHibernate.

7/16/2008 10:57:52 AM

W. Kevin Hazzard

@SDC Well said. My rant is hypothetical and somewhat theoretical, to be sure. I just wish that databases like jBase, UniVerse and Caché had more air time in the world I live in. It seems that Oracle and Microsoft RDBMSs get all the attention and they aren't ideally suited for some of the work I need to do. But the sunk cost in people, software and hardware for the RDBMS makes it difficult to justify buying anything else. DB2 has some nice features but the BBBoW (Big Blue Ball of Wax) has even more inertia associated with it than moving to a smaller, point-type solution when I need to.

Having strong data types isn't a bad approach for handling some constraints but, as Bruce Eckel says, strong testing can be much more valuable than strong typing. See:

http://mindview.net/WebLog/log-0025

I agree with Bruce completely. In a database that types data as loosely as possible for storage, there's no reason we can't build constraints and other tests to the correct degree based on the specific needs of the application. In my view of what a database really is, I see Bruce's argument fitting in perfectly and becoming perfectly apropos for that space.

8/6/2008 6:36:07 PM

Sameer Alibhai

Did you try googling for "Richard Pick" ? That has a few more results.

8/6/2008 10:45:20 PM

Kevin Hazzard, MVP

@sameer I don't get good results from Google with that search term. The whole first page of search results is classical guitar tutorials and it doesn't get better on page 2. Which search engine are you using?

W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008
