I am working on a project that needs a list of English words in a Microsoft SQL Server database. I found some public domain lists of English words at:
ftp://ftp.ox.ac.uk/pub/wordlists/dictionaries
There are 11 interesting word lists here including:
- Unabridged
- CRL
- Roget
- Unix
- Antworth
- Knuth
- KnuthBritish
- Englex
- Shakespeare
- Pocket
- UU.net
Most of these lists haven't been updated since the mid-1990s, so if you find a more recent (free) source of English words, please let me know. I loaded all the data into a single table with these columns:
- [WordGuid] [uniqueidentifier] NOT NULL
- [WordText] [nvarchar](30) NOT NULL
- [WordLength] [tinyint] NOT NULL
- [SoundexGroup] [nchar](1) NOT NULL
- [SoundexValue] [smallint] NOT NULL
- [GroupId] [smallint] NULL
- [IsPalindrome] [bit] NOT NULL
- [InUnabr] [bit] NOT NULL
- [InAntworth] [bit] NOT NULL
- [InCRL] [bit] NOT NULL
- [InRoget] [bit] NOT NULL
- [InUnix] [bit] NOT NULL
- [InKnuthBritish] [bit] NOT NULL
- [InKnuth] [bit] NOT NULL
- [InEnglex] [bit] NOT NULL
- [InShakespeare] [bit] NOT NULL
- [InPocket] [bit] NOT NULL
- [InUUNet] [bit] NOT NULL
The [WordGuid] is actually the MD5 hash of the [WordText] expressed as a UNIQUEIDENTIFIER, so it makes a nice universal primary key. I've precomputed the [WordLength], [IsPalindrome] and a couple of Soundex values to make querying the table a bit more efficient. I've also computed a [GroupId] for each word. Words that share a [GroupId] are composed of exactly the same letters in a different order, so you could, for example, find all the whole-word anagrams of a given word by selecting the other words with the same [GroupId]. Finally, I've created a handful of [In*] flags that tell me which word file(s) each word was sourced from.
I've made the database available in two forms below:
Attachable (as MDF/LDF) Microsoft SQL Server 2005 Database (21.20 MB)
Tab-delimited CSV File with Table Creation Script (11.10 MB)
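If you would rather rebuild the derived columns from the raw word lists, here is a minimal Python sketch of how values like [WordGuid], [IsPalindrome] and [GroupId] could be computed. It is not the script I used to load the table. The function names are made up for illustration, and the UTF-8 encoding, the lowercasing, and the GUID byte order (Python's `bytes_le`, which mirrors how SQL Server lays out a 16-byte binary value cast to UNIQUEIDENTIFIER) are all assumptions, so the values it produces may not match the ones in the download.

```python
import hashlib
import uuid


def word_guid(word: str) -> uuid.UUID:
    """MD5 of the word, reinterpreted as a 16-byte GUID.

    Assumptions: UTF-8 encoding and a SQL Server style mixed-endian
    layout (bytes_le). The real table may have been built differently.
    """
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return uuid.UUID(bytes_le=digest)


def is_palindrome(word: str) -> bool:
    """True if the word reads the same forwards and backwards."""
    letters = word.lower()
    return letters == letters[::-1]


def anagram_key(word: str) -> str:
    """Letters of the word in sorted order; anagrams share this key."""
    return "".join(sorted(word.lower()))


def assign_group_ids(words):
    """Map each word to an integer id shared by all of its anagrams."""
    ids_by_key = {}
    group_ids = {}
    for word in words:
        key = anagram_key(word)
        # The first word seen with a given key allocates the next id.
        group_ids[word] = ids_by_key.setdefault(key, len(ids_by_key) + 1)
    return group_ids


if __name__ == "__main__":
    sample = ["listen", "silent", "enlist", "level", "word"]
    groups = assign_group_ids(sample)
    for w in sample:
        print(w, len(w), is_palindrome(w), groups[w], word_guid(w))
```

Sorting the letters gives every anagram the same key, so handing out one integer per distinct key is enough to reproduce the grouping idea behind [GroupId]. The two Soundex columns are left out of the sketch.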
Please see the licenses in the files at the source site listed at the top of this post. All of the licenses are academic and free to use, but your company may want to read and catalog them for full compliance.
Enjoy!