got net?

Kevin Hazzard's Brain Spigot

About the author

Welcome to Kevin Hazzard's blog.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010

Exploring SQL Azure

I've been working with SQL Azure for some time now and I really like it from a technology perspective. For the uninitiated, SQL Azure, formerly SQL Data Services (SDS), is Microsoft's foray into relational databases in "the cloud". Microsoft runs a special build of SQL Server that operates in a highly-available configuration on the servers in its Azure data centers. From a system administrator's standpoint, there are some radical differences between SQL Server and SQL Azure. How in the world are we going to live without the BACKUP command or the KILL STATS JOB command, after all? When designing SQL Azure, Microsoft took a long look at the list of features that had grown into SQL Server over time and realized that a lot of physicality features had become baked into the T-SQL language that make no sense whatsoever when running in a grid-type configuration with thousands of other active databases. In my mind, this is a good thing because it forces Microsoft to think critically about what physical and logical assets really make up the database from an administrator's perspective and from a developer's perspective. Companies like Teradata and Netezza have been thinking this way for some time now in an effort to make their database appliances much simpler to manage. I sincerely hope that some of what Microsoft is learning with SQL Azure creeps back into SQL Server 2011 (or whatever it will be called). If so, it will be a great thing for Microsoft and its customers. In the short term, these tradeoffs will make traditional database and network administrators feel off balance, though.

Developers, on the other hand, will find SQL Azure quite comfortable. Much to my surprise, even my NHibernate-based applications work in SQL Azure without modification. One caveat there: If you use the NHibernate Hbm2Ddl utility as part of a Domain-Driven Design (DDD) process, just watch out for the fact that SQL Azure does not support heap tables. Because of this, every SQL Azure table must have a clustered index, so make sure that all the tables in your model have a primary key or at least have a clustered index. (I know, I know. Primary keys don't have to be implemented as clustered indexes in SQL Server but you get the point.) A full list of the T-SQL that SQL Azure does not support can be found on MSDN. And while the list might look really long, the average application is not likely to encounter a lot of problems running against a SQL Azure database. The T-SQL statements that I'll personally miss the most in SQL Azure? Those would be OPENQUERY, OPENXML, SELECT INTO and NEWSEQUENTIALID.

Database Administration

Let's begin by looking at a bit of administration. The SQL Azure portal is the best place to start. You must log in with your Windows Live ID and click on the link for your project page. The screen for managing your SQL Azure databases is rather sparse. I was expecting to see something like SQL Server Management Studio on the web. Perhaps the web interface will evolve to support more features over time. For now, it presents something like this:

I've masked parts of the screen shot to protect my privacy but you get the idea. The section at the top shows basic information about your SQL Azure "server" with an option to reset the administrator password. The section below shows two tabs labeled "Databases" and "Firewall Settings". The Databases tab shown above allows you to create databases, see the connection strings required to connect to them or drop them. The list shows the current size and size limit for each database. As of this writing, SQL Azure limits databases to 1GB or 10GB maximum sizes. Hopefully, that will change in the future to allow much larger databases to exist in the cloud. I mean, why build a mesh or grid infrastructure for massive database scaling and limit it to 10 gigabytes? The 3 databases show a size of zero bytes because I truncated them before writing this article. After clicking on one of the radio buttons beside a database name and clicking the "Connection Strings" button, you'll see an AJAX popup that looks something like this:

The popup shows what the ADO.NET and ODBC connection strings would look like in an application configuration file. Notice in both connection strings that there's nothing special about SQL Azure. We can use the plain, old SQL Server Native Client 10.0 over TCP/IP to connect to the Azure database. But can just anyone across the Internet connect to your data? Of course not. Microsoft allows you to restrict connections by IP address ranges or from Windows Azure tasks that you may be running in the MicrosoftServices cloud. The Firewall Settings tab on the main screen is where we can do that. The Firewall Settings screen looks something like this:

Again, I've masked out my IP address in the Record Name that I created called VZW shown here. Since I work on the go using my EVDO card a lot, I need to change the IP Address Range every day, sometimes several times a day. There's a way to change the firewall rules through DML but I've yet to try that. I was thinking of writing a Windows Azure or .NET Services service that would have access via the MicrosoftServices checkbox shown above. I could call the service with another form of authentication to have it update my SQL Azure firewall rules automatically. Until I write that, I'll have to use this web-based console interface to set up the firewall rules. One thing I've noticed during the CTP is that the firewall rules don't take effect immediately. When my IP address changes and I make a rule change in the SQL Azure Firewall Settings, it may take up to 10 minutes to push that change to the firewall and execute it. When I'm ready to make a change, I simply press the "Edit Record" button and an AJAX popup that looks like this is rendered:

Notice that the firewall rule for SQL Azure allows you to specify a range of IP addresses, not just one. That would be handy to use if, for example, all of the addresses within a Class C IPv4 address block were allowed to connect to the SQL Azure databases you manage. I didn't see any support for IPv6 addresses in the Firewall Settings but I'm supposing that Microsoft will have to support IPv6 in the rules in the future. Additionally, I'd expect to see some richer firewall rule types, e.g. the use of subnet masks to further refine the grant or denial rules and rules based on IPSEC/VPN configuration. For now, IP ranges are enough to get started. I could show you the screens for creating and dropping databases in the web-based console but it's really not all that interesting. Besides, we can do that using the SQLCMD command-line tool as shown in the Database Access section below.
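To make the idea of a range rule concrete, here is a minimal Python sketch of the check such a rule implies: is a client address between the first and last address of the range? It is only an illustration of the concept, not how SQL Azure evaluates its rules. The function names are made up for this example and 192.0.2.0/24 is a documentation-only block standing in for a real Class C sized network.

```python
import ipaddress


def ip_in_range(address, first, last):
    """Return True when address lies within the inclusive range first..last."""
    addr = ipaddress.IPv4Address(address)
    return ipaddress.IPv4Address(first) <= addr <= ipaddress.IPv4Address(last)


def block_bounds(cidr):
    """Return the first and last address of a CIDR block as strings."""
    network = ipaddress.IPv4Network(cidr)
    return str(network[0]), str(network[-1])


# A /24 block holds 256 addresses, the size of a traditional Class C network.
first, last = block_bounds("192.0.2.0/24")
print(first, last)
print(ip_in_range("192.0.2.77", first, last))    # inside the block
print(ip_in_range("198.51.100.5", first, last))  # outside the block
```

The two bounds returned by block_bounds are the kind of values you would type into the start and end fields of a range rule.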

Database Access

I admit it. I'm a UNIX hacker from pre-history so whenever I have the chance to master something at the command line, I jump at it. So when I heard that SQL Azure worked well with the SQLCMD command-line tool, it brought a grin to my face. Here's a screen shot of a Windows PowerShell-based exchange between me and my SQL Azure server. Once again, I've blanked out some of my personal information but this time, it's color coded to help you understand what's important.

The first thing to take note of is that I've issued 4 commands here, numbered 2# through 5#. Command 2# connects to the master database: notice the -d master parameter at the end of the 2# command line? The 1> prompt means SQLCMD is waiting for input from me. I typed "CREATE DATABASE Blog" followed by Return, then "GO" followed by Return. Until I provide the "GO" statement, all of my commands are batched on the client side. The "GO" command itself is never sent to SQL Azure. Instead, it's a signal to the SQLCMD client to send the current batch to the remote server. When the results returned from command 2# show no error, the 1> prompt appears again to indicate the start of a new batch. The exit command takes me back to PowerShell.
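The client-side batching described here can be sketched in a few lines of Python. This is an illustration of the idea only and says nothing about how SQLCMD is really implemented. The split_batches function and the Article table definition are invented for the example.

```python
def split_batches(script):
    """Split a script into batches, closing a batch at each line that is just GO.

    Returns the completed batches plus any trailing lines that no GO has
    terminated yet, which a client would still be holding locally.
    """
    batches = []
    current = []
    for line in script.splitlines():
        if line.strip().upper() == "GO":
            # GO is a client-side signal; it is not part of the batch itself.
            if current:
                batches.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return batches, current


script = "CREATE DATABASE Blog\nGO\nCREATE TABLE Article (Id INT PRIMARY KEY)\n"
batches, pending = split_batches(script)
print(batches)  # batches that would be sent to the server
print(pending)  # lines still waiting for a GO
```

The pending list is the interesting part: nothing in it touches the network until a GO arrives, which is why a dropped connection can go unnoticed while you are still typing.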

At this point, our new Blog database has been created in the cloud and we're ready to use it. So in command 3#, I typed in a somewhat lengthy table definition. When I used the "GO" command to execute the batch, however, I got an error saying that the remote host (SQL Azure) has closed the connection. I took too long to type the command so SQL Azure, being a good steward of resources like IP connections, dropped the connection. The fact is that the connection was probably already dropped after I entered the "CREATE TABLE" command at prompt 1> and before I entered the "GO" directive at prompt 2>. But because SQLCMD was batching my commands, it didn't sense that the connection had been dropped until it tried to send the batch to SQL Azure. I haven't found a way to make SQL Azure keep the connection open longer and I probably would advise against using that feature if it exists. IP connections are precious to any kind of server that needs to scale to a large number of users. Forcing SQL Azure to keep the connection open longer so you can type is just a bad idea because it would severely impede scalability.

Fortunately, I don't have to type that long "CREATE TABLE" command in again. When I ran SQLCMD again in step 4#, I simply pressed the up arrow on my keyboard and it "remembered" the command. Tapping return and issuing the "GO" command, the table is successfully created in the new Blog database. In the same session, I then started to type an "INSERT" command to put some data into the new Article table. Again, I took too long to do it so you can see that the connection was again closed by SQL Azure before committing the batch. No worries, though. Rerunning SQLCMD and using the up arrow trick saved the day again. The values were inserted successfully on the second attempt.

Now let's talk a bit about those masked out values that I color-coded in the screen shot. The yellow masks are where my user ID is inserted. You can create users and logins using the "CREATE USER" and "CREATE LOGIN" commands in SQL Azure just as you can in SQL Server. Once you've done that, you can use the user IDs with access to a given database to do your work. The @ sign trailing the user ID in each SQLCMD is significant. For whatever reason, you must connect to SQL Azure with the user ID (-U) qualified at (@) your server name. The server name shows in the pink or salmon masks. Also notice that the pink/salmon masks show up again in each command in the server name (-S) section. Just remember to specify your SQL Azure server name after the @ in the -U parameter and again in the -S parameter. Also, the -S parameter must contain the Fully Qualified Domain Name (FQDN) of the server so that the SQLCMD tool can resolve to your SQL Azure server's IP address. Now, let's query the data.

Notice that in command 10# I used SQLCMD's -q parameter to pass a query string to my SQL Azure Blog database. In this case, it's a SELECT statement that dumps the data that I inserted with command 4# earlier. The output format isn't so pretty but you can tell that the data matches what I inserted before. Since I used the -q (lowercase) parameter, I still have to use the "exit" command to leave the SQLCMD interpreter. In command 11#, I used the -q parameter again to DROP the Article table from the Blog database. And in command 12#, I used the DROP DATABASE statement to drop the Blog database from SQL Azure altogether. In this case, I had to specify the master database using SQLCMD's -d parameter. Also note that by using the -Q (uppercase) parameter in command 12#, the exit statement is implied so I didn't have to exit manually as I did in commands 10# and 11#. That's handy.
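Since the screen shots are masked, here is a hedged Python sketch of how the pieces of such a command line fit together. It assumes, following the description above, that the login is written as user@server, where server is the first label of the FQDN passed to -S. The build_sqlcmd_args helper is hypothetical, and the user and server names are the sample values used later in this post, not real accounts. No -P is included, so SQLCMD would prompt for the password.

```python
def build_sqlcmd_args(user, server_fqdn, database, query=None, exit_after=True):
    """Assemble a sqlcmd argument list for a SQL Azure style login."""
    # Assumption: the short server name is the first label of the FQDN.
    server_name = server_fqdn.split(".")[0]
    args = [
        "sqlcmd",
        "-S", server_fqdn,               # fully qualified server name
        "-U", f"{user}@{server_name}",   # login qualified with the server name
        "-d", database,
    ]
    if query is not None:
        # -Q runs the query and exits; -q runs it and stays in the interpreter.
        args += ["-Q" if exit_after else "-q", query]
    return args


print(build_sqlcmd_args("jrsamples", "br549.database.windows.net", "master",
                        query="DROP DATABASE Blog"))
```

A list like this can be handed to subprocess.run if you want to script the session rather than type it.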

Coding to SQL Azure

If you are using ADO.NET or ODBC, the connection strings to your SQL Azure database can be obtained from the SQL Azure web console as shown earlier. Let's take a moment to dissect the ADO.NET connection string while we're on the subject. I'll only address the parts that need some special attention below:

Server=tcp:<server name>.database.windows.net; Database=<database name>; User ID=<user name>; Password=<password>; Trusted_Connection=false;

  • Server=tcp:<server name>.database.windows.net - this is the fully qualified domain name of your server, prefixed with the tcp: directive. This tells the SQL Server Native Client to use the TCP/IP protocol to connect to the FQDN that you specify. If your client is configured to prefer named pipes or some other protocol over TCP/IP, the tcp: directive in the connection string tells it to skip directly to TCP/IP instead.
  • User ID=<user name> - unlike SQLCMD, when using ADO.NET, the user name does not have to include @<server name> as the suffix. Just the user name part will do.
  • Trusted_Connection=false - this may not be what you think. This directive doesn't mean that the connection won't be secure. Every SQL Azure Tabular Data Stream (TDS) connection is tunnelled through the Secure Sockets Layer (SSL). Instead, this directive means that we won't be using OS-driven authentication like NTLM or Kerberos.
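If you want to inspect a connection string programmatically, its key=value; layout is easy to take apart. The Python sketch below is deliberately simplified: it assumes no value contains a semicolon or quoting, which real connection strings may. The parse_connection_string name and the sample values are placeholders for illustration.

```python
def parse_connection_string(text):
    """Split a 'key=value;key=value;' string into a dictionary.

    Simplification: quoted values and embedded semicolons are not handled.
    """
    settings = {}
    for part in text.split(";"):
        if not part.strip():
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = part.partition("=")
        settings[key.strip()] = value.strip()
    return settings


sample = ("Server=tcp:br549.database.windows.net; Database=Blog; "
          "User ID=jrsamples; Password=<password>; Trusted_Connection=false;")
settings = parse_connection_string(sample)
print(settings["Server"].startswith("tcp:"))           # TCP/IP is requested
print(settings["Trusted_Connection"].lower() == "false")  # SQL authentication
```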

You can construct a connection string in C# quite simply by using the SqlConnectionStringBuilder class as follows:

private const string UserName = "jrsamples";
private const string Password = "m1Nn1ep3@rL";
private const string ServerName = "br549.database.windows.net";
private const string DatabaseName = "Blog";

...

var connBuilder = new SqlConnectionStringBuilder
                   {
                       DataSource = ServerName,
                       InitialCatalog = DatabaseName,
                       Encrypt = true,
                       TrustServerCertificate = false,
                       UserID = UserName,
                       Password = Password
                   };

Notice that the Encrypt property is set to true in the connection string builder. This isn't strictly required because SQL Azure will force this value to true even if the client does not specify it. You should also note that SQL Azure does not accept connections on any TCP port other than 1433 at this time. So don't try to use a different port in the connection string builder or the connections using it will fail. When you're ready to use the SqlConnectionStringBuilder, invoke the ToString() method to get the full connection string back for use in your code. I'd show you some ADO.NET code here to do INSERT, UPDATE and DELETE but, to be honest, it would be pretty boring. Your ADO.NET code most likely won't have to be changed when moving from SQL Server to SQL Azure.

Product Availability

As of this writing, SQL Azure is still in CTP (Community Technology Preview) and not available for commercial use. The Microsoft Professional Developer Conference (PDC) coming up in mid-November 2009 is the time that's expected for commercial launch of the product. Right now, it appears as though Microsoft is going to limit databases in the SQL Azure cloud to 1GB or 10GB, so many larger-scale commercial applications may have to wait for a time when 100GB or larger databases may be ported. There's no guarantee that will ever happen but one has to assume that Microsoft, once it has gotten some commercial experience serving real customers, will open SQL Azure up to databases that can really show its capabilities.

Pricing

Check Microsoft's SQL Azure Pricing information page for details about cost and measurement.

Closing Thoughts

Microsoft's first attempt at putting SQL Server into the cloud is fairly impressive. And although SQL Server Management Studio can be used with SQL Azure, there are some known compatibility issues that make using command line tools safer for the time being. I'm betting on the fact that Microsoft will make some rich, GUI-based management tools available in due time. After all, Microsoft differentiated itself in the database space years ago by making network and database administrator jobs much easier through the use of great tools. Why wouldn't they continue that trend with SQL Azure? With respect to the query engine and the storage engine in SQL Azure, this first release is fairly strong. The fact that my NHibernate-based applications run without modification is impressive to say the least. If you're accustomed to writing lots of rich stored procedures that use every trick in SQL Server 2008's book, you may encounter some problems when using SQL Azure, though. There are many subtle changes and omissions in the implementation concerning those features that many of us consider to be on the periphery.

Will SQL Azure be a hit in the marketplace? Who knows? That's the big question now. There's little uptake on relational cloud databases in general so it remains to be seen if the popularity of SQL Server will translate into the cloud well. That will have a lot to do with pricing and Microsoft's target market which isn't fully understood just yet. Imagine a medium-sized company that would have to pay for server hardware and SQL Server 2008 Standard Edition plus the Client Access Licenses to make the system available. Then there are the environmental factors like power and cooling to consider. There's also hardware and software maintenance to add in and the people to manage it all at three nines of uptime per month. What's that worth per year? If Microsoft can convince business managers that it's a safe thing to do, that the development experience is great and that the pricing's right, SQL Azure could be quite popular in the marketplace. Only time will tell. Personally, I'm already thinking of clients who could benefit by shedding their servers in favor of cloud databases. I'm definitely going to start small, though, and work my way up. My clients who are spending between $2,000 and $3,500 US per server (Total Cost of Ownership) with less than 10GB of storage are the ones who could benefit the most by considering the move to SQL Azure.


Posted by kevin on Sunday, November 01, 2009 1:17 PM

SQL UNIQUEIDENTIFIERs are Really Big Integers

I wrote a blog post called How SQL Server Sorts the UNIQUEIDENTIFIER Type and another one called Ordering the SQL UNIQUEIDENTIFIER Type Numerically Correct for Reporting a while back. As a result, I get a lot of e-mails from people struggling with UNIQUEIDENTIFIER values in Microsoft SQL Server. That's cool because I like helping other developers. The mistake that most people make when working with this data type is treating its values like strings. However, UNIQUEIDENTIFIERs are absurd-looking integers, really big ones. We show them in hexadecimal format to make them more compact, which adds to their absurdity, I suppose.

As I demonstrated in my previous blog posts, SQL Server adds to the absurdity by making the readable version of UNIQUEIDENTIFIER values fundamentally different from their numerical handling. For example, a UNIQUEIDENTIFIER that reads as FFEEDDCC-BBAA-9988-6677-001122334455 in a SQL script will be treated as an integer that we humans would read from left to right as 00112233-4455-6677-8899-AABBCCDDEEFF. We expect the most significant digits of a number to appear on the left and the least significant digits to appear on the right. But SQL Server doesn't work that way. Here's some T-SQL code that will create a table called [TestValue] and populate it with some UNIQUEIDENTIFIER values.

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[TestValue]
(
    [RowId] [INT] IDENTITY(1,1) PRIMARY KEY NOT NULL,
    [UID] [uniqueidentifier] NOT NULL,
    [ReadableUID] [nchar](36) NULL
)
GO

CREATE FUNCTION [dbo].[NumericallyCorrectUid]
(
    @uid UNIQUEIDENTIFIER
)
RETURNS NCHAR(36)
AS
BEGIN
    DECLARE @result NCHAR(36)
    SET @result = CONVERT(NCHAR(36), @uid)
    SET @result =
        SUBSTRING(@result, 25, 8)
        + N'-'
        + RIGHT(@result, 4)
        + SUBSTRING(@result, 19, 6)
        + SUBSTRING(@result, 17, 2)
        + SUBSTRING(@result, 15, 2)
        + N'-'
        + SUBSTRING(@result, 12, 2)
        + SUBSTRING(@result, 10, 2)
        + SUBSTRING(@result, 7, 2)
        + SUBSTRING(@result, 5, 2)
        + SUBSTRING(@result, 3, 2)
        + LEFT(@result, 2)
    RETURN @result
END
GO

CREATE TRIGGER [dbo].[trg_UpdateReadableUid]
   ON [dbo].[TestValue] AFTER INSERT
AS
BEGIN
    UPDATE [TV]
        SET [ReadableUid] = dbo.NumericallyCorrectUid([TV].[UID])
        FROM [dbo].[TestValue] AS [TV]
        JOIN inserted AS [I]
            ON [TV].[UID] = [I].[UID]
END
GO

INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
GO

SELECT [RowId], [UID], [ReadableUID]
    FROM [dbo].[TestValue]
    ORDER BY [RowId]

The query at the end shows the UNIQUEIDENTIFIER values in the order that they were inserted. On my computer, they appear as follows. Please understand that on your computer you will get different values. If you didn't, we would have to remove the UNIQUE from the data type's name, wouldn't we?

RowId UID                                   ReadableUID
===== ====================================  ====================================
1     21321236-C387-4F81-83C5-201B3ECCFFC9  201B3ECC-FFC9-83C5-814F-87C336123221
2     4159FB16-F10C-4C03-AABD-6A6BBB092ABA  6A6BBB09-2ABA-AABD-034C-0CF116FB5941
3     0F4F2022-BAB4-411C-B66B-8C63167987B7  8C631679-87B7-B66B-1C41-B4BA22204F0F
4     5F326809-C47A-4149-AAA3-8E3F1C8419A2  8E3F1C84-19A2-AAA3-4941-7AC40968325F
5     31243180-AFB1-427A-A8D9-04EEA9866224  04EEA986-6224-A8D9-7A42-B1AF80312431
6     B1731F5E-13BA-4683-A846-020D7121FDEB  020D7121-FDEB-A846-8346-BA135E1F73B1
7     FABD2006-D8CF-44EE-8774-7D3052FF5A28  7D3052FF-5A28-8774-EE44-CFD80620BDFA
8     A4547257-E3C5-4EEF-ABC0-246D96DAE4A1  246D96DA-E4A1-ABC0-EF4E-C5E3577254A4
9     B03AF9E2-583F-44A6-B99D-169457FFA629  169457FF-A629-B99D-A644-3F58E2F93AB0
10    502FA784-5308-4A33-9F1A-36816860BB61  36816860-BB61-9F1A-334A-085384A72F50
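If you'd like to check the rearrangement outside of SQL Server, the string slicing in NumericallyCorrectUid translates directly to Python. This is a sketch that mirrors the T-SQL function above, with the 1-based SUBSTRING positions converted to 0-based slices. The numerically_correct_uid name is just a Python-style spelling of the same thing.

```python
def numerically_correct_uid(uid):
    """Rearrange a UNIQUEIDENTIFIER string the way NumericallyCorrectUid does."""
    text = uid.upper()
    if len(text) != 36:
        raise ValueError("expected a 36-character UNIQUEIDENTIFIER string")
    return (
        text[24:32] + "-" + text[32:36]   # last group, split 8 + 4
        + text[18:24]                     # fourth group with both hyphens
        + text[16:18] + text[14:16]       # third group, bytes swapped
        + "-"
        + text[11:13] + text[9:11]        # second group, bytes swapped
        + text[6:8] + text[4:6] + text[2:4] + text[0:2]  # first group reversed
    )


# Row 1 from the results above.
print(numerically_correct_uid("21321236-C387-4F81-83C5-201B3ECCFFC9"))
# → 201B3ECC-FFC9-83C5-814F-87C336123221
```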

The first thing to notice is that the UNIQUEIDENTIFIERs were inserted in what seems like random order. This is because I used the NEWID() function in my INSERT statements to generate the UNIQUEIDENTIFIER values. If I had let the NEWSEQUENTIALID() function generate them instead (it can only be used as a column default), the values would have been in ascending order when sorted by the [RowId]. The second thing to take note of is that comparing the [ReadableUID] version of each [UID] reveals the pattern I showed above. Namely, the bytes of the [ReadableUID] represented as 00112233-4455-6677-8899-AABBCCDDEEFF show in each related [UID] in the order FFEEDDCC-BBAA-9988-6677-001122334455. Now, let's order the results differently:

SELECT [RowId], [UID], [ReadableUID] FROM [dbo].[TestValue]
    ORDER BY [UID]

RowId UID                                   ReadableUID
===== ====================================  ====================================
6     B1731F5E-13BA-4683-A846-020D7121FDEB  020D7121-FDEB-A846-8346-BA135E1F73B1
5     31243180-AFB1-427A-A8D9-04EEA9866224  04EEA986-6224-A8D9-7A42-B1AF80312431
9     B03AF9E2-583F-44A6-B99D-169457FFA629  169457FF-A629-B99D-A644-3F58E2F93AB0
1     21321236-C387-4F81-83C5-201B3ECCFFC9  201B3ECC-FFC9-83C5-814F-87C336123221
8     A4547257-E3C5-4EEF-ABC0-246D96DAE4A1  246D96DA-E4A1-ABC0-EF4E-C5E3577254A4
10    502FA784-5308-4A33-9F1A-36816860BB61  36816860-BB61-9F1A-334A-085384A72F50
2     4159FB16-F10C-4C03-AABD-6A6BBB092ABA  6A6BBB09-2ABA-AABD-034C-0CF116FB5941
7     FABD2006-D8CF-44EE-8774-7D3052FF5A28  7D3052FF-5A28-8774-EE44-CFD80620BDFA
3     0F4F2022-BAB4-411C-B66B-8C63167987B7  8C631679-87B7-B66B-1C41-B4BA22204F0F
4     5F326809-C47A-4149-AAA3-8E3F1C8419A2  8E3F1C84-19A2-AAA3-4941-7AC40968325F

Zero in on the [RowId] values first. They are out of order because we ordered by the [UID] instead. But glancing at the [UID] values and thinking of them as integers, they don't look ordered either. Look at the first digit of the first two rows: digit B (value 11) certainly comes after 3 numerically. So, how could this be ordered properly? Now look at the [ReadableUID] column. Lo and behold, that column appears to be sorted in ascending fashion. But we didn't order by that column so what's going on here? Again, the way we see a UNIQUEIDENTIFIER as human beings and the way SQL Server treats these really large integers are quite different. And these differences persist even when we humans try to convey UNIQUEIDENTIFIERs as strings in our scripts. For example, if I wanted to use the ordering by [UID] shown in the last query and return the last five rows ([RowId] 10, 2, 7, 3 and 4) by using the greater than or equal operator, I could do it as follows:

SELECT [RowId], [UID], [ReadableUID] FROM [dbo].[TestValue]
    WHERE [UID] >= '502FA784-5308-4A33-9F1A-36816860BB61'
    ORDER BY [UID]

Which would return the last 5 rows of the query shown before as:

RowId UID                                   ReadableUID
===== ====================================  ====================================
10    502FA784-5308-4A33-9F1A-36816860BB61  36816860-BB61-9F1A-334A-085384A72F50
2     4159FB16-F10C-4C03-AABD-6A6BBB092ABA  6A6BBB09-2ABA-AABD-034C-0CF116FB5941
7     FABD2006-D8CF-44EE-8774-7D3052FF5A28  7D3052FF-5A28-8774-EE44-CFD80620BDFA
3     0F4F2022-BAB4-411C-B66B-8C63167987B7  8C631679-87B7-B66B-1C41-B4BA22204F0F
4     5F326809-C47A-4149-AAA3-8E3F1C8419A2  8E3F1C84-19A2-AAA3-4941-7AC40968325F

When using the UNIQUEIDENTIFIER for the first row I want, it's important to note that I have to use SQL Server's numerically significant format of 502FA784-5308-4A33-9F1A-36816860BB61 instead of the readable-as-integer 36816860-BB61-9F1A-334A-085384A72F50. In fact, if I were to try to use the version that allows me to read the value from left to right as an integer instead, I would get very different results. Try it for yourself to see what happens.
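The ordering and the >= filter can be reproduced in Python with the standard uuid module. The sketch below assumes that comparing the bytes in the readable order used throughout this post is equivalent to what ORDER BY [UID] does, which is what the results above suggest rather than a statement about SQL Server internals. The sort_key name is made up for the example, and the ten values are the ones from my test run.

```python
import uuid

ROWS = {
    1: "21321236-C387-4F81-83C5-201B3ECCFFC9",
    2: "4159FB16-F10C-4C03-AABD-6A6BBB092ABA",
    3: "0F4F2022-BAB4-411C-B66B-8C63167987B7",
    4: "5F326809-C47A-4149-AAA3-8E3F1C8419A2",
    5: "31243180-AFB1-427A-A8D9-04EEA9866224",
    6: "B1731F5E-13BA-4683-A846-020D7121FDEB",
    7: "FABD2006-D8CF-44EE-8774-7D3052FF5A28",
    8: "A4547257-E3C5-4EEF-ABC0-246D96DAE4A1",
    9: "B03AF9E2-583F-44A6-B99D-169457FFA629",
    10: "502FA784-5308-4A33-9F1A-36816860BB61",
}


def sort_key(text):
    """Return the 16 bytes of a UID in most-significant-first order."""
    # bytes_le stores the first three groups byte-reversed, which matches
    # the byte swaps seen in the ReadableUID column.
    raw = uuid.UUID(text).bytes_le
    return raw[10:16] + raw[8:10] + raw[6:8] + raw[4:6] + raw[0:4]


ordered = sorted(ROWS, key=lambda row_id: sort_key(ROWS[row_id]))
print(ordered)

threshold = sort_key(ROWS[10])
print([r for r in ordered if sort_key(ROWS[r]) >= threshold])
```

Sorting the same strings with a plain sorted(ROWS.values()) gives a different order, which is the string-versus-integer trap in a nutshell.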


Categories: CapTech | Series | Software Dev | SQL
Posted by kevin on Saturday, October 24, 2009 4:27 PM

PyTip: Avoid Using range() for Large Sequences

When iterating over a sequence of numbers in Python, the range() function is commonly used. However, the implementation of the range() function in Python 2.x builds a complete list holding every element in the sequence before the iteration begins. This is really costly from both memory and CPU perspectives when the desired range of numbers is large. Consider using the xrange() function instead, which returns a lazy object that produces each number in the sequence as needed, much as a generator would. Using xrange() instead of range() for large iterations can have a big, positive impact on your code. For example, in an application I was working on recently, replacing range() calls with xrange() boosted my performance from ~900,000 transactions per second to over 3,000,000. In Python 3.x, range() itself is lazy in this way and xrange() no longer exists.
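Here is a small sketch of the memory side of that difference. It is written for Python 3, where range() returns a lazy range object and list(range(n)) stands in for what Python 2's range() builds. The exact byte counts vary by interpreter and platform, so none are shown.

```python
import sys

N = 1_000_000

lazy = range(N)      # a small object that computes values on demand
eager = list(lazy)   # a list that holds a reference to every element

print(sys.getsizeof(lazy))   # stays small no matter how large N is
print(sys.getsizeof(eager))  # grows with N (and excludes the ints themselves)

# Iterating the lazy object never needs the full list in memory.
total = sum(lazy)
print(total)
```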


Posted by kevin on Monday, September 21, 2009 7:00 AM

Exploring the F# Language Series Part 1 - What is F#?

Throughout this series, I will be exploring the F# (pronounced F Sharp) language as a beginner. Perhaps you're just like me in that you've never worked with the F# language before but you are very curious about it. You may not understand the hype you've been hearing about so-called functional languages. But that's OK. If you want to learn along with me, that would be great.

Along the way, I welcome your comments and feedback, both to instruct me and other readers. You can get an overview of the complete series by visiting the series index. Enjoy.

 

Part 1 - What is F#? 

If you are like me, a software developer with a C++/C# background, you may have heard about the F# language but you haven't had a chance to learn anything about it yet. You may also be a Java developer or a Visual Basic developer who's curious. Don't let the name fool you. You don't have to be a C# developer to love F# once you start learning about it. As we'll see, the F# language doesn't have strong roots or ties to languages like C, C++, Java, C# or Visual Basic. As such, I think it can teach us a lot about alternative ways of thinking.

An FAQ (of Sorts) for Defining the F# Language

Here are some questions I've accumulated about the F# language. My answers below aren't definitive, by any means. But this is what I've learned so far. If you have other questions that you think belong in an entry-level FAQ for the F# language, let me know and I'll add them.

  • With a name like that, is F# a derivative of C#?
  • What does the F in the F# language's name mean?
  • What is a functional programming language?
  • What is an imperative programming language?
  • Is F# object-oriented?
  • Can F# be integrated into Visual Studio?
  • Are there command line tools for working with F#?
  • Is F# a portable language?
  • Does F# require me to deploy a runtime library?
  • Is F# a scripting language or a compiled language (or both)?
  • How does F# perform in terms of speed as compared to other languages?
  • Where can I go to find out more about F#?

With a name like that, is F# a derivative of C#?

Not at all. F# comes from a family of languages that were spawned by a language created in the late 1970s known as ML or MetaLanguage. The ML language has a special type inference algorithm built into it that allows it to infer the types of most expressions automatically. This allows the language to feel dynamically typed like many scripting languages while actually being statically typed. F# behaves this way as well. In F#, you can declare types but you don't often need to do it.

More recently, F# is a derivative of a language known as Objective Caml (or OCaml) which extended Caml (Categorical Abstract Machine Language), an ML derivative, by adding certain object-oriented capabilities to it. F# has a high degree of compatibility with OCaml source code as a result.

What does the F in the F# language's name mean?

I'm sure it's debatable but the F in F# most likely stands for the term Functional. It's hard to get an exact answer on this one but one can assume that since F# is often touted as a functional language, that must be the meaning. F# is not just a functional language, however. It's a multi-paradigm language that allows for functional programming as well as imperative programming. So, it's conceivable that the language could have been called I#. But the functional features of F# sort of steal the show, so the name makes sense.

What is a functional programming language?

Words are designed to conjure up ideas and images. So, what does the word functional mean in the context of a programming language? Merriam-Webster's dictionary defines functional as:

1 a: of, connected with, or being a function b: affecting physiological or psychological functions but not organic structure <functional heart disease>
2: used to contribute to the development or maintenance of a larger whole <functional and practical school courses>; also : designed or developed chiefly from the point of view of use <functional clothing>
3: performing or able to perform a regular function

As an American English speaker, the third definition is the one that I'm likely to think of when I hear the word functional. If something is functional, it's performing as it is supposed to. For other English dialects, the term functional could conjure up other meanings. But when I hear the term functional programming language, I'm likely to think of a programming language that's able to do what it was designed to do.

Of course, that makes no sense at all. If programming languages didn't perform as we expected them to, we wouldn't call them programming languages at all. We would call them probability languages or gee-i-sure-hope-this-thing-does-something-useful-when-i-press-the-button languages. Instead, the term functional in the realm of programming languages relates to the use of functions (or callable mathematical parts) as the basis for computation rather than changes in state and the mutation of data. So functional programming languages emphasize the second of the three definitions given above, i.e. viewing computation as the sum of the parts (functions) that are used to describe a larger solution.

In the 1930s, something called the Lambda Calculus was conceived to explore how functional decomposition of a problem could be used to more naturally describe potential solutions to it. Nowadays, functional programming languages are just implementations of the Lambda Calculus as a specific system with some extra bells and whistles, as they say. As functional programming languages go, however, F# is not a pure one because it includes constructs for mutability, i.e. changing the state of objects in certain cases.
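Here is a small sketch of what "computation as the sum of its function parts" can look like in F#. The names timesTwo, increment and friends are hypothetical and exist only for this illustration.

```fsharp
// Small pure functions: each result depends only on the argument.
let timesTwo x = x * 2
let increment x = x + 1

// Composition (>>) builds a larger function out of smaller parts.
let timesTwoThenIncrement = timesTwo >> increment

// Functions are values, so they can be passed to other functions.
let results = List.map timesTwoThenIncrement [1; 2; 3]

// A lambda expression (fun x -> ...) used inline in a pipeline.
let sumOfSquares =
    [1; 2; 3]
    |> List.map (fun x -> x * x)
    |> List.sum

printfn "%A" results
printfn "%d" sumOfSquares
```

Nothing in this sketch is ever reassigned; every step produces a new value from an old one.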

What is an imperative programming language?

Just as we did for the term functional above, it's probably worth investigating what the word imperative means at this point. Merriam-Webster defines imperative this way:

1 a: of, relating to, or constituting the grammatical mood that expresses the will to influence the behavior of another b: expressive of a command, entreaty, or exhortation c: having power to restrain, control, and direct
2: not to be avoided or evaded : necessary <an imperative duty>

As an American English speaker, I find that both definitions come to mind. Imperatives are about the will and the duty to change things. The term imperative programming language is understandable in this context because imperative programming is all about state management.

To understand how imperative systems are built, think about the modern computer system. From the hardware up, all computers are somewhat imperative in nature. We store data in memory as a series of electrical signals and manipulate those signals to change the meaning of the data they represent. Registers on the CPUs change value as programs execute to track things like the position of a stack pointer or the location of the next executable instruction in memory. If you think about it, the entire computer system is nothing more than a very large state machine with a googolplex of possible states.

Imperative systems are, by definition, mutable, meaning that they can mutate or change in some way over time to create state that is representative of the progress of the computation being performed. Imperative programming languages, in particular, usually allow the programmer to define global and/or local state through various memory constructs like variables and classes. Because F# is based on OCaml, it has some object-oriented features that depend on mutable data. As we'll see later, the F# keyword mutable is at the heart of the language's imperative programming model.
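As a quick preview, here is a minimal sketch of the mutable keyword at work. The names limit and total are just illustrative.

```fsharp
// Bindings are immutable by default; limit can never be reassigned.
let limit = 3

// The mutable keyword opts in to state that changes over time.
let mutable total = 0

// An imperative loop that updates total with the <- assignment operator.
for i in 1 .. limit do
    total <- total + i

printfn "%d" total
```

Trying to write limit <- 4 would be a compile-time error, because limit was not declared mutable.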

Is F# object-oriented?

Yes. F# supports the .NET typing model including:

  • Classes
  • Inheritance
  • Interface implementation
  • Polymorphism

All types in F# ultimately derive from the .NET type System.Object so the model is completely unified and compatible with other .NET languages.
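The four features listed above can be sketched in a few lines. The types IShape, Shape and Circle are hypothetical, made up for this example, and I'm assuming a recent F# compiler with the lightweight syntax.

```fsharp
// An interface with one abstract member.
type IShape =
    abstract member Area : unit -> float

// A base class with an overridable member and a default implementation.
type Shape(name: string) =
    member this.Name = name
    abstract member Describe : unit -> string
    default this.Describe() = "A shape called " + name

// Inheritance plus interface implementation.
type Circle(radius: float) =
    inherit Shape("circle")
    override this.Describe() = "A circle of radius " + string radius
    interface IShape with
        member this.Area() = System.Math.PI * radius * radius

let c = Circle(2.0)

// Polymorphism: the call goes through Shape but runs Circle's override.
let asShape = c :> Shape
printfn "%s" (asShape.Describe())

// Interface members are reached through an explicit upcast.
let area = (c :> IShape).Area()

// Every F# type is ultimately a System.Object.
let asObject = box c
printfn "%b" (asObject :? Shape)
```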

Can F# be integrated into Visual Studio?

The installation package for F# includes complete integration with Visual Studio 2003, 2005 and 2008. Included are:

  • A new F# project type (with file extension .fsharpp)
  • Templates for F# source, scripts and interfaces (including ML-compliant ones)
  • Templates for F# Lex and Yacc source
  • Integrated debugger support
  • A tool window for using the command-line tool called FSI.EXE (F# Interactive) for testing and running F# scripts and code

Are there command line tools for working with F#?

Yes. In fact, many people (like me) prefer working with the F# command line tools most of the time. There are F# command line tools in the installer package for:

  • F# compilation (FSC.EXE)
  • F# interactive interpreter (FSI.EXE)
  • F# Lex compiler (FSLEX.EXE)
  • F# Yacc compiler (FSYACC.EXE)
  • Resource compiler (RESXC.EXE)

Is F# a portable language?

Without the #light directive, F# uses its verbose, ML-compatible syntax and can compile or interpret much OCaml code. (The #light directive switches on the lightweight, indentation-sensitive syntax, which is not OCaml-compatible.) So if you stay with the verbose syntax, your F# code should port back to OCaml without much effort. However, dependence on .NET types that are not available on other platforms will, of course, make your F# code non-portable (or at least portable with a lot of effort to replace the missing types).

Does F# require me to deploy a runtime library?

Yes. There is an assembly called FSharp.Core.dll which is referenced by your compiled F# code. A command line flag for the FSC.EXE compiler exists called --standalone. If you use this flag, the compiler will statically link the core components in so that you don't have to deploy the core assembly with your compiled F# assemblies. Using the --standalone flag will add between one and two megabytes to your F# assemblies, though.

Is F# a scripting language or a compiled language (or both)?

F# is definitely a compiled language, but it also supports scripting via FSX files. You can run FSX scripts at the command line using the --exec flag of the F# Interactive (FSI.EXE) tool. This is useful when you want to run some F# code without a separate compile step.
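For illustration, a script might look something like the sketch below. The file name hello.fsx and the names inside it are hypothetical.

```fsharp
// hello.fsx (hypothetical file name)
// Run it without a separate compile step, for example:
//   fsi --exec hello.fsx

let names = ["Ada"; "Grace"; "Barbara"]

// Build a greeting string for a single name.
let greeting name = sprintf "Hello, %s!" name

// Print one greeting per name.
names |> List.iter (fun n -> printfn "%s" (greeting n))
```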

How does F# perform in terms of speed as compared to other languages?

Compiled F# seems to run about as fast as C# or C++. I don't have benchmarks of my own yet, so that claim rests on informal observation and anecdote: compiled F# appears to be about as fast as other compiled .NET languages. The F# compiler has a cross-module optimizer that can be enabled using the -O command line flag. This flag is turned off by default. Without hard performance data, my assessment remains quite subjective. We will examine F# performance in detail later on in the series.

Where can I go to find out more about F#?

Here are some links I've found useful:

That's all for now. Feel free to take a look at the other parts of this series exploring the F# language by visiting the series index.


Categories: F# | Software Dev | Series
Posted by kevin on Sunday, August 03, 2008 12:00 PM