Thread safety

One of our focus areas for .Net 4.5 was async and improving support for using ADO.NET asynchronously. A side effect of this is that we did a lot of work improving our thread-safety story. With .Net 4.0 and prior, we simply had a blanket statement that multithreaded access to any ADO.NET object was not supported, except for cancellation (i.e. SqlCommand.Cancel). Even then, there were some unusual corner cases with cancellation that resulted in unexpected behavior. With 4.5 we could have maintained this stance, but we realized that the use of async makes mistakes with accessing objects on multiple threads much more likely (e.g. when manually calling ContinueWith or if you forget the 'await' keyword).

Pending Operations

With the .Net 4.5 Developer Preview, if you try to call any operation on an ADO.NET object while it has an asynchronous operation pending (including the old Begin\End methods), then we throw an InvalidOperationException. While this isn't the nicest of behaviors, it is much better than the 4.0 and below behavior (which was undefined, although it typically resulted in NullReferenceExceptions and data corruption). The reason that we opted for an exception instead of doing something 'smarter' (like waiting for the operation to complete) is that a secondary call to an object is typically a coding mistake (e.g. forgetting the 'await' keyword), and anyone who needs to do multiple operations should schedule the second operation via await or ContinueWith.
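To make that concrete, here is a minimal sketch of the mistake the new check is aimed at (the connection string and query are placeholders for illustration):

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

class PendingOperationExample
{
    static async Task RunAsync(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("WAITFOR DELAY '00:00:05'", connection))
        {
            await connection.OpenAsync();

            // Correct: each operation is awaited before the next one starts.
            await command.ExecuteNonQueryAsync();
            await command.ExecuteNonQueryAsync();

            // The coding mistake the new check catches: without 'await', the first task
            // is still pending when the second call runs, so the Developer Preview
            // throws an InvalidOperationException instead of leaving things undefined.
            //   var pending = command.ExecuteNonQueryAsync();   // note: no await
            //   await command.ExecuteNonQueryAsync();           // throws
        }
    }
}
```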

However, if there is a synchronous operation in progress, we still do not support starting another operation on that object. And, by 'not supported', I mean that we have no checks in place and no guarantees on the behavior. Unfortunately, there is no easy way to detect multi-threaded access to an object (even from within a debugger), so you need to make sure that your code is correct. The simplest way to do this is by never sharing an ADO.NET object between multiple threads. This means that if you have a shared 'Data Access Layer' in your application, you should be opening a new connection per call (and closing it afterwards) or, if you have something like a singleton logger, you may want to consider a Consumer\Producer pattern such that there is only one thread performing the logging.
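For the logger case, a rough sketch of the Consumer\Producer approach might look like the following (the Log table and its schema are made up for the example); the key point is that only the single consumer thread ever touches the ADO.NET objects:

```csharp
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading.Tasks;

class SqlLogger
{
    // Producers on any thread post messages here; only one consumer touches ADO.NET.
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly string _connectionString;

    public SqlLogger(string connectionString)
    {
        _connectionString = connectionString;
        Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning);
    }

    public void Log(string message)
    {
        _queue.Add(message); // Safe to call from multiple threads.
    }

    private void Consume()
    {
        foreach (string message in _queue.GetConsumingEnumerable())
        {
            // One connection per call, never shared between threads.
            using (var connection = new SqlConnection(_connectionString))
            using (var command = new SqlCommand(
                "INSERT INTO Log (Message) VALUES (@message)", connection))
            {
                command.Parameters.AddWithValue("@message", message);
                connection.Open();
                command.ExecuteNonQuery();
            }
        }
    }
}
```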

Cancellation

As I mentioned previously, cancellation is the only operation that we have always supported from another thread. In .Net 4.5 we have done a lot of work to ensure that cancellation is still supported, and we have also dealt with quite a few of the corner cases. For instance, any time there is a fatal error on a connection (e.g. the network has gone down) we close the current connection. While this may seem reasonable, it means that cancelling an operation could result in the connection being closed while another operation (i.e. the one being cancelled) was running. While we haven't changed this behavior in .Net 4.5, we have made sure that any other operation can handle the connection being closed due to an error, even if it means throwing an InvalidOperationException.
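In 4.5 the new async methods also accept a CancellationToken, which performs the same cross-thread cancellation on your behalf. Here is a rough sketch (the query is a placeholder, and exactly how the cancellation surfaces can vary between the task being cancelled and a SqlException):

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

class CancellationExample
{
    static async Task RunAsync(string connectionString)
    {
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2)))
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("WAITFOR DELAY '00:00:30'", connection))
        {
            await connection.OpenAsync(cts.Token);
            try
            {
                // The token cancels the running command for you, much like calling
                // SqlCommand.Cancel() from another thread.
                await command.ExecuteNonQueryAsync(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Cancellation surfaced via the task.
            }
            catch (SqlException)
            {
                // Cancellation (or a fatal connection error) can also surface here.
            }
        }
    }
}
```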

Multi-threaded MARS

In SQL Server 2005, we introduced a feature called "Multiple Active Result Sets", or MARS, which allowed multiple commands to be executed on a single connection. In .Net 4.0 and prior this had the caveat that you could not execute multiple commands or use multiple readers simultaneously, which greatly limits the usefulness of MARS. In .Net 4.5 we have done a lot of work to try to enable this scenario for SqlClient and, although we are not yet officially supporting it, it is something that we would like for people to try out as a part of their testing of the Developer Preview and async. As a side note, there is a performance overhead for enabling MARS, so it may be worth also investigating if you can disable the feature instead.
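If you do want to try MARS out, it is enabled via the MultipleActiveResultSets connection string keyword. Here is a minimal sketch with two readers open on one connection (the server, database and table names are placeholders); note that this sketch still drains the readers one at a time rather than from multiple threads:

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

class MarsExample
{
    static async Task RunAsync()
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "myServer",            // placeholder
            InitialCatalog = "myDatabase",      // placeholder
            IntegratedSecurity = true,
            MultipleActiveResultSets = true     // this is what enables MARS
        };

        using (var connection = new SqlConnection(builder.ConnectionString))
        {
            await connection.OpenAsync();

            using (var ordersCommand = new SqlCommand("SELECT * FROM Orders", connection))
            using (var customersCommand = new SqlCommand("SELECT * FROM Customers", connection))
            using (var orders = await ordersCommand.ExecuteReaderAsync())
            using (var customers = await customersCommand.ExecuteReaderAsync())
            {
                // Both readers are open on the same connection at the same time.
                while (await orders.ReadAsync()) { /* ... */ }
                while (await customers.ReadAsync()) { /* ... */ }
            }
        }
    }
}
```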

Issue 14

(Rather than just reiterating the new features in ADO.NET that we announced at //Build/, I figured that I'd do a series of posts covering various features in depth – although this first "feature" shipped a bit earlier than the 4.5 Developer Preview)

What’s in a fix

If you remember last month's Patch Tuesday, the first Reliability Update for .Net 4.0 was released, including a bug fix for System.Data.dll. However, those of you who read the support article would have been greeted by this cryptic message:


Issue 14
Consider the following scenario

  • You use the .NET Framework Data Provider for SQL Server (SqlClient) to connect to an instance of Microsoft SQL Azure or of Microsoft SQL Server.
  • An established connection is removed from the connection pool.
  • The first request is sent to the server.

In this scenario, an instance of SqlException is encountered, and you receive the following error message:
A transport-level error has occurred when sending the request to the server.


So given that description, can you tell what the original bug was, or what we fixed?
No? Neither can I – and I wrote the fix…

Historical Perspective

To explain "Issue 14", we've first got to look back at the history of ADO.NET, back to the 2.0 (or, possibly, 1.0) days. In the original design of the Connection Pool it was decided that, if there was a catastrophic failure of a connection, then the entire pool should be cleared. This was a reasonable assumption, since being unable to communicate with the server typically means that either the server is down (or restarted), the client network connection has died or a failover has occurred – in any of these circumstances, it is unlikely that any other connection in the pool would have survived.

Fast forward to today, and some of the original assumptions of the connection pool are no longer valid. Due to the increased popularity of cloud computing and connected devices, connections to SQL Server might not be going over ultra-fast and ultra-reliable links inside a data center. Instead, they may be going over the internet, which means that they are unreliable and could drop at any time. This, combined with SQL Azure's policy of dropping connections that have been idle for over 30 minutes, meant that we could have one dead connection in the pool, but the rest would be ok.

Check Connections

So now we're connecting over unreliable links and still clearing the pool when it's possible that only one of the connections has died. On top of that, we don't know that a connection is dead until someone tries to execute a command on it (and then gets an error, despite the fact that they had just opened the connection). So what to do?

Firstly, we are now checking the state of the connection when we remove it from the connection pool and, if it's dead, giving you a new connection. This greatly decreases the likelihood that you will be trying to execute on a bad connection (although it's still possible, as we are relying on Windows to know about the state of the underlying TCP connection, and there is a race condition between us checking and you executing on the connection).

Secondly, we no longer clear the pool when there is a fatal error on a connection – so we're no longer dropping (hopefully) good connections just because one connection is bad. At the same time, since we are checking connections before using them, we are still responsive to events like failover or network disconnects.

Best Practices

If you read the last section carefully, you would have noticed one of the caveats of this feature: "there is a race condition between us checking and you executing on the connection", which leads to my first recommendation:

Follow the Open-Execute-Close pattern

Every time you need to interact with SQL Server, you should open a fresh connection, execute your command (and deal with the data reader if you open one) and then close the connection (since SqlConnection, SqlCommand and SqlDataReader all implement IDisposable, the best way to do this is with a 'using' statement). If you need to expose the data reader to a higher level API, and don't want to cache the data in the data reader, then you should wrap the connection, command and reader inside a new class that implements IDisposable and return that instead.
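Here is a minimal sketch of the pattern (the query and table name are made up for the example):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

static class ProductQueries
{
    public static IList<string> GetProductNames(string connectionString)
    {
        var names = new List<string>();

        // Open, execute, close: each object is disposed even if an exception is thrown.
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Name FROM Products", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    names.Add(reader.GetString(0));
                }
            }
        } // The connection goes back to the pool here, not to the server.

        return names;
    }
}
```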

My second recommendation relates back to my previous post on connection strings:

Use our connection pool

Despite the connection pooling code being rather old, it is extremely fast, reliable and it works. Opening connections from the pool and returning them afterwards is incredibly quick, especially when compared to opening a fresh connection or executing a command on a connection. Additionally, it allows us to introduce improvements like this one, which custom pooling code can't take advantage of.

Finally, this improvement is no replacement for proper retry code.
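For completeness, here is a deliberately simple sketch of what retry code around an Open-Execute-Close unit of work might look like. Deciding which errors are actually transient (and how long to back off) is an application-level decision; this sketch just retries every SqlException a few times:

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

static class RetryHelper
{
    // Retries the whole unit of work (open, execute, close) with a small back-off.
    // A real application would inspect the SqlException and only retry transient errors.
    public static T Execute<T>(Func<T> unitOfWork, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return unitOfWork();
            }
            catch (SqlException)
            {
                if (attempt >= maxAttempts)
                {
                    throw;
                }
            }
            Thread.Sleep(TimeSpan.FromSeconds(attempt)); // simple linear back-off
        }
    }
}
```

The important design point is to wrap a complete unit of work (open, execute, close) rather than retrying an individual read in the middle of a reader.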

Improvements in 4.5

In the .Net 4.5 Developer Preview, we’ve made this feature more scalable, especially for high-throughput application servers where connections do not sit idle in the pool for very long.

As a final note, if you haven’t already upgraded to .Net 4.5, then you should make sure that you’ve installed the 4.0 Reliability Update.

Connection Strings: The smaller, the better

Today I'd like to talk about the wonderful and magic things that are Connection Strings. If you're not familiar with connection strings, they are the way that a developer informs ADO.NET which server to connect to and which connection options to use.

A Simple Rule

The problem with connection strings (actually, there are quite a few problems, but I’ll stick to the point of this post) is that there are far too many options to choose from, but let me simplify everything for you:

If you don’t need to change an option, or don’t know what it does, then don’t specify it.

Default Values

The default values for connection string options are there for a reason, so unless your application has some unusual requirements, you should stick to the default values. However, you shouldn't re-specify the default values in your connection string either, unless you heavily rely on the behavior provided by that default value. The reason for this is that we may change the default at a later date, either because code changes make a different value more optimal, or because we introduce a new value that is better for most developers. The most common case I see here is setting the "Max Pool Size" value – for the vast majority of applications the default size of 100 is reasonable. However, you shouldn't specify 100, in case we (or SQL Server) make modifications to our network code and so are able to increase the maximum, or perhaps we'd have different values for client and server applications*. Either way, you'd want to be able to get this benefit for 'free' by not specifying a value, rather than having to modify all of your deployed config files with the new values (because, of course, you are using config files, and don't have the connection string hard-coded in your application).
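In practice, that means a connection string with only the options you actually need to change. A minimal sketch (the server and database names are placeholders):

```csharp
using System.Data.SqlClient;

static class ConnectionStrings
{
    // Only the options this hypothetical application needs to change;
    // everything else, including Max Pool Size, is left at its default.
    public static string Build()
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "myServer",
            InitialCatalog = "myDatabase",
            IntegratedSecurity = true
        };

        // e.g. "Data Source=myServer;Initial Catalog=myDatabase;Integrated Security=True"
        return builder.ConnectionString;
    }
}
```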

Optional Extras

Alternatively, the thought "I don't know what it does, but I might use it later" may also lead to including unneeded connection string options. If there was some beneficial feature that didn't have any negative side effects, then we would enable that option by default. The fact that a connection string option is disabled by default should indicate that there is some other side effect of turning it on (typically a performance hit, but possibly other things). If you don't know what an option does, then you probably aren't taking advantage of it, and if you aren't taking advantage of it, then you don't need it. A good example of this is "Multiple Active Result Sets" (aka MARS). This is a feature introduced in SQL Server 2005 that permits multiple commands to be executed on a single connection simultaneously**. This may sound great, but most applications don't really have a need for it – they can simply open another connection. However, if you turn it on because you "may need it", then you will be taking a performance hit and possibly hiding errors in your code (since having MARS off ensures that you dispose a SqlDataReader before trying to open a new one on the same connection).
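Put another way, leaving MARS off enforces a discipline that is usually what you want anyway: finish with one reader before executing the next command on the same connection. A rough sketch of that discipline (the table names are made up):

```csharp
using System.Data.SqlClient;

static class NoMarsExample
{
    static void Run(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            using (var command = new SqlCommand("SELECT Id FROM Orders", connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read()) { /* ... */ }
            } // The first reader is disposed here...

            // ...before the same connection executes another command. With MARS off,
            // executing while the first reader was still open would throw an
            // InvalidOperationException.
            using (var command = new SqlCommand("SELECT Id FROM Customers", connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read()) { /* ... */ }
            }
        }
    }
}
```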

Final Caveat

Before you go ahead and start removing options from your connection strings, there is one thing you should be aware of: since you had these options specified, there may be parts of your code that rely on the non-standard behavior. For instance, if you turn off MARS, any part of your code that created multiple readers on the same connection will start throwing exceptions, or reducing the Max Pool Size may reveal a connection leak that was previously hidden (resulting in more exceptions in your code). So be very careful when changing connection string options and ensure that you run all of your tests (which, of course, you have) and have a rollback strategy to deploy the old connection string if something goes wrong.


*These are just examples and are not necessarily in our current plans. But if you like these ideas, or have some of your own, feel free to post them on Connect
**Technically speaking, we don’t support multithreaded access to the same connection, even with MARS turned on (unless you are cancelling a Command). So, you would still need to synchronize each of the Command\Reader executions\reads on the same connection.

Size of MAX != Max of Size

How’s that for a title?

What I’m actually referring to here is the VAR* data types in SQL Server (i.e. VARBINARY, VARCHAR and NVARCHAR). For these data types you need to specify a maximum size for that column, such as VARBINARY(20) (which would be a binary array that is, at most, 20 bytes long). The largest maximum size permitted is 8000 for VARCHAR and VARBINARY and 4000 for NVARCHAR. You can also specify a size of ‘MAX’ (e.g. VARCHAR(MAX)), however this does not set the maximum size to 8000 or 4000, rather it sets the maximum for the column to 2^31-1 bytes.

Hence, the Size of MAX (2^31-1) != Max of Size (4000 or 8000)
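The same distinction shows up on the client side when you size SqlParameters: an explicit size for a sized column, and -1 (ADO.NET's convention for MAX) for a MAX column. A quick sketch (the parameter names are made up):

```csharp
using System.Data;
using System.Data.SqlClient;

static class ParameterSizes
{
    static void AddParameters(SqlCommand command)
    {
        // Matches a column declared as NVARCHAR(100): the size is in characters.
        command.Parameters.Add("@username", SqlDbType.NVarChar, 100);

        // Matches NVARCHAR(MAX): ADO.NET uses -1 to mean 'MAX', not 4000 or 8000.
        command.Parameters.Add("@biography", SqlDbType.NVarChar, -1);
    }
}
```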

Pick a size, any size

So, the question then becomes “Why not just use MAX for everything?” A few reasons. Firstly, there is performance: from a connectivity point of view (since that’s where I work), MAX data types need to be sent in chunks, meaning that we need to read additional metadata concerning the size of each chunk (although this is likely to be quite small compared to the total amount of data being sent). From a storage point of view, if the data is larger than 8000 bytes then it is stored “out of row”, meaning that a pointer to the data is stored in the row storage and must be dereferenced in order to read the data. This also means that the query engine* cannot simply assume that all of the data it requires is in row storage, nor can it assume that it can load all of the data from the column into memory (since there may be up to 2GB of data per MAX column per row).

In terms of maintenance, you cannot do online index operations on MAX columns. Additionally, if you have a lot of data that grows over time beyond the 8000-byte limit and is moved “out of row”, or shrinks below 8000 bytes and is moved back into the row, this will greatly increase the amount of fragmentation in your database.

However, the most important reason to limit the size of VAR* columns is security. For instance, imagine that you are running a website and permit users to create accounts, but you also allow them to change their username once they are registered. You decide to make the ‘username’ column in your database NVARCHAR(MAX), and to limit the size of the username in your business logic. All of this would be fine, so long as your code is bug free. If, however, you have a bug that allows a user to bypass your business logic and set a username of any size, then it becomes quite easy for a malicious user to stage a denial of service attack on your website – they can simply create a few users with very long usernames and fill up your database (remember that SQL Azure only allows a 50GB database size, which is 25 completely filled MAX columns). If you also have a page that displays usernames (e.g. for high scores, lists of users online, searches) then your other users won’t be able to use those pages, as they will be attempting to download the attacker’s massive username (and the bandwidth that is used in the process may be costly as well). So, while the correct response to this scenario is to fix the bugs in your website, you should also be following the “Defense in Depth” principle and have protections all the way from client-side scripting through to the business logic and underlying database schema.
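As a sketch of what defense in depth can look like from the data access code's point of view (the Users table, its NVARCHAR(100) Username column and the Id column are hypothetical):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class UserRepository
{
    private const int MaxUsernameLength = 100; // Matches the hypothetical NVARCHAR(100) column.

    public static void Rename(string connectionString, int userId, string newUsername)
    {
        // Business-logic check: reject oversized input before it ever reaches the database.
        if (string.IsNullOrEmpty(newUsername) || newUsername.Length > MaxUsernameLength)
        {
            throw new ArgumentException("Invalid username", "newUsername");
        }

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "UPDATE Users SET Username = @username WHERE Id = @id", connection))
        {
            // The sized parameter and the sized column are the remaining layers of defense.
            command.Parameters.Add("@username", SqlDbType.NVarChar, MaxUsernameLength).Value = newUsername;
            command.Parameters.Add("@id", SqlDbType.Int).Value = userId;

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```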

There are, however, some places where limiting to 4000 bytes may be unreasonable, for instance blog posts, forums or content stored in a CMS. But, where possible, try to choose an actual size for your VAR* columns.

Note on legacy types

You may also notice that there are IMAGE, TEXT and NTEXT types in SQL Server; these are legacy types, and you should be using the VARBINARY, VARCHAR and NVARCHAR types instead.

* I have not seen the engine’s code so I can’t confirm that it does make these assumptions, although there is some evidence to suggest that specifying a size does help performance.

P3SS: POP3 Test Server

As a part of my work to automate testing for P3SS (POP3 to SMTP Server) I have been building a POP3 server that P3SS can run against, whose state I can easily set and inspect, and into which I can introduce errors and ‘bugs’.

I am happy to announce that the source code for the first version of this POP3 test server is available on Codeplex!

This is a basic, but fully RFC 1939 compliant, POP3 server that is designed to make unit and functional testing of POP3 clients much easier (there is a rough smoke-test sketch after the list below). At present it supports:

  • Proper TCP support, including custom ports (110 is default)
  • Multiple clients (through multi-threading), with per client:
    • Usernames and passwords
    • Adding mail items and checking if they have been deleted and collected
  • Most POP3 Commands:
    • USER, PASS
    • APOP
    • DELE
    • LIST, STAT
    • NOOP (Note: NOOP is implemented such that it will respond even if the client is not logged in – which is against RFC specs)
    • RETR
    • RSET
    • QUIT
    • UIDL
  • Unit tests for all functionality
  • A simple base on which to build your own test servers (an echo server is also provided)
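As a taste of how a functional test might exercise the server over the wire, here is a rough sketch that connects via TCP, checks the greeting and does a NOOP round-trip. The host, port and expected responses are assumptions based on the defaults above; this is not the project's actual test API:

```csharp
using System.IO;
using System.Net.Sockets;

static class Pop3SmokeTest
{
    static void Run()
    {
        // Assumes the test server is listening locally on the default POP3 port (110).
        using (var client = new TcpClient("localhost", 110))
        using (var stream = client.GetStream())
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(stream) { AutoFlush = true, NewLine = "\r\n" })
        {
            string greeting = reader.ReadLine();      // expect a "+OK ..." greeting
            writer.WriteLine("NOOP");                 // this server replies even before login
            string noopResponse = reader.ReadLine();  // expect "+OK"
            writer.WriteLine("QUIT");
            string quitResponse = reader.ReadLine();  // expect "+OK"
        }
    }
}
```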

I am now working to add CAPA and encryption (both implicit and STLS). The addition of encryption may mean that I will have to rebuild part of the networking code, so you may see that the next version uses the Async CTP and (hopefully) does less low-level networking\stream\byte array code. Once the POP3 test server is complete and stabilized, I will move onto a test SMTP server, then add abilities to introduce errors and non-compliance into both (while also adding some tests for P3SS in between).

In the meantime, the code is available for one and all to use! And let me know (either on Codeplex or via email) if you have any feedback or suggestions.

When in doubt, try the other port

First off – sorry for not posting in 6 months… I was somewhat preoccupied with finishing up my honors and starting my new job at Microsoft. As you may have noticed by my spelling, I’m now in the US and have settled in quite well. But enough about me, on to the blog post!

So, last Tuesday was Patch Tuesday and, lo and behold, my laptop couldn’t connect to Exchange, and my phone hadn’t synced since 3am that morning. Since I was foolish enough to have my server set to automatically install updates (which, by default, happens at 3am) I immediately knew what was wrong. Logging into the server confirmed my suspicions – the server had installed updates and now the Exchange Information Store service was stuck in the ‘starting’ state. I hit reset on the server, blocked my ears as the fans revved to maximum speed while the machine booted and waited. And waited. And waited. Something had gone horribly wrong – the hard drive light had stopped, but I couldn’t remote in; I tried plugging in my monitor – but it simply said “Mode not compatible: 74.9Hz”, most likely because the server hadn’t booted with the monitor connected. I hit the reset button and waited again, this time watching the boot sequence. Two things caused alarm bells to ring in my head – firstly the RAID status was “Verify” not “Optimal” and, secondly, Windows start up never got past “Applying computer settings” (with the “Donut of Doom” spinning ominously to the left of it).

My first reaction was to restore from the last backup – I had nightly backups going and Server 2K8 R2 makes restoring from backups (even onto ‘bare metal’) extremely easy. I rolled back to the last backup but, to my horror, Windows was still stuck at “Applying computer settings”. I tried backups that were even older, but to no avail – the server wouldn’t boot. By this time the fact that the RAID status was still “Verify” had gone from a curiosity in my mind to a possible cause – so I broke the array and attempted to restore onto a single disc, and when that failed I recreated the array and tried again. Still nothing. At this point I was pretty desperate; I couldn’t remote in or physically log in, so I pulled out the debugging tools.

First up was the old faithful, the most useful tool that ships with Windows and is severely underrated and probably under-used – Event Viewer. Even though the server hadn’t booted properly, and my laptop wasn’t on the same domain as the server, Event Viewer still managed to connect to the server. What I saw in the logs (past all of the spam that P3SS generates – I really need to fix that…) was that MSExchange ADAccess was logging 2102 and 2114 events, indicating that it couldn’t find the AD server – which was especially weird since the AD server WAS the Exchange server… But things began to make more sense now.

The “Applying computer settings” part of Windows startup is when a large majority of services running in Windows start. If these services hang then the machine gets ‘stuck’ on this screen. Except that these services should never hang, because the Service Monitor is supposed to kill them after 30 seconds. But how do you kill a service that is in the process of “stopping” and refuses to respond? Exchange not finding the AD service meant that either AD or DNS was not running properly. Since I had to provide domain credentials to Event Viewer in order to log in to the server, AD must have been ok – therefore the DNS server had borked. And the easiest way to unbork a DNS server? Yank the network cable.

I pulled the network cable out of the server, and nothing happened. Damn. Perhaps the DNS service was still trying to bind to the static IP of the disconnected network card? So I plugged the cable into the secondary network card (which had a dynamic IP) and… the server finally booted! Logging in I found my theory confirmed again – the DNS service and a number of Exchange services that were set to “Automatic” had not started. I started the services, and everything was running as smooth as butter – so I reset the server. And it got stuck again. A quick switch of network ports and kicking off some services, and we were back in business.

So there you have it – when in doubt, try the other network port!

Addendum: Binging “DSC_E_NO_SUITABLE_CDC” has resulted in a few things to try, including enabling IPv6 (which I don’t really want to do, as it tends to break Outlook Anywhere) and adding the Exchange Server to the “Domain Admins” group… I’ll try some of these over the weekend and let you all know how it goes!

Update: It looks like Travis Wright on the TechNet forums had the correct answer – if you had IPv6 enabled when you installed Exchange 2010, it needs to remain enabled. (In hindsight, this makes sense, as it may have been possible that the DNS service was trying to bind to a non-existent IPv6 address and that the Exchange AD Topology service may have been looking for AD on the IPv6 loopback address, and connecting the cable to the secondary card worked because that secondary card still had IPv6 enabled)

The Life and Times of IE6

I came across an awesome comic strip describing how IE6 came into being, why it wasn’t upgraded and, most importantly, whether we can now get rid of it:

http://www.smashingmagazine.com/2010/02/11/the-life-times-and-death-of-internet-explorer-6-comic-strip/

Server Core can’t have SQL Server Installed?

Today I needed a testing machine for an ASP.NET MVC application that I am building, so I started up the Hyper-V server and created a new virtual machine. When it came to choosing the version of Server 2008 R2 to install, I saw the option for “Windows Web Server (Core Edition)” and thought, “Brilliant! A nice, light-weight version of Windows Server that won’t eat too much RAM from the Host server” and, unlike the 2008 Core Editions, 2008 R2 Core Editions support .NET and ASP.NET (albeit a cut-down version).

And this is where things went downhill. As it turns out, SQL Server 2008 is not supported on the Core Editions. Nor is the SQL Server 2008 R2 CTP. This is *very* surprising, as I can’t imagine anything in SQL Server that would require a UI (other than the management console – but that can be installed on another computer); it also seemed logical that any version of “Windows Web Server” should be able to support a proper, database-driven web application.

The weirdest thing is that, with a bit of a workaround, you can get SQL Server working on Server 2008 Core Edition – so it looks like it’s just an installer\support issue rather than a technical one. Hopefully SQL 2008 R2 will actually support Core Editions when it RTMs (there is already an item on Connect asking for SQL to support Core editions).

Update: Looks like SQL Server “Denali” can be installed on Windows Server Core (2008 R2 SP1 and above)

Easter Egg: God Mode in Windows 7

It’s nice to know that the guys at Microsoft still have a sense of humour: How to Enable Godmode on Windows 7

A follow-up for the <input type="file"> issue

So it appears that Firefox has had a bug submitted for “Improved form upload manager/progress display“… since 2004. So, if you use Firefox, please log into bugzilla and vote for this bug.

Alternatively, if you are an IE fan, please log into Connect and vote up the bug that I just submitted.
