Tag Archives: sql

Improving async performance for SqlDataReader

In addition to the Idle Connection Resiliency feature that was added in .NET 4.5.1, the other major “feature” was a massive boost in async performance for SqlDataReader. From our own internal testing we have seen anywhere from a 50% performance improvement (for default command behavior) to a 180% improvement (for sequential access). This brings async performance to within 20-30% of the speed of the sync methods – except that our performance tests are using the async methods the wrong way, and using them the right way can make async faster than sync.

Making the right calls

In our internal performance testing of the async methods, we use async for everything – opening the connection, executing the command, reading rows and getting the value of each column. In the official post I wrote for .NET 4.5, I said that this was not recommended, since calling ReadAsync in default command behavior buffers the entire row into memory, so subsequent async calls to read column values are a waste of CPU. Just by switching the calls from GetFieldValueAsync to GetFieldValue, I was able to get our performance test running faster with async than with sync.

This greatly simplifies the rules that I had previously written: always call NextResultAsync and ReadAsync, and then call GetFieldValue for default command behavior or GetFieldValueAsync for sequential access.
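
As a rough sketch of that pattern for default command behavior (the table and query here are just placeholders, not anything specific):

    using System.Data.SqlClient;
    using System.Threading.Tasks;

    public static class DefaultBehaviorSample
    {
        // Default command behavior: await OpenAsync, ExecuteReaderAsync and ReadAsync,
        // but read the column values synchronously - ReadAsync has already buffered
        // the entire row, so GetFieldValueAsync would only add overhead here.
        public static async Task ReadRowsAsync(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                await connection.OpenAsync();
                using (var command = new SqlCommand("SELECT Id, Name FROM dbo.SampleTable", connection))
                using (SqlDataReader reader = await command.ExecuteReaderAsync())
                {
                    while (await reader.ReadAsync())
                    {
                        int id = reader.GetFieldValue<int>(0);
                        string name = reader.GetFieldValue<string>(1);
                        // ... use id and name ...
                    }
                }
            }
        }
    }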

Checking ahead

If you have a look at the code for ReadAsync you can see that there is a lot of “infrastructure” that needs to be set up in order to do an async call, and yet none of it is used if all of the data needed to read that row is already available. So, in .NET 4.5.1, we introduced an internal method called WillHaveEnoughData that checks whether we are guaranteed to have enough data in the current buffer to satisfy the next request (be it reading a header, a column or an entire row). In order to do this, we have to make a few assumptions: chiefly, that variable-sized columns will be at the maximum size permitted by that column, and that types whose maximum size is larger than a packet (such as the MAX and legacy LOB types) will never fit in the current buffer.

Full speed ahead

If you read through the code of WillHaveEnoughData you can see the full set of assumptions, checks and optimizations that have been made. To summarize the code, the way to get the best performance out of the improvements made to SqlDataReader is to ensure the following:

  • Use SQL Server 2008 R2 or later – this introduced a feature called Null Bitmap Compression (NBC), which allows the server to indicate which columns in a row are null in the row’s header, instead of marking each null column in its own column header.
  • Avoid using NTEXT, TEXT, IMAGE, TVP, UDT, XML, [N]VARCHAR(MAX) and VARBINARY(MAX) – the maximum data size for these types is so large that it is very unusual (or even impossible) that they would happen to be able to fit within a single packet.
  • Keep the maximum size of variable-sized columns as small as possible – as I mention above, we assume that variable-sized columns will be the maximum size permitted by that column (so a VARCHAR(50) is assumed to always be 50 characters). This can make a huge difference to your application’s performance – reducing a column from a VARCHAR(8000) to a VARCHAR(250) or VARCHAR(50) can be the difference between always creating and disposing the async infrastructure and rarely creating it.
  • If you are moving large amounts of data, consider increasing the Packet Size specified in the connection string (NOTE: This will increase the amount of memory required for each connection).
  • For sequential access, make sure to read each column in its entirety before moving to the next column (or the next row), especially when reading large columns. Otherwise the check for the amount of data needed will include the data remaining in the current column and, for [N]VARCHAR/VARBINARY(MAX) types, if there is any leftover data at all then we will assume that it will not fit in a single packet. (There is a sketch of this pattern below the list.)
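
As a rough sketch of the sequential access pattern (again, the table, query and column types are just placeholders):

    using System.Data;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    public static class SequentialAccessSample
    {
        // Sequential access: await GetFieldValueAsync, reading columns in ordinal
        // order and each one in its entirety before moving to the next column or row.
        public static async Task ReadLargeRowsAsync(string connectionString)
        {
            // If the rows are large, a bigger "Packet Size" in the connection string
            // can also help (at the cost of more memory per connection).
            using (var connection = new SqlConnection(connectionString))
            {
                await connection.OpenAsync();
                using (var command = new SqlCommand("SELECT Id, Payload FROM dbo.LargeTable", connection))
                using (SqlDataReader reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
                {
                    while (await reader.ReadAsync())
                    {
                        int id = await reader.GetFieldValueAsync<int>(0);
                        // Read all of Payload before asking for the next row.
                        byte[] payload = await reader.GetFieldValueAsync<byte[]>(1);
                        // ... use id and payload ...
                    }
                }
            }
        }
    }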

Hopefully this gives you a good idea of the performance improvements made in .NET 4.5.1 and how to get the most out of them.


Size of MAX != Max of Size

How’s that for a title?

What I’m actually referring to here is the VAR* data types in SQL Server (i.e. VARBINARY, VARCHAR and NVARCHAR). For these data types you need to specify a maximum size for the column, such as VARBINARY(20) (which would be a binary array that is, at most, 20 bytes long). The largest maximum size permitted is 8000 for VARCHAR and VARBINARY, and 4000 for NVARCHAR. You can also specify a size of ‘MAX’ (e.g. VARCHAR(MAX)); however, this does not set the maximum size to 8000 or 4000 – rather, it sets the maximum for the column to 2^31-1 bytes.

Hence, the Size of MAX (2^31-1) != Max of Size (4000 or 8000)
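
The same distinction shows up on the client side: in ADO.NET, a SqlParameter declared with a Size of -1 means MAX rather than 4000 or 8000. A quick sketch (the parameter names are arbitrary):

    using System.Data;
    using System.Data.SqlClient;

    public static class ParameterSizeSample
    {
        public static void AddParameters(SqlCommand command)
        {
            // Explicitly sized: at most 50 characters, matching an NVARCHAR(50) column.
            command.Parameters.Add("@Username", SqlDbType.NVarChar, 50);

            // A Size of -1 means NVARCHAR(MAX): up to 2^31-1 bytes, not 4000 characters.
            command.Parameters.Add("@Body", SqlDbType.NVarChar, -1);
        }
    }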

Pick a size, any size

So, the question then becomes “Why not just use MAX for everything?” There are a few reasons. The first is performance: from a connectivity point of view (since that’s where I work), MAX data types need to be sent in chunks, meaning that we need to read additional metadata describing the size of each chunk (although this is likely to be quite small compared to the total amount of data being sent). From a storage point of view, if the data is larger than 8000 bytes then it is stored “out of row”, meaning that a pointer to the data is stored in the row storage and must be dereferenced in order to read the data. This also means that the query engine* cannot simply assume that all of the data it requires is in row storage, nor can it assume that it can load all of the data from the column into memory (since there may be up to 2GB of data per MAX column per row).

In terms of maintenance, you cannot do online index operations on MAX columns. Additionally, if you have a lot of data that grows above the 8000-byte limit over time and is moved “out of row”, or shrinks below 8000 bytes and is moved back in row, then this will greatly increase the amount of fragmentation in your database.

However, the most important reason to limit the size of VAR* columns is security. For instance, imagine that you are running a website and permit users to create accounts, and you also allow them to change their username once they are registered. You decide to make the ‘username’ column in your database NVARCHAR(MAX) and to limit the size of the username in your business logic. All of this would be fine, so long as your code is bug free. If, however, you have a bug that allows a user to bypass your business logic and set a username of any size, then it becomes quite easy for a malicious user to stage a denial of service attack on your website – they can simply create a few users with very long usernames and fill up your database (remember that SQL Azure only allows a 50GB database size, which is 25 completely filled MAX columns). If you also have a page that displays usernames (e.g. for high scores, lists of users online, or searches) then your other users won’t be able to use those pages, as they will be attempting to download the attacker’s massive username (and the bandwidth used in the process may be costly as well). So, while the correct response to this scenario is to fix the bugs in your website, you should also be following the “Defense in Depth” principle and have protections all the way from client-side scripting through to the business logic and the underlying database schema.
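
As a rough sketch of what “Defense in Depth” might look like here – assuming a hypothetical Users table with an NVARCHAR(50) Username column – the length limit is enforced in the business logic and mirrored by the sized parameter and the column definition:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    public static class UsernameSample
    {
        // Mirrors a hypothetical NVARCHAR(50) Username column, so the limit is
        // enforced in the business logic *and* by the parameter and schema.
        private const int MaxUsernameLength = 50;

        public static void UpdateUsername(SqlConnection connection, int userId, string username)
        {
            if (string.IsNullOrEmpty(username) || username.Length > MaxUsernameLength)
            {
                throw new ArgumentException("Username is missing or too long.", "username");
            }

            using (var command = new SqlCommand(
                "UPDATE dbo.Users SET Username = @Username WHERE Id = @Id", connection))
            {
                command.Parameters.Add("@Username", SqlDbType.NVarChar, MaxUsernameLength).Value = username;
                command.Parameters.Add("@Id", SqlDbType.Int).Value = userId;
                command.ExecuteNonQuery();
            }
        }
    }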

There are, however, some places where limiting to 4000 bytes may be unreasonable, for instance blog posts, forums or content stored in a CMS. But, where possible, try to choose an actual size for your VAR* columns.

Note on legacy types

You may also notice that SQL Server has the IMAGE, TEXT and NTEXT types; these are legacy types, and you should be using the VARBINARY, VARCHAR and NVARCHAR types instead.

* I have not seen the engine’s code so I can’t confirm that it does make these assumptions, although there is some evidence to suggest that specifying a size does help performance.


Server Core can’t have SQL Server Installed?

Today I needed a testing machine for an ASP.NET MVC application that I am building, so I started up the Hyper-V server and created a new virtual machine. When it came to choosing the version of Server 2008 R2 to install, I saw the option for “Windows Web Server (Core Edition)” and thought, “Brilliant! A nice, light-weight version of Windows Server that won’t eat too much RAM from the Host server” and, unlike the 2008 Core Editions, 2008 R2 Core Editions support .NET and ASP.NET (albeit a cut-down version).

And this is where things went downhill. As it turns out, SQL Server 2008 is not supported on the Core Editions, and neither is the SQL 2008 R2 CTP. This is *very* surprising, as I can’t imagine anything in SQL Server that would require a UI (other than the management console – but that can be installed on another computer); it also seemed logical that any version of “Windows Web Server” should be able to support a proper, database-driven web application.

The weirdest thing is that, with a bit of a workaround, you can get SQL Server working on Server 2008 Core Edition – so it looks like it’s just an installer/support issue rather than a technical one. Hopefully SQL 2008 R2 will actually support the Core Editions when it RTMs (there is already an item on Connect asking for SQL to support Core editions).

Update: Looks like SQL Server “Denali” can be installed on Windows Server Core (2008 R2 SP1 and above)
