
Improving async performance for SqlDataReader

In addition to the Idle Connection Resiliency feature that was added in .NET 4.5.1, the other major “feature” was a massive boost in async performance for SqlDataReader. From our own internal testing we have seen anywhere from a 50% performance improvement (for default command behavior) to 180% (for sequential access). This brings async performance to within 20-30% of the speed of the sync methods – except that our performance tests were using the async methods wrong, and using them the right way can make async faster than sync.

Making the right calls

In our internal performance testing when we test our async methods, we use async for everything – opening the connection, executing the command, reading rows and getting the value for each column. In the official post I wrote for .NET 4.5 I said that this was not recommended, since calling ReadAsync in default command behavior buffers the entire row into memory, so subsequent async calls to read column values are a waste of CPU. Just by switching the calls from GetFieldValueAsync to GetFieldValue I was able to get our performance test running faster with async calls than with sync.

This greatly simplifies the rules that I had previously written: always call NextResultAsync and ReadAsync, and then call GetFieldValue for default command behavior or GetFieldValueAsync for sequential access.
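
To make this concrete, here is a minimal sketch of the recommended pattern for default command behavior (the command and the column types are illustrative):

using (SqlDataReader reader = await command.ExecuteReaderAsync())
{
    do
    {
        while (await reader.ReadAsync())
        {
            // Default command behavior: ReadAsync has already buffered the
            // whole row, so the synchronous getters are the cheaper calls
            int id = reader.GetFieldValue<int>(0);
            string name = reader.GetFieldValue<string>(1);
        }
    } while (await reader.NextResultAsync());
}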

Checking ahead

If you have a look at the code for ReadAsync you can see that there is a lot of “infrastructure” that needs to be set up in order to do an async call, and yet it will never be used if all of the data needed to read that row is currently available. So, in .NET 4.5.1, we introduced an internal method called WillHaveEnoughData that checks if we are guaranteed to have enough data in the current buffer to satisfy the next request (be it reading a header, a column or an entire row). In order to do this, we have to make a few assumptions – most notably, that variable-sized columns will be at the maximum size permitted by the column’s type, since we cannot know the actual size until the data arrives.
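
Conceptually, the fast path looks something like this (a simplified sketch only – the real WillHaveEnoughData lives inside SqlDataReader and is considerably more involved, and the helper names here are illustrative):

private static readonly Task<bool> _trueTask = Task.FromResult(true);
private static readonly Task<bool> _falseTask = Task.FromResult(false);

public override Task<bool> ReadAsync(CancellationToken cancellationToken)
{
    if (WillHaveEnoughData())
    {
        // Fast path: the whole row is already in the buffer, so read it
        // synchronously and hand back a cached, completed Task – skipping
        // all of the async infrastructure
        return Read() ? _trueTask : _falseTask;
    }

    // Slow path: set up the async infrastructure and wait on the network
    return ReadAsyncSlow(cancellationToken);
}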

Full speed ahead

If you read through the code of WillHaveEnoughData you can see the full set of assumptions, checks and optimizations that have been made. To summarize the code, the way to get the best performance out of the improvements made to SqlDataReader is to ensure the following:

  • Use SQL Server 2008 R2 or later – this introduced a feature called Null Bitmap Compression (NBC), which allows the server to specify in the row header which columns in a row are null, instead of marking each null column in its own column header.
  • Avoid using NTEXT, TEXT, IMAGE, TVP, UDT, XML, [N]VARCHAR(MAX) and VARBINARY(MAX) – the maximum data size for these types is so large that it is very unusual (or even impossible) that they would happen to be able to fit within a single packet.
  • Keep the maximum size of variable-sized columns as small as possible – as I mentioned above, we assume that variable-sized columns will be the maximum size permitted by that column (so a VARCHAR(50) is assumed to always be 50 characters). This can make a huge difference to your application’s performance – reducing a column from a VARCHAR(8000) to a VARCHAR(250) or VARCHAR(50) can be the difference between always creating and disposing the async infrastructure and rarely creating it.
  • If you are moving large amounts of data, consider increasing the Packet Size specified in the connection string – see the snippet after this list (NOTE: this will increase the amount of memory required for each connection).
  • For sequential access, make sure to read each column in its entirety before moving to the next column (or the next row), especially when reading large columns. Otherwise, the check for the amount of data needed will include whatever is left of the current column – and, for the [N]VARCHAR\VARBINARY(MAX) types, if there is any leftover data at all then we assume it will not fit in a single packet.
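
For the Packet Size suggestion, here is what that might look like (server, database and size are placeholders):

// Illustrative only – a larger packet size helps when moving large amounts
// of data, at the cost of extra memory per connection (the default is 8000 bytes)
var builder = new SqlConnectionStringBuilder
{
    DataSource = "myServer",        // placeholder
    InitialCatalog = "myDatabase",  // placeholder
    IntegratedSecurity = true,
    PacketSize = 16384
};

using (var connection = new SqlConnection(builder.ConnectionString))
{
    connection.Open();
    // ... move your large data here ...
}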

Hopefully this gives you a good idea of the performance improvements made in .NET 4.5.1 and how to get the most out of them.


Connection Resiliency in ADO.Net

Despite “reliability” being one of the core features of TCP, network connections are typically unreliable, especially when being routed via the internet. This is even more of a problem with cloud computing, since most connections do route via the internet, and backend machines can go down at any moment for any reason. With this in mind, we have been making changes to ADO.Net to adjust from its client\server origins (with always-up enterprise hardware) to the cloud world (with commodity hardware).

Connection Pool Resiliency

In .Net 4.0 Reliability Update 1 we added a new feature referred to as “Connection Pool Resiliency” (or, as I liked to call it, “Check Connections in the Pool”). This feature would ensure that connections being retrieved from the connection pool were still alive before handing them back to the caller. We achieved this by asking TCP what the state of the connection was and, if TCP reported the connection to be disconnected, then we would create a new physical connection and return that to the caller.

Ensuring that only live connections were returned from the pool covered most low-traffic scenarios, since connections typically spend most of their time in the pool rather than being used. However, it provides little assistance for high-traffic scenarios (where connections do not remain in the pool for very long) or for code where connections are held open for a long time (which we strongly recommend against – but is inevitable in applications like SQL Server Management Studio).

One thing to be aware of with this type of resiliency is that we can only detect failures in the physical connection – if the connection’s state has become corrupt (e.g. commands always time out), then we will not be able to detect that, and so we cannot recover the connection. (In situations like these you can either abandon the pool by changing your connection string, or take the nuclear option and call ClearPool.)
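
For reference, the nuclear option is a one-liner (the connection variable here is just a placeholder):

// Discard all idle connections in the pool that 'connection' belongs to;
// in-use connections are discarded when they are next closed
SqlConnection.ClearPool(connection);

// Or, going one step further, clear every connection pool in the AppDomain
SqlConnection.ClearAllPools();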

Idle Connection Resiliency

ADO.Net 4.5.1 combined with Azure SQL Database or SQL Server 2014 introduced a new feature to provide resiliency for more scenarios: Idle Connection Resiliency. With this feature, we can recover connections even if they were open at the time that they failed. One of the difficulties with doing this is that SQL connections (unlike, say, an HTTP connection) have state – you may have changed the database you were connected to, the collation that you were using, or the transaction isolation level. Whatever the state, when the connection is recovered, we need to ensure that it resumes whatever state it had previously (i.e. it should be completely transparent that you are now using a new physical connection). To do this we introduced a new TDS token called SESSIONSTATE (to allow the server to hand the client its current state as a blob) and a new Feature Extension called SESSIONRECOVERY (to resume that state on a recovered connection).

Let’s assume that you’ve attempted to execute a command on a dead connection. The process to recover it would look like this:

  1. Execute* is called; we check if the connection is still alive (using a similar mechanism to Connection Pool Resiliency)
  2. If the connection is dead, we check whether it is recoverable from both the client and server perspective
  3. If it is recoverable, we dispose the old connection and attempt to open a new connection with the state of the old one – retrying a given number of times and sleeping between each retry for a given time
  4. Once the connection is recovered, we resume the normal execution process

There are a few things to note here. Firstly, the entire recovery process is async – even if you are executing synchronously, we run the process as sync-over-async. Secondly, there are a number of situations where we cannot recover the connection. From the client side, the connection must be idle – for most connections this is guaranteed, since we perform the check just before execution, but for MARS connections it is possible that there is another open session (it doesn’t have to be actively in use by another thread – just being open is enough). From the server side, there are a number of situations where the state will not fit, or cannot be represented, in the SESSIONSTATE blob – for instance, a temporary table.

Retry Logic

Even though we continue to improve ADO.Net’s ability to handle unreliable connections, this does not eliminate the need for your own retry logic in your code – our connection recoveries are best effort and can only detect a limited set of faults. If you want a Microsoft-supported implementation of generic retry logic, then I would recommend the Transient Fault Handling Application Block.
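
If you do decide to roll your own, the overall shape is roughly this (a simplified sketch – IsTransient is a placeholder for real transient-error detection, which is exactly the kind of thing the Application Block provides for you):

// Sketch only; needs System, System.Data.SqlClient and System.Threading
public static T ExecuteWithRetry<T>(Func<T> operation, int maxRetries = 3)
{
    for (int attempt = 0; ; attempt++)
    {
        try
        {
            return operation();
        }
        catch (SqlException ex)
        {
            // Only retry a limited number of times, and only for errors
            // that are plausibly transient (IsTransient is a placeholder)
            if (attempt >= maxRetries || !IsTransient(ex))
            {
                throw;
            }

            // Back off before retrying so we don't hammer a struggling server
            Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
        }
    }
}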

Bonus chatter: .Net 4.5.1 is now being pushed out via Windows Update, or you can grab it straight from the Download Center (if you’re running Windows 8.1 – and why wouldn’t you be? – then you already have 4.5.1).

I also used a few links above to the revamped .Net Reference Source site – it contains the managed source code for most of .Net (including System.Data.dll).


GetType() is a Scalability Issue

TLDR; Avoid SafeHandles and Native\Managed transitions in scale-up applications (e.g. web servers, middle tiers), such as calls to GetType(). Additionally, you won’t see these types of performance issues until your application has high loads (i.e. when the Garbage Collector kicks in).

All the CPUs!

Recently, the Entity Framework team was investigating a customer’s performance issue, namely that they couldn’t get 100% CPU usage using EF on multiple threads, no matter how many threads they were using. The EF team had verified that the customer was using threads correctly (avoiding shared state, especially DbContext objects) and that the application was not being limited by IO (the test was run against a local SQL Server using a small enough data set that the results would be cached). They gathered some profiles from the customer, and saw that a large amount of time was spent waiting on Critical Sections in System.Data.SqlClient, so they enlisted my help.

As it turned out, the Critical Section was a red herring – a side effect of the way that we implemented synchronous network calls for MARS. However, there was another event that a lot of threads were waiting on which appeared to be reset when Garbage Collection started and set when it completed. Why were these threads waiting for the GC to complete? Because they wanted to disable being preempted by the GC and you can’t do that while the GC is running.

Clearing the table while the customers are still eating

In a managed application the Garbage Collector deals with memory management; however, it needs to do this while the program is running, in a thread-safe manner. The typical solution is to simply have the GC pause all managed threads in the application while it collects, and then resume them afterwards. The key phrase in that sentence is “managed thread” – there are times in a managed thread’s life when it has to delve into native code, such as when using PInvoke, or when using Managed C++ to call into Native C++ (as is the case for SqlClient). In this situation the thread needs to tell the runtime that it is switching to being a native thread, and a side effect of this is that the thread can no longer be preempted by the GC.

There is another situation where we don’t want the GC to interrupt us, and that is during a call to SafeHandle.ReleaseHandle(). The documentation for this method states “The garbage collector guarantees… that the method will not be interrupted while it is in progress”, and one of the things that can interrupt it is the GC itself.

Getting to GetType()

Walking further up the call stack, some of the methods that were disabling GC preemption were the network read\write calls in SqlClient. However, the default packet size is 8KB, which should be enough data to keep the CPU busy between calls – and we also reuse buffers, so there isn’t much stress on the GC. So these calls weren’t likely to be the issue.

The other call that kept appearing was GetType() – this was a surprise, since I didn’t see why this method needed to disable the GC. As it turns out, every time GetType() is called it creates and disposes a SafeTypeNameParserHandle, which inherits from SafeHandle. Furthermore, there are a number of calls to native methods, plus some memory pinning. Put together, this puts a lot of stress on the GC and keeps disabling\enabling GC preemption. Even worse, as the customer added more threads, the GC was stressed more, and so the threads had to wait longer for the GC to finish.

Getting past GetType()

When checking type compatibility, there is no need to call GetType() – the as and is keywords are much faster (roughly 20x and 5x respectively, for single-threaded performance), and they also take care of inheritance and interfaces. There are only two reasons why you may need to use GetType(): either you are checking that an object exactly matches a given type, or you want to see if an object has a particular method. For the first of these scenarios, why does it matter whether an object is a specific type or a sub-class of it? I honestly can’t think of a scenario where it would truly matter, unless you don’t trust the derived class to conform to the contracts set up by its parent – which is a very unusual situation to be in. As for checking the existence of a method on an object, using reflection is probably a bad idea, since a method with the same name doesn’t necessarily have the same purpose: consider Stream.Read vs DbDataReader.Read vs Interlocked.Read – or, even more confusing, the static Type.GetType(String) vs the instance GetType() that Type inherits from Object. It is a much better practice to cast (via the as keyword) to the interface or class that has the method you are looking for, and then use the cast object.
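
In code, the difference looks like this (obj here is some variable typed as object; the types are chosen purely for illustration):

// Prefer: 'as' handles inheritance and interfaces, and is much faster
var dbReader = obj as DbDataReader;
if (dbReader != null)
{
    dbReader.Read();
}

// 'is' also works well for a quick compatibility check
if (obj is IDisposable)
{
    ((IDisposable)obj).Dispose();
}

// Avoid: an exact-type check via GetType() – slower, stresses the GC under
// load (as profiled above), and deliberately ignores inheritance
if (obj.GetType() == typeof(SqlDataReader))
{
    // ...
}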

Worst case scenario, if you really can’t avoid it, then at least try to cache and reuse the result of GetType.

(As a side note, if you’re ever interested in seeing how the .Net Framework works, you can download and browse through the Reference Source)


Layers of Abstraction

One of the questions I often hear when I tell someone that I work on ADO.NET is “Is that still alive?” And, while ADO may be long gone, ADO.NET is today one of the most popular ways to connect to SQL Server and is the underlying layer beneath many (if not all) .Net Object Relational Mappers (ORMs) like Entity Framework (EF). However, it is because of these ORMs that you never hear about ADO.NET – it is no longer the “new, sexy” thing (and we don’t have product names like “Magical Unicorn Edition”). This does not mean that ADO.NET is simply becoming a framework to build ORMs, the world is a bit more complicated than that…

One thing that ORMs do well is to allow developers to quickly and easily get a database up and running, and then to get access to the data that is sitting there. For instance, the tooling in EF allows you to quickly create a model, deploy that to a database, code some LINQ (with its IntelliSense goodness) to grab some data, and then manipulate that data as an object. ORMs also make prototyping and rapid development easier, as the model they generate can be updated from the database, and any changes in the database schema will then show as compiler errors, since the corresponding object in that model has now changed as well. One of the problems with EF, and ORMs in general, is that the extra layer of abstraction adds a performance cost and, while EF has been working to reduce that, hand-coded ADO.NET remains vastly quicker – so, what to do?

Opportunity Cost

As with any technology choice, you need to carefully consider what each option offers and what each option costs. EF (and ORMs) offer quick and easy coding at the cost of the application’s performance. For some projects this may be fine – the developers may be too busy to consider hand-coding, or the cost of additional servers may be much less than the cost of finding, hiring and onboarding additional developers. Alternatively, ADO.NET may be chosen if the application is large enough that any percentage performance increase is a significant cost saving, or if the turnaround time for a single database transaction is critical and every microsecond counts. Just to emphasize the point – remember that ADO.NET is itself an abstraction over the TDS protocol. There may be developers for whom performance is so ultra-critical that they would rather hand-code TDS packets and communicate with SQL Server that way (which is entirely possible, since TDS is a documented protocol).

At the end of the day, you need to choose what makes sense to you and to your business plan – whether that means hand-coding ADO.NET, utilizing an ORM or a mix in between (like starting with an ORM and then hand-coding performance critical queries).
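
For comparison, the hand-coded end of the spectrum looks something like this (the connection string and query are placeholders):

using (var connection = new SqlConnection("Server=.;Database=Shop;Integrated Security=true"))
using (var command = new SqlCommand("SELECT Id, Name FROM Products WHERE Price > @price", connection))
{
    command.Parameters.AddWithValue("@price", 10.0m);
    connection.Open();

    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            int id = reader.GetInt32(0);
            string name = reader.GetString(1);
            // No change tracking or materialization overhead – but you
            // write (and maintain) every line of this yourself
        }
    }
}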

(As a side note: .Net 4.5 Beta has arrived! And ADO.NET has some new features that I will be posting about soon – in the meantime download the Beta and get coding, and be sure to log any bugs or suggestions on the Connect site or the Forums)


SpotTheDefect[0].Answer[2]

Over the past few weeks I’ve been covering various ways to fix a race condition bug. Answer 0 was a very basic solution where we were using a lock to serialize access, and with Answer 1 we significantly improved the performance without sacrificing safety by using the “Copy, Check, Continue” pattern.

However, what you may have noticed from the code is that I had a variable called _disposed that I was using to indicate that we have nulled out _foo – but what if _foo was IDisposable, and we had to make sure that we don’t call any method on it after we disposed it?

But where is the fun in that?

The most obvious thing that we could do is simply use Answer 0 – by putting locks around the usage of _foo we can be certain that _disposed accurately reflects whether _foo has been disposed or not. But, again, this is a trivial solution that has a performance cost.

An alternative is to implement Answer 1 and then to catch any ObjectDisposedExceptions that are thrown. This allows us to be “lazy” with our thread safety, and so get back some performance. The problem with this solution is that we can’t be sure that using _foo will throw an ObjectDisposedException: it is entirely possible that _foo will null out one of its fields and we would end up with another race condition similar to the one we were originally trying to solve; or it may just throw the wrong exception type (for example, using a closed\disposed SqlConnection will result in an InvalidOperationException).

Teamwork is key

What makes Answer 0 safer than Answer 1 is that both threads are aware of what the other is doing, and they coordinate their actions. If all we wanted was for access to _foo to be serialized under a lock, then we could switch to a ReaderWriterLockSlim, and have Thread1 obtain a read lock and Thread2 obtain the write lock. However, if there are few threads trying to access _foo at once, then you’ll probably find that an uncontested lock provides better performance than a ReaderWriterLockSlim.

An alternative is to implement mechanics similar to a ReaderWriterLockSlim, but using faster primitives and without any locks. Firstly, there are two things our ReaderWriterLockSlim alternative needs to keep track of: the number of readers holding the read lock, and whether the write lock is in use – this suggests an integer for the readers and a bool for the writer. Secondly, if the write lock is held, then no readers may enter the lock; but the writer cannot do its work until all readers have finished reading. So, putting this together, we already have our write lock bool (_disposed), and we can introduce the int for the readers (_activeReaders). Readers must then increment _activeReaders when they start and decrement it when they end – but must not do any work if _disposed is true. Similarly, the writer sets _disposed to true when it starts, and then waits for the readers to finish (i.e. for _activeReaders to become 0).

There are a couple things that you should note about this solution:

  • We expect there to only ever be one thread “writing” at a time (otherwise you need to synchronize access to _disposed – or switch it to an integer, increment it, and then only dispose if it equals 1)
  • Once we have “written”, no other locks can ever be taken again
  • I’ve used Interlocked.Increment\Decrement to manipulate _activeReaders (because incrementing\decrementing a shared variable is not guaranteed to be atomic)
  • I’ve had to disable warning CS0420 (“a reference to a volatile field will not be treated as volatile”), but it is safe to do so because the Interlocked APIs are “volatile aware”

So here is the revised solution:

using System;
using System.Threading;

namespace SpotTheDefect1
{
    class Program
    {
        private static Foo _foo = new Foo();
        private static volatile bool _disposed = false;
        private static volatile int _activeReaders = 0;

        static void Main(string[] args)
        {
            Thread thread1 = new Thread(Thread1);
            Thread thread2 = new Thread(Thread2);

            thread1.Start();
            thread2.Start();

            thread1.Join();
            thread2.Join();
        }

        private static void Thread1()
        {
            // Check first to avoid unnecessary work
            if (!_disposed)
            {
                // Warning CS0420: a reference to a volatile field will not be treated as volatile
                // We can safely ignore this because the Interlocked APIs are volatile aware
                #pragma warning disable 0420
                Interlocked.Increment(ref _activeReaders);
                #pragma warning restore 0420

                try
                {
                    // Check again in case we were disposed after doing the increment
                    if (!_disposed)
                    {
                        _foo.Bar();
                    }
                }
                finally
                {
                    // Warning CS0420: a reference to a volatile field will not be treated as volatile
                    #pragma warning disable 0420
                    Interlocked.Decrement(ref _activeReaders);
                    #pragma warning restore 0420
                }
            }
        }

        private static void Thread2()
        {
            Console.WriteLine("Disposing");

            // Indicate that we are disposing and then wait for readers to complete
            _disposed = true;
            SpinWait.SpinUntil(() => _activeReaders == 0);

            _foo.Dispose();
            _foo = null;
        }

        private class Foo : IDisposable
        {
            public void Bar()
            {
                Console.WriteLine("Hello, World!");
            }

            public void Dispose()
            {
                Console.WriteLine("Foo has been disposed");
            }
        }
    }
}

Getting the disassembly and IL for a Jitted\NGENed .Net method using WinDbg and SOS.dll

If you didn’t understand the title, then this post isn’t for you.
If you think you understood, and you think that this may help you with your debugging – then turn back now, you’ve gone completely the wrong way.

I’ve put this disclaimer here since, unless you are interested in how the JIT works (in which case, skip this and read on!), the only reason you are getting the assembly from the JIT is because you believe that the JIT compiled something incorrectly (or, at least, that was why I was investigating this – although it turned out that we were hitting an issue due to Out of Order Execution, which led to a race condition with a window of a couple of CPU instructions – which just goes to show how awesome our stress test is).

As a note (if you want to try this at home): this uses the SpotTheDefect[0] code, modified to initialize _foo to null (ensuring that the application crashes), and I’m trying to get the IL and disassembly for the Thread1 method.

So, first off, we need to load our good friend SOS:

0:004> .loadby sos clr

Then we can find the method table for the class that contains the method we want. In my case, I’m looking for the Program class which is under the SpotTheDefect1 namespace in the SpotTheDefect1 assembly:

0:004> !name2ee SpotTheDefect1!SpotTheDefect1.Program
Module:      000007fa66f82f80
Assembly:    SpotTheDefect1.exe
Token:       0000000002000002
MethodTable: 000007fa66f838a8
EEClass:     000007fa67092240
Name:        SpotTheDefect1.Program

Once we’ve got the address of the method table, we can then dump it out, specifying the ‘-md’ option to get the addresses of the methods in that class:

0:004> !dumpmt -md 000007fa66f838a8
EEClass:         000007fa67092240
Module:          000007fa66f82f80
Name:            SpotTheDefect1.Program
mdToken:         0000000002000002
File:            C:\Users\Daniel\Documents\Visual Studio 11\Projects\SpotTheDefect1\SpotTheDefect1\bin\Debug\SpotTheDefect1.exe
BaseSize:        0x18
ComponentSize:   0x0
Slots in VTable: 9
Number of IFaces in IFaceMap: 0
————————————–
MethodDesc Table
           Entry       MethodDesc    JIT Name
000007fac557a7c0 000007fac52037a0 PreJIT System.Object.ToString()
000007fac5624cb0 000007fac52037a8 PreJIT System.Object.Equals(System.Object)
000007fac56247a0 000007fac52037d0 PreJIT System.Object.GetHashCode()
000007fac5587420 000007fac52037e8 PreJIT System.Object.Finalize()
000007fa670a0090 000007fa66f838a0    JIT SpotTheDefect1.Program..cctor()
000007fa66f8c038 000007fa66f83898   NONE SpotTheDefect1.Program..ctor()
000007fa670a00e0 000007fa66f83868    JIT SpotTheDefect1.Program.Main(System.String[])
000007fa670a0270 000007fa66f83878    JIT SpotTheDefect1.Program.Thread1()
000007fa66f8c030 000007fa66f83888   NONE SpotTheDefect1.Program.Thread2()

You will notice that the methods are marked with either “JIT”, “PreJIT” or “NONE” – this indicates whether the JIT has compiled the method (“JIT”), NGEN compiled the method ahead of time (“PreJIT”) or the method is yet to be compiled (“NONE”). I’m interested in Thread1, which has been compiled by the JIT, so now I can dump out its method descriptor (I’ll also show later on what happens if you try this with a method that is yet to be compiled).

0:004> !dumpmd 000007fa66f83878   
Method Name:  SpotTheDefect1.Program.Thread1()
Class:        000007fa67092240
MethodTable:  000007fa66f838a8
mdToken:      0000000006000002
Module:       000007fa66f82f80
IsJitted:     yes
CodeAddr:     000007fa670a0270
Transparency: Critical

From here I now know the method descriptor and the address of the code, so I can dump the IL (using the method descriptor) and the actual assembly (using the code address):

0:004> !dumpil 000007fa66f83878   
ilAddr = 00000000009e20a0
IL_0000: nop
IL_0001: ldsfld SpotTheDefect1.Program::_disposed
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: brtrue.s IL_0017
IL_000a: nop
IL_000b: ldsfld SpotTheDefect1.Program::_foo
IL_0010: callvirt Foo::Bar
IL_0015: nop
IL_0016: nop
IL_0017: ret

0:004> !U 000007fa670a0270
Normal JIT generated code
SpotTheDefect1.Program.Thread1()
Begin 000007fa670a0270, size 61

c:\Users\Daniel\Documents\Visual Studio 11\Projects\SpotTheDefect1\SpotTheDefect1\Program.cs @ 24:
>>> 000007fa`670a0270 4883ec38        sub     rsp,38h
000007fa`670a0274 c644242000      mov     byte ptr [rsp+20h],0
000007fa`670a0279 48b83034f866fa070000 mov rax,7FA66F83430h
000007fa`670a0283 8b00            mov     eax,dword ptr [rax]
000007fa`670a0285 85c0            test    eax,eax
000007fa`670a0287 7405            je      SpotTheDefect1!SpotTheDefect1.Program.Thread1()+0x1e (000007fa`670a028e)
000007fa`670a0289 e8c6cda45f      call    clr!TranslateSecurityAttributes+0x62a9c (000007fa`c6aed054) (JitHelp: CORINFO_HELP_DBG_IS_JUST_MY_CODE)
000007fa`670a028e 90              nop

c:\Users\Daniel\Documents\Visual Studio 11\Projects\SpotTheDefect1\SpotTheDefect1\Program.cs @ 25:
000007fa`670a028f 8a059e33eeff    mov     al,byte ptr [000007fa`66f83633]
000007fa`670a0295 88442420        mov     byte ptr [rsp+20h],al

c:\Users\Daniel\Documents\Visual Studio 11\Projects\SpotTheDefect1\SpotTheDefect1\Program.cs @ 26:
000007fa`670a0299 0fb6442420      movzx   eax,byte ptr [rsp+20h]
000007fa`670a029e 85c0            test    eax,eax
000007fa`670a02a0 7527            jne     SpotTheDefect1!SpotTheDefect1.Program.Thread1()+0x59 (000007fa`670a02c9)
000007fa`670a02a2 90              nop

c:\Users\Daniel\Documents\Visual Studio 11\Projects\SpotTheDefect1\SpotTheDefect1\Program.cs @ 27:
000007fa`670a02a3 48b83856e21200000000 mov rax,12E25638h
000007fa`670a02ad 488b00          mov     rax,qword ptr [rax]
000007fa`670a02b0 4889442428      mov     qword ptr [rsp+28h],rax
000007fa`670a02b5 488b442428      mov     rax,qword ptr [rsp+28h]
000007fa`670a02ba 803800          cmp     byte ptr [rax],0
000007fa`670a02bd 488b4c2428      mov     rcx,qword ptr [rsp+28h]
000007fa`670a02c2 e8b1bdeeff      call    SpotTheDefect1.Program+Foo.Bar() (000007fa`66f8c078) (SpotTheDefect1.Program+Foo.Bar(), mdToken: 0000000006000006)
000007fa`670a02c7 90              nop

c:\Users\Daniel\Documents\Visual Studio 11\Projects\SpotTheDefect1\SpotTheDefect1\Program.cs @ 28:
000007fa`670a02c8 90              nop

c:\Users\Daniel\Documents\Visual Studio 11\Projects\SpotTheDefect1\SpotTheDefect1\Program.cs @ 29:
000007fa`670a02c9 eb00            jmp     SpotTheDefect1!SpotTheDefect1.Program.Thread1()+0x5b (000007fa`670a02cb)
000007fa`670a02cb 90              nop
000007fa`670a02cc 4883c438        add     rsp,38h
000007fa`670a02d0 c3              ret

A couple of things to note. Firstly, the IL tends to match up with the actual code quite accurately (unless you are using compiler magic like yield return or async/await). Secondly, I’m using !U to dump the disassembly, which (if you have the symbols) will show you which line of code generated which assembly instructions – although the JIT may rearrange sections of code (especially for try/catch/finally statements), so don’t expect the assembly to always work out as nicely as it did above. You may also have noticed that the assembly is quite verbose, contains debugging information (“CORINFO_HELP_DBG_IS_JUST_MY_CODE”) and some unnecessary NOPs (e.g. line 28 of my code was turned into a single NOP) – this is because I was using a debug build of SpotTheDefect1.exe; with a release build, a lot of these instructions would be optimized away (or never generated in the first place).

Finally, if we were interested in the assembly for Thread2, we could try to dump out its method descriptor:

0:004> !dumpmd 000007fa66f83888  
Method Name:  SpotTheDefect1.Program.Thread2()
Class:        000007fa67092240
MethodTable:  000007fa66f838a8
mdToken:      0000000006000003
Module:       000007fa66f82f80
IsJitted:     no
CodeAddr:     ffffffffffffffff
Transparency: Critical

But the code address points to nowhere – which is because it hasn’t been compiled yet, so there is no assembly associated with the method (although you could still dump out the IL).


SpotTheDefect[0].Answer[1]

A couple of weeks ago I asked you to play a little game of “Spot the Defect”, and recently I posted the first of three posts with the answers, which focused on the output of the program, what the bug was and a possible way to fix it. That first fix used locks to serialize access and, while locks in .Net are very fast, they are not free and prevent your application from taking full advantage of multi-core hardware. So now we are going to explore a lock-free, safe alternative – but first we need to take a step back and remember how objects work in .Net (and most other object oriented runtimes).

Can you give me some pointers?

Usually, when we code in an object oriented language and “new up” an object, we have a mental model that says our variable contains the actual object. Similarly, if we then set that variable to null, we believe that we are removing the object from existence – but this is not the case. The variable we assigned the object to does not contain the actual object; rather, it holds a pointer to the actual object on the heap. Setting that variable to null merely clears the pointer – the object will still reside on the heap until the Garbage Collector (GC) comes along to remove it. Conversely, all we need to do to ward the GC off from destroying an object is to maintain a reference (i.e. a pointer) to it.
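
A three-line illustration of the point:

var a = new Foo();  // 'a' holds a pointer to a Foo object on the heap
var b = a;          // 'b' now points to the same object - nothing is copied
a = null;           // clears a's pointer only; the object lives on,
                    // kept safe from the GC by the reference in 'b'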

Copy, check, continue

(There is probably already a name for this “pattern”, but I couldn’t really find it – so this will have to do. If you have a better name, or find the real name, let me know.)

So, to avoid taking locks, there are a few things that we need to do. Firstly, we need to create our own reference to the object that _foo is pointing to, so that we don’t have to rely on using _foo (which could become null at any time) and so that the GC can’t eat the object. The easiest way to do this is to make a copy of _foo (remember, that copies the pointer, not the object). Which leads to my second point: once we’ve made the copy of _foo, we need to check that our copy isn’t null (in case _foo had already been set to null). Finally, if our copy isn’t null, then we can continue to do whatever we planned to do.

A couple of notes about the code (you may want to come back to these after reading the code)

  • I’m using the var keyword (because I don’t really care what type _foo is, it makes maintenance easier and I’m lazy…)
  • We didn’t have to modify Thread2 at all
  • The performance overhead to Thread1 is minimal (we allocate a pointer on the stack, copy a pointer to it and then check if it is 0 – in reality the compiler will probably optimize fooLocal away and just use a register, further reducing the overhead)
  • If you were really concerned about performance, you could now drop the _disposed variable as well (since it’s not really needed)

And, finally, our updated code:

using System;
using System.Threading;

namespace SpotTheDefect1
{
    class Program
    {
        private static int _x = 0;
        private static Foo _foo = new Foo();
        private static bool _disposed = false;

        static void Main(string[] args)
        {
            Thread thread1 = new Thread(Thread1);
            Thread thread2 = new Thread(Thread2);

            thread1.Start();
            thread2.Start();

            thread1.Join();
            thread2.Join();
        }

        private static void Thread1()
        {
            if (!_disposed)
            {
                var fooLocal = _foo;
                if (fooLocal != null)
                {
                    fooLocal.Bar();
                }
            }
        }

        private static void Thread2()
        {
            Console.WriteLine("Disposing");
            _disposed = true;
            _foo = null;
        }

        private class Foo
        {
            public void Bar()
            {
                Console.WriteLine("Hello, World!");
            }
        }
    }
}

SpotTheDefect[0].Answer[0]

Last week(ish) I posted a quick “Spot the Defect” game, where I asked you to:

  • Figure out what it would output
  • Find the bug(s)
  • Correct the bugs

So today is one of three posts with the answers – I’ll focus on the output, the bug and a trivial correction, with the next two posts diving deeper into other possible solutions.

Explosions, and not the good kind

So, most of you probably ran the code and got the output:

Hello, World!
Disposing

If you were lucky, or had a few things running in the background, you may instead have seen just:

Disposing

But, it was also possible to get:

Disposing
Hello, World!

Or, even a NullReferenceException!

How is this possible? Because of a very subtle race condition…

Scheduling Conflict

Most of the time, the application would have kicked off both threads, and these threads would run in their entirety without ever being preempted. Additionally, since we started Thread1 before Thread2, in all likelihood Windows would have scheduled and ran the threads in that order. However, it is entirely possible that Windows could decide to preempt either thread at any stage in our code, or execute them in any order. So, in order to see “Disposing” before “Hello, World!” we would need Thread2 to go first and then be preempted just after it wrote to the console to allow Thread1 to run in its entirety. The NullReferenceException is the opposite: If Thread1 is preempted just after it checks _disposed and enters its if-statement, then this allows Thread2 to “race in” and set _foo to null, causing Thread1 to hit a NullReferenceException when it resumes.

This kind of bug is especially difficult to diagnose because of how “tight” the timing is. You would see the exception intermittently at best (you can try this by wrapping the code in a for-loop), and it would occur even less frequently if there was a debugger attached or if you added more diagnostics\tracing (simply because there are more “safe” lines of code to be preempted at, so the timing window becomes even tighter). And even if, somehow, you did manage to get a crash dump with the exception, you would need to dig through all of your code to try to find how you ended up crashing (which, if you’re not thinking about how threads can affect each other, is about as effective as bashing your head against a wall).

When in doubt, lock

At the beginning of this post, I described this as the “trivial” solution, but that doesn’t mean that you should discredit it – although locks seem like the most heavy-handed approach, they are usually the safest as well. For those of you unfamiliar with C#’s lock keyword (or the Monitor.Enter\Exit calls that it wraps), this is a language construct that acts like a critical section or a mutex, except that you do not create a synchronization primitive to manage the state of the lock – rather, you lock on an object (and the Monitor maintains the lock state). Take a minute to think about that: you must lock on an object – not a primitive, not a struct, and not null. This complicates matters for our code, since the only object we have is _foo, and we can’t guarantee that it isn’t null. We also can’t lock on ‘this’ because we are in a static method (not that you should ever lock on ‘this’ or any other object that is publicly visible, as this can lead to unexpected deadlocks if other code decides to lock on your object or its properties\return values). The common solution to this problem is to add a dedicated ‘lock object’ that is only ever used for locking and is guaranteed to never be null.

Many developers shy away from locks as they are viewed as performance issues and the leading cause of hangs. While a lock is not free, you’d be surprised just how fast a lock is, especially when there is no contention on it. Deadlocks, however, always remain a problem – but, I can say from experience: it is much easier to reason about the order in which locks are taken and released than to consider and find the type of bug I’ve described above.

So, this “trivial” solution is to add a new “lock object”, and then to serialize access to _foo:

using System;
using System.Threading;

namespace SpotTheDefect1
{
    class Program
    {
        private static int _x = 0;
        private static Foo _foo = new Foo();
        private static bool _disposed = false;
        private static object _fooLockObject = new object();

        static void Main(string[] args)
        {
            Thread thread1 = new Thread(Thread1);
            Thread thread2 = new Thread(Thread2);

            thread1.Start();
            thread2.Start();

            thread1.Join();
            thread2.Join();
        }

        private static void Thread1()
        {
            if (!_disposed)
            {
                lock (_fooLockObject)
                {
                    if (_foo != null)
                    {
                        _foo.Bar();
                    }
                }
            }
        }

        private static void Thread2()
        {
            Console.WriteLine("Disposing");
            _disposed = true;
            lock (_fooLockObject)
            {
                _foo = null;
            }
        }

        private class Foo
        {
            public void Bar()
            {
                Console.WriteLine("Hello, World!");
            }
        }
    }
}

SpotTheDefect[0]

Let’s play a little game of “Spot the Defect”. For the following code, see if you can:

  • Figure out what it would output
  • Find the bug(s)
  • Correct the bugs

(And I’ll do a post next week with the answer)

using System;
using System.Threading;

namespace SpotTheDefect1
{
    class Program
    {
        private static int _x = 0;
        private static Foo _foo = new Foo();
        private static bool _disposed = false;

        static void Main(string[] args)
        {
            Thread thread1 = new Thread(Thread1);
            Thread thread2 = new Thread(Thread2);

            thread1.Start();
            thread2.Start();

            thread1.Join();
            thread2.Join();
        }

        private static void Thread1()
        {
            if (!_disposed)
            {
                _foo.Bar();
            }
        }

        private static void Thread2()
        {
            Console.WriteLine("Disposing");
            _disposed = true;
            _foo = null;
        }

        private class Foo
        {
            public void Bar()
            {
                Console.WriteLine("Hello, World!");
            }
        }
    }
}

Collectively Concurrent

“The concept of a stack in programming is very similar to a stack of plates in real life, except that you can cheat a little – for instance, if you’re willing to accept that plates can float, then you can ignore gravity.”

One of the fantastic things about using a rich framework like .Net is that many of the basic data structures that programmers require already exist in an efficient and easy-to-use form. We also make sure to update these pre-built data structures with the new features that we introduce in the language – for instance, in .Net 2.0 we introduced generics, and so the System.Collections.Generic namespace was created. With .Net 4.5 we are introducing new async APIs and the async\await keyword pair, meaning that programmers will now need to deal with concurrency and multithreading more often, especially if their application has any shared data structures. Luckily enough, we already have the appropriate APIs, introduced in 4.0: the System.Collections.Concurrent namespace.

Stack it. Queue it. Bag it.

The first couple of APIs I’d like to introduce you to are ConcurrentStack and ConcurrentQueue. These are exactly what they sound like: a stack and a queue that permit concurrent operations. One common trap when writing a multithreaded application is the pattern of checking the Count of the collection to see if an item is available before attempting to take one – which is fine with single threading, but in a multithreaded environment it is possible for another thread to jump in between your check and your take, and grab the last item before you can. Instead, the concurrent collections have the TryPop and TryDequeue methods, which atomically check the size of the structure and return an item to you if one is available.
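
For example (WorkItem and Process are just placeholders):

var queue = new ConcurrentQueue<WorkItem>();
queue.Enqueue(new WorkItem());

// Don't check Count and then take - another thread can grab the last item
// in between. TryDequeue checks and takes as one atomic operation:
WorkItem item;
if (queue.TryDequeue(out item))
{
    Process(item);
}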

The other data structure I hinted at is the ConcurrentBag. Unlike the Stack or Queue, the ConcurrentBag makes no guarantees about the order of output versus input – it’s an “Any In, Any Out” collection. This allows ConcurrentBag to have a much more efficient implementation when there is contention, since the collection can return any object that it currently holds, rather than having to coordinate with the threads accessing it in order to return objects in the correct order. One of the best uses of a ConcurrentBag is a non-time-sensitive resource cache, like a buffer pool – where you want the best performance even with contention, and it is ok to not return the most recently used object (as you would want to do with a connection pool).
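
A minimal sketch of such a buffer pool (names and sizes are illustrative):

private static readonly ConcurrentBag<byte[]> _bufferPool = new ConcurrentBag<byte[]>();

public static byte[] RentBuffer()
{
    // Take any pooled buffer - we don't care which one - or allocate anew
    byte[] buffer;
    return _bufferPool.TryTake(out buffer) ? buffer : new byte[8192];
}

public static void ReturnBuffer(byte[] buffer)
{
    _bufferPool.Add(buffer);
}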

Performance

One of the issues with writing multithreaded applications is attempting to measure performance, especially when you have a shared resource. If you run a single-threaded test with the above data structures, then you may notice that simply putting a standard Stack or Queue inside a lock gives better performance than the Concurrent equivalent. However, introduce some contention (i.e. have multiple threads attempting to access the same object), and the Concurrent structures begin to shine. Additionally, you need to be careful when doing multithreaded micro-benchmarks, as you may introduce too much contention (a “real” application is likely to do some work with the object it just obtained, rather than immediately handing it back to the collection), which would then skew your results.

However, unless you have a high-performance single-threaded application or a multithreaded application with no shared resources, a concurrent collection will be your best bet. It may be slower in an application with little load, but it will be much easier to scale to a larger application if needed.
