First off, sorry for not posting in 6 months… I was somewhat preoccupied with finishing my honors and starting my new job at Microsoft. As you may have noticed from my spelling, I’m now in the US and have settled in quite well. But enough about me, on to the blog post!
So, last Tuesday was Patch Tuesday and, lo and behold, my laptop couldn’t connect to Exchange and my phone hadn’t synced since 3am that morning. Since I was foolish enough to have my server set to install updates automatically (which, by default, happens at 3am), I immediately knew what was wrong. Logging into the server confirmed my suspicions: it had installed updates, and now the Exchange Information Store service was stuck in the ‘starting’ state. I hit reset on the server, blocked my ears as the fans revved to maximum speed while the machine booted, and waited. And waited. And waited. Something had gone horribly wrong. The hard drive light had stopped, but I couldn’t remote in, and when I plugged in my monitor it simply said “Mode not compatible: 74.9Hz”, most likely because the server had booted without the monitor connected. I hit the reset button and waited again, this time watching the boot sequence. Two things set alarm bells ringing in my head: firstly, the RAID status was “Verify” rather than “Optimal”; secondly, Windows start-up never got past “Applying computer settings” (with the “Donut of Doom” spinning ominously to the left of it).
My first reaction was to restore from the last backup: I had nightly backups running, and Server 2K8 R2 makes restoring from backups (even onto ‘bare metal’) extremely easy. I rolled back to the last backup but, to my horror, Windows was still stuck at “Applying computer settings”. I tried even older backups, but to no avail; the server wouldn’t boot. By this time the RAID status still showing “Verify” had gone from a curiosity in my mind to a possible cause, so I broke the array and attempted to restore onto a single disk. When that failed, I recreated the array and tried again. Still nothing. At this point I was pretty desperate: I couldn’t remote in or log in physically, so I pulled out the debugging tools.
First up was old faithful, the most useful tool that ships with Windows and one that is severely underrated and probably underused: Event Viewer. Even though the server hadn’t booted properly, and my laptop wasn’t on the same domain as the server, Event Viewer still managed to connect to it. What I saw in the logs (past all of the spam that P3SS generates; I really need to fix that…) was that MSExchange ADAccess was logging events 2102 and 2114, indicating that it couldn’t find the AD server. That was especially weird, since the AD server WAS the Exchange server… But things began to make more sense now.
The “Applying computer settings” stage of Windows start-up is when most of the services running in Windows start. If these services hang, the machine gets ‘stuck’ on this screen. Except that these services should never hang, because the Service Control Manager is supposed to kill them after 30 seconds. But how do you kill a service that is in the process of “stopping” and refuses to respond? Exchange not finding the AD service meant that either AD or DNS was not running properly. Since I had to provide domain credentials to Event Viewer in order to log in to the server, AD must have been OK, so the DNS server had to be the one that had borked. And the easiest way to unbork a DNS server? Yank the network cable.
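The process of elimination above (AD answers, so suspect DNS) can be turned into a quick check from another machine. This is a minimal sketch of my own, not something I actually ran that night, assuming Python 3 is available: it uses the standard `socket` module to see whether a name resolves at all, and whether anything accepts a TCP connection on port 389 (LDAP). The helper name `check_dc_reachability` is hypothetical, and note that it only does ordinary address lookups; it does not query the DNS SRV records that AD clients actually use to locate domain controllers.

```python
import socket


def check_dc_reachability(host: str, port: int = 389, timeout: float = 2.0) -> dict:
    """Hypothetical helper: does `host` resolve, and does anything accept a TCP
    connection on `port`? (389 is the standard LDAP port.)"""
    result = {"resolves": False, "addresses": [], "port_open": False}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed outright: suspect DNS before the directory service.
        return result

    result["resolves"] = True
    # info[4] is the sockaddr tuple; its first element is the IP address string.
    result["addresses"] = sorted({info[4][0] for info in infos})

    # Try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
        except OSError:
            continue  # refused, timed out or unreachable: try the next address
        result["port_open"] = True
        break

    return result
```

Something like `check_dc_reachability("dc01.example.local")` (a placeholder name; substitute your own server’s) coming back with `resolves` set to `False` would point at DNS, while a name that resolves but has nothing listening on 389 would point back at AD itself.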
I pulled the network cable out of the server, and nothing happened. Damn. Perhaps the DNS service was still trying to bind to the static IP of the disconnected network card? So I plugged the cable into the secondary network card (which had a dynamic IP) and… the server finally booted! Logging in, I found my theory confirmed once again: the DNS service and a number of Exchange services that were set to “Automatic” had not started. I started the services and everything ran as smooth as butter, so I reset the server. And it got stuck again. A quick switch of network ports, a kick to those services, and we were back in business.
So there you have it – when in doubt, try the other network port!
Addendum: Binging “DSC_E_NO_SUITABLE_CDC” turned up a few things to try, including enabling IPv6 (which I don’t really want to do, as it tends to break Outlook Anywhere) and adding the Exchange server to the “Domain Admins” group… I’ll try some of these over the weekend and let you all know how it goes!
Update: It looks like Travis Wright on the TechNet forums had the correct answer: if you had IPv6 enabled when you installed Exchange 2010, it needs to remain enabled. (In hindsight this makes sense. The DNS service may have been trying to bind to a non-existent IPv6 address, and the Exchange AD Topology service may have been looking for AD on the IPv6 loopback address; connecting the cable to the secondary card worked because that card still had IPv6 enabled.)
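If you want to see which side of that theory a machine is on, one quick check is which address families its name resolves to. This is a minimal sketch of my own, assuming Python 3 and using only the standard `socket` module; the helper name `addresses_by_family` is hypothetical. It only reports what name resolution returns, and says nothing about which addresses the DNS Server or Exchange services actually bind to.

```python
import socket


def addresses_by_family(host: str) -> dict:
    """Hypothetical helper: group the addresses `host` resolves to by family."""
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    grouped: dict = {}
    for family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(host, None):
        label = labels.get(family)
        if label is not None:
            # sockaddr[0] is the address string for both IPv4 and IPv6 entries;
            # the set removes duplicates (one entry per socket type).
            grouped.setdefault(label, set()).add(sockaddr[0])
    return {label: sorted(addrs) for label, addrs in grouped.items()}
```

On the server itself, `addresses_by_family(socket.gethostname())` could be compared against what `ipconfig` shows for each adapter; an IPv6 entry that no adapter carries would at least be consistent with the theory above.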