vSphere + SQL 2012 + AlwaysOn

Over the weekend, one of my ex-colleagues reached out to me for input on running SQL in a two-node Microsoft cluster. Of course, these servers were supposed to be VMs and all that. My initial reaction was, yeah, sure, you can do all this, but there are restrictions placed on VMs that are part of MSCS. These include affinity rules, DRS exceptions, a two-node limit and all the other stuff that goes along with it.

Below are some of the limitations, or things you CANNOT do, on VMs that are part of an MSCS cluster. This is straight from the install guide:

  • Mixed environments, such as configurations where one cluster node is running a different version of ESXi than another cluster node.
  • Use of MSCS in conjunction with vSphere Fault Tolerance (FT).
  • Migration with vSphere vMotion of clustered virtual machines.
  • N-Port ID Virtualization (NPIV).

As I was going over the caveats, I was reminded that this would be running SQL 2012 with AlwaysOn Availability Groups. The VMs need to be in a cluster, but there is no need for shared storage; even the quorum can be a Windows file share somewhere on a file server. When I heard that, I was confused and asked for some time to do research. I read a few articles online, talked to folks on Twitter, posted in the secret vExpert community (which, by the way, was very helpful and prompt) and came back with a totally different mindset for approaching this.

Based on the feedback I got from everyone, the articles I read and, of course, the AlwaysOn intro my ex-colleague gave me, I came to the following conclusion: VMs running SQL 2012 AAG in a Microsoft cluster with no shared disk should be treated like any other VM. This means you can have more than two nodes, and the limitations of a typical MSCS VM do not apply. Why so?

To start off, what in the world is AAG? You can get some good detailed info on that here. But to put it simply, AAG is the new and improved HA and DR solution for SQL, kinda like database mirroring (the original database mirroring still exists, but it’s probably on its way out).

It’s like mirroring, but we get multiple mirrors for many more databases that we can fail over in groups, and we can shed load by querying the mirrors.

It relies on Windows Failover Clustering, and data synchronization happens at the application layer (SQL Server ships transaction log records to the secondary replicas). This means that all the constraints usually placed on MSCS VMs because of shared RDMs don’t exist here. It also means you can vMotion the box and do everything you would do with most other VMs. To put it simply, it’s not a special VM anymore.

How is AAG better than database mirroring? Below is a list of some of the improvements extracted from this article:

  • Supports one primary replica and up to four secondary replicas.
  • Asynchronous-commit mode. This availability mode is a DR solution that works well when the availability replicas are distributed over less reliable connections.
  • Synchronous-commit mode. This availability mode puts high availability and data protection ahead of performance; the con is transaction latency.
  • Allows automatic page repair against page corruption.
  • Backups and read-only access on the secondary databases.
  • Fast application failover is provided by availability group listeners.
  • Greater failover control is achieved through a flexible failover policy.
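To make the replica limits and commit modes above a bit more concrete, here is a minimal Python sketch that sanity-checks a proposed availability group layout before you build it. It assumes only two rules: the one-primary-plus-four-secondaries limit from the list, and the fact that automatic failover is only possible between synchronous-commit replicas. The `Replica` class, `validate_topology` function and server names are hypothetical illustrations, not part of any SQL Server API.

```python
from dataclasses import dataclass
from enum import Enum


class CommitMode(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


@dataclass
class Replica:
    # Hypothetical model of one availability replica (illustration only).
    name: str
    mode: CommitMode
    automatic_failover: bool = False


# SQL Server 2012: one primary plus up to four secondary replicas (per the list above).
MAX_SECONDARIES = 4


def validate_topology(primary: Replica, secondaries: list[Replica]) -> list[str]:
    """Return human-readable problems with a proposed availability group layout."""
    problems = []
    if len(secondaries) > MAX_SECONDARIES:
        problems.append(
            f"{len(secondaries)} secondaries exceeds the limit of {MAX_SECONDARIES}"
        )
    for replica in [primary, *secondaries]:
        # Automatic failover only works between synchronous-commit replicas;
        # an asynchronous (DR-style) replica needs a manual failover.
        if replica.automatic_failover and replica.mode is not CommitMode.SYNCHRONOUS:
            problems.append(f"{replica.name}: automatic failover requires synchronous commit")
    return problems


primary = Replica("SQL1", CommitMode.SYNCHRONOUS, automatic_failover=True)
secondaries = [
    Replica("SQL2", CommitMode.SYNCHRONOUS, automatic_failover=True),
    Replica("SQL-DR", CommitMode.ASYNCHRONOUS, automatic_failover=True),
]
print(validate_topology(primary, secondaries))
```

Here the remote "SQL-DR" replica gets flagged: a typical design would keep the local pair synchronous with automatic failover and leave the DR copy asynchronous with manual failover.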

Of course, there are a few things to consider when you run this all virtually (thanks to the VMTN feedback I received). Even though you may be able to vMotion these boxes, keep in mind that the Windows Failover Cluster (WFC) heartbeat is very sensitive, and the brief stun time during a vMotion may be enough for your cluster to assume a node has failed. So adjusting your heartbeat timeout may be something to consider. Matt shows how to do that here; though he is doing it on an Exchange Database Availability Group (DAG), a DAG relies on WFC just like AAG does.
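The heartbeat trade-off is simple arithmetic: a node is declared down after roughly `SameSubnetDelay` (milliseconds between heartbeats) times `SameSubnetThreshold` (missed heartbeats tolerated). The Python sketch below, a rough approximation rather than how WFC actually schedules heartbeats, checks whether a given stun would fit inside that window. The defaults of 1000 ms and 5 are assumptions based on typical Windows Server 2008 R2/2012 settings; read the real values from your own cluster before relying on them. The function names are hypothetical.

```python
# Assumed Windows Server 2008 R2 / 2012 defaults; verify on your own cluster.
DEFAULT_SAME_SUBNET_DELAY_MS = 1000
DEFAULT_SAME_SUBNET_THRESHOLD = 5


def heartbeat_tolerance_ms(delay_ms: int, threshold: int) -> int:
    """Approximately how long a node can stay silent before the cluster declares it down."""
    return delay_ms * threshold


def survives_stun(stun_ms: int,
                  delay_ms: int = DEFAULT_SAME_SUBNET_DELAY_MS,
                  threshold: int = DEFAULT_SAME_SUBNET_THRESHOLD) -> bool:
    """True if a pause of stun_ms stays inside the heartbeat tolerance window."""
    return stun_ms < heartbeat_tolerance_ms(delay_ms, threshold)


# Compare a short stun and a long one, with the assumed default threshold and a raised one.
for stun in (500, 6000):
    print(stun, survives_stun(stun), survives_stun(stun, threshold=10))
```

Raising the threshold widens the window, at the cost of the cluster taking longer to notice a node that has genuinely failed.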

Now, how do you set this up? I was thinking about doing a step-by-step, but I found someone else who already beat me to it. So here are the steps, from setting up WFC to enabling AAG on SQL. Denny Cherry also plans to have a session on this topic at VMworld.

In the end, I think SQL 2012 with AAG will certainly help improve the relationship between SQL and virtualization. With the restrictions relaxed on this type of setup, you can now have bigger WFC clusters within an HA/DRS cluster. With HA/DRS you get protection from hardware-related incidents, and with AAG your application becomes intelligent. Ultimately, you look good and find more time to do more important things in life. 🙂

PS: Lastly, we will still be doing a proof of concept to see how well this all holds up. I encourage you to do your own independent testing before introducing this in production. On paper this sounds perfect. I plan to keep this post updated with what we find and learn, or at least link to an updated post, depending on how this goes. Good luck!

HA for MSCS VMs in vSphere

A few days ago, I was complaining about not knowing why HA has to be disabled on an MSCS setup in vSphere. It turns out only DRS needs to be disabled, as HA is still supported according to KB article 1037959. If I read it correctly, even in a cluster-across-boxes (CAB) setup, where you have to use physical compatibility mode, HA is still supported. DRS is not supported in any vSphere and MSCS setup, for the reasons I discussed in a previous post. Although the MSCS user guide for 4.1 suggests that you can set DRS to partially automated for MSCS machines, the PDF also mentions that migrating these VMs is not recommended. And as the table below shows, DRS is listed as not supported.

(Table: support matrix from KB article 1037959)

So, what does support for HA really mean? If you only have a two-node ESX/i cluster with an MSCS CAB setup, HA support will not help you: the surviving host is already running the other MSCS node, so the anti-affinity rule leaves HA nowhere to restart the failed VM. However, if your ESX/i cluster has more than two hosts, HA can be leveraged, and the dead MSCS VM can be restarted on a different host while still complying with the anti-affinity rule. For an MSCS cluster-in-a-box (CIB) setup, HA can be leveraged even on a two-node ESX/i cluster: when host one dies, host two finds itself spinning up both partners in crime. One thing to note here is that all of this is only possible if the storage (both the boot VMDK and the RDM/shared disk) is presented to all the hosts in the cluster. I can’t imagine why anyone would not do that to begin with.
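The placement reasoning above can be sketched in a few lines of Python. This is a deliberately simplified model, not how vSphere HA actually chooses hosts: it assumes, as this post does, that HA honors the anti-affinity rule and that every host can see the VMs' storage. `ha_restart_targets` and the host/VM names are hypothetical.

```python
def ha_restart_targets(hosts: list[str],
                       placement: dict[str, str],
                       failed_host: str,
                       anti_affinity: set[str]) -> dict[str, list[str]]:
    """For each VM that ran on failed_host, list surviving hosts it could restart on
    without two members of the anti-affinity group sharing a host."""
    survivors = [h for h in hosts if h != failed_host]
    targets = {}
    for vm, host in placement.items():
        if host != failed_host:
            continue
        if vm in anti_affinity:
            # Hosts already running another member of the group are off limits.
            taken = {placement[other] for other in anti_affinity
                     if other != vm and placement[other] != failed_host}
            targets[vm] = [h for h in survivors if h not in taken]
        else:
            targets[vm] = list(survivors)
    return targets


cab = {"node1": "esx1", "node2": "esx2"}
print(ha_restart_targets(["esx1", "esx2"], cab, "esx1", {"node1", "node2"}))
print(ha_restart_targets(["esx1", "esx2", "esx3"], cab, "esx1", {"node1", "node2"}))
```

With two hosts the CAB node has nowhere to go; add a third host and it can restart there. A CIB pair (both nodes on one host, no anti-affinity rule) simply moves together to any surviving host.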

Again, only a two-node MSCS cluster is supported so far. With HA being supported for MSCS VMs, I guess one can certainly benefit from the added redundancy. If you think this is being too redundant, just don’t use the feature and disable HA for the MSCS VMs in your environment. I would highly recommend disabling HA for the two VMs if they are part of an MSCS CAB setup in a two-node ESX/i cluster.

MSCS VMs and Snapshots

When using VMs in an MSCS cluster across boxes (CAB), you will need to set up the RDMs in physical compatibility mode and enable SCSI bus sharing. Please note that VMs with RDMs in physical mode will not let you take snapshots either. Basically, you will find that your MSCS VMs have their snapshot option greyed out. What does this mean?

  • You can’t use VCB to back up your VMs, as it relies on snapshots.
  • You can’t use vDR for backups, as it relies on snapshots.
  • And lastly, you can’t leverage snapshots for tasks like patching your VM, if you have been practicing that in the past.

When running the disk in independent-persistent mode, you would think that snapshots would still work and would only capture the VMDK holding the OS partition, not your RDM. However, with SCSI bus sharing enabled (which MSCS needs in order to work), snapshots are not allowed, so the option remains greyed out. Also, let’s assume you are somehow able to snapshot the VM: you take a snapshot of VM1, keep it for a day, and then suddenly decide to revert. I am not sure how VM2, or even the cluster itself, will behave when one of its nodes has suddenly forgotten what happened in the last 24 hours. So just don’t snapshot it, even if you come up with a way to do it.
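The snapshot rules described in this post can be captured in a small Python sketch. It encodes only what the post states (SCSI bus sharing or a physical-mode RDM greys out the option; independent disks are left out of a snapshot) and is not a reproduction of vSphere's own checks, which vary by version. The `Disk` class, function names and disk names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    kind: str                  # "vmdk", "rdm-virtual" or "rdm-physical"
    independent: bool = False  # independent disks are skipped by snapshots


def snapshot_blockers(disks: list[Disk], scsi_bus_sharing: bool) -> list[str]:
    """Reasons the snapshot option would be greyed out, per the rules in this post."""
    reasons = []
    if scsi_bus_sharing:
        reasons.append("SCSI bus sharing is enabled")
    for disk in disks:
        if disk.kind == "rdm-physical":
            reasons.append(f"{disk.name} is a physical-mode RDM")
    return reasons


def captured_disks(disks: list[Disk]) -> list[str]:
    """Disks a snapshot would actually capture if one were allowed."""
    return [d.name for d in disks if not d.independent and d.kind != "rdm-physical"]


mscs_node = [Disk("os.vmdk", "vmdk"), Disk("quorum", "rdm-physical")]
print(snapshot_blockers(mscs_node, scsi_bus_sharing=True))
```

For an MSCS CAB node both blockers apply at once, which is why making the OS disk independent does not bring the snapshot option back.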

One thing I haven’t tried yet is to turn a node off, snapshot it, do what I need to do, and, when I need to revert, turn it off again and then revert. I’m not sure if this is even possible or supported, not to mention that bus sharing would have to be disabled. But in all seriousness, I would never do this in a production environment. If I ever decide to test this, it will only be for my own curiosity.

Not being able to snapshot should not really be a reason for you to scrap MSCS in VMware. This is simply a limitation to understand before designing a solution. If snapshotting is of paramount importance, then this may not be your cup of tea; however, will the alternative solution give you that ability? If not being able to back up the VM using VCB or vDR is your issue, you can still back up the VM by installing a backup agent inside the guest and using a traditional backup mechanism. Leverage the same infrastructure that has been backing up your physical world. MSCS in VMware is not perfect, but it will get there.

MSCS and vSphere Conflicts

As already noted in the vSphere 4 U1 release notes, MSCS VMs are supported in an HA/DRS cluster; it’s amazing how few people have noticed the change. With all the functionality VMware has introduced over the years, it’s easy to miss a few things every now and then. Some consider MSCS a primitive form of clustering compared with HA/DRS clusters in ESX/i. However, an HA/DRS cluster does not protect you from application failure or OS corruption, and neither does FT in vSphere. With an FT-enabled VM, when the primary VM blue-screens, so does the secondary, and you are left with two identical servers, neither of them functioning.

To sum it up, HA/DRS and even FT protect you from hardware failure only. According to VMware, MSCS must be leveraged to maintain 100% uptime for Windows guests. So what can and can’t you do with MSCS and VMware?

You can cluster two VMs on the same host, two VMs on separate hosts, or a physical and a virtual machine. VMware has published detailed guides on how to do this. (Click Here)

Here is a 50,000-foot view of what you can and cannot do (this will also differ based on the version of ESX/i you are running):
  • Only two nodes in an MSCS cluster.
  • An MSCS VM cannot be FT-enabled.
  • Though MSCS VMs can be in an HA/DRS cluster, both HA and DRS should be disabled for all the VMs that are part of MSCS.
  • The quorum and shared disks should not have a VMFS signature and should be presented to all the hosts in the cluster where the MSCS VMs reside (think about it, it makes sense).
  • Don’t overcommit memory; create a reservation for each VM equal to the amount of memory assigned.
  • The VMware doc has more details.
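The checklist above lends itself to a quick pre-flight check. Below is a minimal Python sketch that mirrors this list (node count, FT, HA/DRS, full memory reservation, shared disks visible to every host); it leaves out the VMFS-signature item and is not a substitute for the VMware guide. The `ClusteredVM` class, `check_mscs_nodes` function and the host and VM names are hypothetical; in practice you would fill them in from your own inventory.

```python
from dataclasses import dataclass


@dataclass
class ClusteredVM:
    # Hypothetical snapshot of one MSCS node's settings (illustration only).
    name: str
    memory_mb: int
    memory_reservation_mb: int
    ft_enabled: bool = False
    ha_enabled: bool = False
    drs_enabled: bool = False


def check_mscs_nodes(nodes: list[ClusteredVM],
                     cluster_hosts: set[str],
                     hosts_seeing_shared_disks: set[str]) -> list[str]:
    """Compare MSCS node VMs against the checklist in this post; return any violations."""
    issues = []
    if len(nodes) != 2:
        issues.append(f"{len(nodes)} MSCS nodes; only two are supported")
    for vm in nodes:
        if vm.ft_enabled:
            issues.append(f"{vm.name}: FT must not be enabled")
        if vm.ha_enabled or vm.drs_enabled:
            issues.append(f"{vm.name}: disable HA and DRS for this VM")
        if vm.memory_reservation_mb < vm.memory_mb:
            issues.append(f"{vm.name}: reserve all {vm.memory_mb} MB of assigned memory")
    # Quorum and shared disks must be visible from every host that may run a node.
    missing = sorted(cluster_hosts - hosts_seeing_shared_disks)
    if missing:
        issues.append("shared disks not presented to: " + ", ".join(missing))
    return issues


hosts = {"esx1", "esx2", "esx3"}
nodes = [ClusteredVM("node1", 8192, 4096, drs_enabled=True),
         ClusteredVM("node2", 8192, 8192)]
for issue in check_mscs_nodes(nodes, hosts, {"esx1", "esx2"}):
    print(issue)
```

Returning a list of issues rather than stopping at the first failure lets you fix everything in one pass before calling support.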

Now the last part. DRS is disabled because, under the hood, DRS uses vMotion to move VMs between hosts. Though vMotion is rapid and causes no outage for users, the MSCS heartbeat is very sensitive and may interpret the brief stun period as a node failure and consider that node down. This is certainly not what you want. Hence it’s best not to vMotion these VMs, which is why DRS is disabled.
Why is HA disabled? No one has been able to give me a straight answer on that; it basically comes down to it not being supported.

As of now, I really don’t know why you can’t have HA enabled for a VM that is part of an MSCS cluster.
The good news is that with 4.0 U1 and later, you can use the same hosts that are in an HA/DRS cluster to run your MSCS VMs. Just don’t forget to disable these features for the VMs that are part of the MSCS cluster, or else VMware and Microsoft support may stiff you in your time of need.


RDM Tutorial

At times, as VMware engineers, you may run into situations that take away some of the flexibility you have in your environment. For example, a request to attach a raw LUN to a VM may prevent you from taking snapshots, depending on how the disk is configured in VMware. Though raw LUNs may seem like a hurdle to some, they have their place in the virtual world. One of the advantages of RDMs is that they let your storage team run their fancy management tools on the presented LUN. At the same time, you may see better performance in a high-I/O VM when the intensive I/O runs on the RDM rather than on a virtual disk. A database server with heavy transactional reads and writes may seem like a good candidate for RDMs. MSCS requirements will also lead you to look at RDMs.

I found a good tutorial on RDM at showmedo.com that I am sharing with you guys.