Over the weekend, one of my ex-colleagues reached out to me for my input on running SQL in a 2 node Microsoft Cluster. Of course, these servers were supposed to be VMs and all that. Now my initial reaction was, yeah sure you can do all this but there are restrictions that are placed on VMs that are part of MSCS. This included, affinity rules, exceptions in DRS, limited to two nodes only and all the other stuff that goes along with it.
Below are some of the limitations or stuff that you CANNOT do on VMs that are part of the MSCS. This is straight from the install guide:
- Mixed environments, such as configurations where one cluster node is running a different version of ESXi than another cluster node.
- Use of MSCS in conjunction with vSphere Fault Tolerance (FT).
- Migration with vSphere vMotion of clustered virtual machines.
- N-Port ID Virtualization (NPIV)
As I was going over the caveats, I was reminded that this will be running SQL 2012 using AlwaysON Availability Groups, and the VMs need to be in a cluster however there will bo no need for shared storage, even the quorum can be windows share somewhere on a fileserver etc. When I heard that, I was confused and asked for some time to do research. I read a few articles online, talked to folks on twitter, posted input on the secret vExpert community (which btw was very helpful and prompt) and came back with a totally different mindset to approach this.
Based on the feedback I got from everyone, from the articles I read and of course the AlwaysON intro my ex-colleague gave me I came to the following conclusion. VMs running SQL 2012 AAG in a Microsoft Cluster with no shared disk should be treated like any other VM. Which means, you can have more than 2 nodes and the limitations of a typical MSCS VM does not apply. Why so?
To start off, what in the world is AAG? You can get some good detailed info on that here. But to put it simply, AAG is the new and improved HA and DR solution for SQL, kinda like database mirroring (the orignal database mirroring still exists but its probably on it’s way out).
It’s like mirroring, but we get multiple mirrors for many more databases that we can fail over in groups, and we can shed load by querying the mirrors.
It relies on Windows Failover Clustering and the file synchronization happens at the application layer. This means that all those contraints that are usually placed on MSCS VMs due to RDMs, dont exist here. This also means, you can vMotion the box and do everything that you would to most other VMs. To put it simply, it’s not a special VM anymore.
How is AAG better than database mirroring? Below is a list of some of the improvements extracted from this article:
- Supports one primary database replica and allows up to 4 secondary database replicas targets.
- Asynchronous-commit mode. This availability mode is a DR solution that works great when the availability replica copies are distributed with not so stable connection.
- Synchronous-commit mode. This availability mode put emphasis on high availability and allows data protection over performance, con is transaction latency.
- Allows automatic page restoration against page corruption.
- Backupable and Read-only access to the secondary databases
- Fast application failover is provided by availability group listeners.
- The greater failover control is achieved by Flexible failover policy.
Of course there are a few things to consider when you run this all virtually (thanks to the VMTN feedback I received), even though you may be able to vMotion these boxes, keep in mind the Windows Failover Cluster (WFC) heartbeat is very sensitive and and the small stun time maybe enough for your cluster to assume a node has failed. So adjusting your heartbeat timeout maybe something to consider. Matt shows here how to do that here, though he is doing that on a Database Availability Group (DAG), it still relies on WFC like AAG.
Now how do you set this up? I was thinking about doing a step by step but I found someone else who already beat me to that. So here are the steps that cover setting up WFC to enabling AAG on SQL. Denny Cherry also plans to have a session around this topic at VMworld.
In the end, I think SQL 2012 with AAG will certainly help to better the relationship of SQL and virtualization. With the restrictions relaxed on this type of a setup, you can now have bigger WFC clusters within a HA/DRS cluster. With HA/DRS you get the protection from hardware related incidents and with AAG, your application becomes intelligent. In the end you look good and find more time to do more important things in life.
PS : Lastly, we will still be doing a proof of concept to see how well this all holds up. I will encourage you to do your independent testing before introducing this in production. In paper this sounds perfect. I plan to keep this post updated with what we find / learn or at least a link to the updated post depending on how this goes. Good luck!