VMware vSphere 5.0 Clustering Technical Deepdive

With vSphere 5 unleashed by VMware a few hours ago, along with my excitement I also feel a responsibility to keep up with VMware’s pace. That said, I am excited about the newly designed HA. In the coming days I will post more about what has changed and how you can sleep better at night. In the meantime, below are the three things on my to-do list, which I recommend to anyone interested in the cloud space:

I can’t stress enough the importance of ordering VMware vSphere 5 Clustering Technical Deepdive. It’s available in several formats and price points to fit your needs. The book covers HA, DRS, SDRS and much more. I don’t think the authors need any introduction, and the wealth of knowledge they have to offer is priceless. So click, buy, start reading and keep virtualizing.

Memory state in ESXTOP/RESXTOP

Oftentimes you will wonder whether you have enough room for another VM on your host. Before I begin, let me clarify that in a larger environment you should certainly use capacity analysis tools. But what if you are a small shop that can’t afford one of those tools, you own only a small cluster, and you don’t mind running ESXTOP/RESXTOP to figure this out? You can look at TPS and other areas, but the memory state of the host will indicate the kind of stress the host is under. This will be your best friend.


As you can tell, my host is in the “high” state. What does this really mean? Your host can be in one of the following states: “high”, “soft”, “hard” or “low”. Which state it is in depends on the following:

high state = if the free memory is greater than or equal to 6%

soft state = if the free memory is at 4%

hard state = if the free memory is at 2%

low state = if the free memory is at 1%

As you can tell, the high state is what will keep your host happy. One thing to note is that in the high and soft states ballooning is favored over swapping, while in the hard and low states swapping is favored over ballooning. Of course, TPS and other techniques will enable you to use the memory on your host efficiently and allow you to overcommit. Another thing to point out is that your host may be in the “high” state while your VM is still swapping. In that case it’s not the host; it’s really the limit on your VM or your resource pool (RP) settings that is causing the VM to swap.
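To make the thresholds above concrete, here is a minimal Python sketch. It is not anything ESXTOP exposes; the function names are made up for illustration, and it assumes each percentage in the list is the lower bound of its state (so anything below 2% free is treated as “low”).

```python
def free_percent(free_mb: float, total_mb: float) -> float:
    """Return free host memory as a percentage of total host memory."""
    return 100.0 * free_mb / total_mb


def memory_state(free_pct: float) -> str:
    """Map free memory (in percent) to a host memory state.

    Assumption: the listed percentages are treated as lower bounds.
    """
    if free_pct >= 6.0:
        return "high"
    if free_pct >= 4.0:
        return "soft"
    if free_pct >= 2.0:
        return "hard"
    return "low"


def preferred_reclamation(state: str) -> str:
    """Ballooning is favored in high/soft, swapping in hard/low."""
    return "ballooning" if state in ("high", "soft") else "swapping"


if __name__ == "__main__":
    # Hypothetical host: 32 GB total, 1.5 GB free.
    pct = free_percent(1536, 32768)
    state = memory_state(pct)
    print(f"{pct:.1f}% free, state={state}, favors {preferred_reclamation(state)}")
```

Keep in mind this only describes the host. As noted above, a VM limit or resource pool setting can still make an individual VM swap while the host reports “high”.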

The good news is that DRS will move your VM over to another host (based on your settings) if it comes under stress, and moving the VM should improve its performance. But I have always found ESXTOP/RESXTOP to be an excellent tool for getting insight into what’s really happening on your host. Remember, a holistic view is great, and when we talk about a cloud a single host may not mean much. However, each host is a building block that forms your cloud. Understanding how memory is handled at the host level will give you better insight into the holistic memory stats of your cloud.

HA for MSCS VMs in vSphere

A few days ago, I was complaining about not knowing why HA has to be disabled on a MSCS setup in vSphere. It turns out that only DRS needs to be disabled, as HA is still supported according to KB article 1037959. If I read it correctly, even in a cluster across boxes (CAB) type of setup, where you have to use physical compatibility mode, HA is still supported. DRS is not supported in any vSphere and MSCS setup, for the reasons I discussed in one of my previous posts. Although the MSCS user guide for 4.1 suggests that you can set DRS to partially automated for MSCS machines, the PDF also mentions that migrating these VMs is not recommended. And as the table below suggests, DRS is not supported either.

KB article 1037959

So, what does support for HA really mean? If you only have a two node cluster and a MSCS CAB setup, the HA support will not affect you because of the anti-affinity rules. However, if your ESX/i cluster is bigger than two nodes, then HA can be leveraged and the dead MSCS VM can be restarted on a different host while still complying with the anti-affinity rule that has been set. For a MSCS cluster in a box (CIB) setup, HA can be leveraged even on a two node ESX/i cluster. When host one dies, host two finds itself spinning up the two partners in crime. One thing to note here is that all of this is only possible if the storage (both the boot vmdk and the RDM/shared disk) is presented to all the hosts in the cluster. I can’t imagine why anyone would not do that to begin with.
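The reasoning above can be sketched in a few lines of Python. This is only an illustration of the placement logic, not how HA is actually implemented; the function, the host names and the "CAB"/"CIB" mode strings are all hypothetical, and it assumes the storage is presented to every host in the cluster.

```python
from typing import List


def restart_candidates(hosts: List[str], failed_host: str,
                       partner_host: str, mode: str) -> List[str]:
    """Return the hosts where a failed MSCS VM could be restarted.

    hosts        -- all ESX/i hosts in the cluster
    failed_host  -- the host that just died
    partner_host -- the host running the other MSCS node
    mode         -- "CAB" (nodes kept apart) or "CIB" (nodes kept together)
    """
    survivors = [h for h in hosts if h != failed_host]
    if mode == "CAB":
        # Anti-affinity: the restarted VM must not land next to its partner.
        return [h for h in survivors if h != partner_host]
    if mode == "CIB":
        # Both nodes lived on the failed host, so any survivor can take both.
        return survivors
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(restart_candidates(["esx1", "esx2"], "esx1", "esx2", "CAB"))
    print(restart_candidates(["esx1", "esx2", "esx3"], "esx1", "esx2", "CAB"))
    print(restart_candidates(["esx1", "esx2"], "esx1", "esx1", "CIB"))
```

With two hosts and CAB the candidate list comes back empty, which is why HA support makes no difference there; a third host or a CIB setup gives HA somewhere to restart the VM.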

Again, only a two node MSCS cluster is supported so far. With HA being supported for MSCS VMs, I guess one can certainly benefit from the added redundancy. If you think this is being too redundant, just don’t use the feature and disable HA for the MSCS VMs in your environment. I would highly recommend disabling HA for the two VMs if they are part of a MSCS CAB setup in a two node ESX/i cluster.

MSCS and vSphere Conflicts

As already addressed in the vSphere 4 U1 release notes, MSCS VMs are supported in a HA/DRS cluster. It’s amazing how few have noticed the change. With all the functionality VMware has introduced over the years, it’s easy to miss a few things every now and then. Some consider MSCS a primitive form of clustering compared to HA/DRS clusters within ESX/i. However, it must be noted that a HA/DRS cluster does not protect you from application failure or OS corruption. Neither does FT in vSphere. With a FT-enabled VM, when the primary VM blue screens, so does the secondary VM, and you are left with two identical servers, neither of them functioning.

To sum it up, HA/DRS and even FT protect you from a hardware failure only. According to VMware, MSCS must be leveraged to maintain 100% uptime for Windows guests. So what can and can’t you do with MSCS and VMware?

You can cluster two VMs on the same host, two VMs on separate hosts, and you can also cluster a physical and a virtual machine. VMware has published detailed guides on how this can be achieved.

Here is a 50,000-foot view of what you can and cannot do. This will also differ based on the version of ESX/i you are running:
Only two nodes in a MSCS cluster
A MSCS VM cannot be an FT-enabled VM
Though MSCS VMs can be in a HA/DRS cluster, both HA and DRS should be disabled for all the VMs that are part of MSCS
Quorum and shared disks should not have the VMFS signature and should be presented to all the hosts in the cluster where the MSCS VMs reside (think about it, it makes sense)
Don’t overcommit, and try to create a reservation for your VM equal to the size of the memory assigned
The VMware doc will have more details
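
If you keep an inventory of your VMs somewhere, the rules in this list are easy to turn into a quick sanity check. The sketch below is plain Python and does not talk to vCenter or any VMware API; the MscsVm record and its field names are invented for illustration, and the shared-disk rule is left out because it can’t be expressed as a simple per-VM flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MscsVm:
    """Hypothetical record describing one VM that is part of a MSCS cluster."""
    name: str
    cluster_nodes: int    # number of nodes in the MSCS cluster
    ft_enabled: bool
    ha_enabled: bool
    drs_enabled: bool
    memory_mb: int        # memory assigned to the VM
    reservation_mb: int   # memory reservation set on the VM


def check_mscs_vm(vm: MscsVm) -> List[str]:
    """Return the rules from the list above that this VM breaks."""
    problems = []
    if vm.cluster_nodes > 2:
        problems.append("more than two nodes in the MSCS cluster")
    if vm.ft_enabled:
        problems.append("FT is enabled")
    if vm.ha_enabled:
        problems.append("HA is not disabled for this VM")
    if vm.drs_enabled:
        problems.append("DRS is not disabled for this VM")
    if vm.reservation_mb < vm.memory_mb:
        problems.append("memory reservation is smaller than assigned memory")
    return problems


if __name__ == "__main__":
    vm = MscsVm("sql-node-a", 2, False, False, True, 8192, 4096)
    for problem in check_mscs_vm(vm):
        print(f"{vm.name}: {problem}")
```

An empty list means the VM passes the checks written here, nothing more. The VMware doc remains the authority on what is actually supported for your version.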

Now the last part: DRS is disabled because, under the hood, DRS uses vMotion. Though vMotion is rapid and causes no outage for the users, the MSCS heartbeat is very sensitive and may treat the few seconds of the stun period as a node failure and consider that node to be down. This is certainly not what you want. Hence it’s best not to vMotion these VMs, which is why DRS is disabled.
Why is HA disabled? No one has been able to give a straight answer on that, and it basically comes down to the fact that it’s not supported.

As of now I really don’t know why you can’t have HA enabled for a VM that is part of a MSCS cluster.
The good news is that from vSphere 4 U1 onwards you can use the same hosts that are in a HA/DRS cluster to run your MSCS VMs. Just don’t forget to disable these features for the VMs that are part of the MSCS cluster, or else VMware and MS support may stiff you in your time of need.


VMware vSphere 4.1 HA and DRS technical deepdive — eBook

After waiting forever, Duncan and Denneman have finally released the eBook version of their famous “VMware vSphere 4.1 HA and DRS technical deepdive” book. I bought a hard copy last year but never got around to fully dedicating my time to reading it, mainly due to laziness. Now that it is an eBook, you have several new ways to enjoy it. As of now it’s being released on Kindle only. iPad owners can install the Kindle app and still enjoy this book on their superior tablets.

I love how you can simply click on the hyperlinks provided in the book now. Thanks, Duncan Epping and Frank Denneman! Without further ado, please purchase your copy at VMware vSphere 4.1 HA and DRS technical deepdive for the dirt cheap price of $7.50. This will be the best investment you make in gaining more VMware knowledge. Enjoy!