HA Admission control – Percentage of cluster resources

I am sure we are all aware of why HA is so important and awesome to have. It helps you finish your coffee and smoke your cigarette before rushing toward a server that just went down. Ok, maybe not quite that, but you get the idea, right? Another thing to keep in mind regarding HA is the admission control policy. I like to call this the policy that saves you from yourself. Basically, it keeps track of how many resources are available and how many will be needed for a failover to happen. It keeps you honest and ensures that HA's promise is not broken.

As we already know there are three types of Admission Control Policies to choose from:

  • Host failures cluster tolerates
  • Percentage of cluster resources reserved as failover spare capacity
  • Specify a failover host

“Host failures cluster tolerates” creates slots, which at times can create issues, especially if you only have a few VMs with high vCPU counts and memory reservations. Of course, you can look at advanced settings that address this, and Duncan can tell you all there is to know about it. The second option, selecting a percentage of resources, is my personal favorite, especially because of the flexibility it provides. We will go over that in a little bit. The last option, which lets you specify a failover host, is the one that's rarely used, and rightly so. After all, why would you want a host to just sit there and wait until something goes wrong?
As you may have already noticed, vSphere 5 gives you the option to specify a percentage of failover resources for both CPU and memory. Prior to vSphere 5, this was not the case. I think this is an excellent addition, and our clusters will now be more flexible than ever.

25% is what's placed in there by default, and what this really means is that 25% of your total CPU and memory resources across the entire cluster are reserved. In other words, if you have an 8-node cluster, 25% of your resources, or resources equal to two hosts (assuming it's a balanced cluster), are reserved for an HA incident. If this happens to be a balanced 32-node cluster, resources equal to 8 nodes will be reserved, as 8 is 25% of 32. So keep that in mind before deciding what number to put there. You can't reserve more than 50% of your resources.
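If it helps, the balanced-cluster arithmetic above can be sketched in a few lines of Python (the function name is mine, purely for illustration, not anything from vSphere):

```python
def hosts_reserved(num_hosts: int, failover_pct: int) -> float:
    """In a balanced cluster, reserving X% of cluster resources is
    equivalent to setting aside X% of the node count's worth of capacity."""
    return num_hosts * failover_pct / 100

print(hosts_reserved(8, 25))   # 2.0 hosts' worth of resources
print(hosts_reserved(32, 25))  # 8.0 hosts' worth of resources
```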

Below is how the resources are calculated for the hosts:

The total host resources available for virtual machines are calculated by adding the hosts' CPU and memory resources. These amounts are those contained in the host's root resource pool, not the total physical resources of the host. Resources being used for virtualization purposes are not included. Only hosts that are connected, not in maintenance mode, and have no vSphere HA errors are considered.

So how do you know how much headroom you have left in the cluster? On your cluster's Summary tab, you will notice there is no longer a place to look at slot size, as this method does not use slot sizes. It simply gives you a view of how much room you have left.

The Current CPU Failover Capacity is computed by subtracting the total CPU resource requirements from the total host CPU resources and dividing the result by the total host CPU resources. The Current Memory Failover Capacity is calculated similarly.

In vSphere 5, vSphere HA uses the actual reservations of the virtual machines. If a virtual machine does not have a reservation (meaning the reservation is 0), a default of 0MB of memory and 32MHz of CPU is applied.

So assuming you went with the default of 25% for each resource, a current failover capacity of 0% is something you should hope never to see. You are seeing that in my screenshot (above) because my cluster happens to be empty and has no hosts. Let's say you went ahead and powered on a few VMs and your cluster shows something like below (98% CPU and 95% memory); this is something to be happy about. It basically means you have 98% of CPU and 95% of memory available in your cluster.

There is one thing to keep in mind: though 98% of my CPU and 95% of my memory appear under my current failover capacity, this does not account for the 25% that's reserved for an HA incident. At least that's what I was able to see in the few tests that I ran. What this means is that I can only power on VMs that account for no more than 98 − 25 = 73% of the CPU and 95 − 25 = 70% of the memory that's free in the cluster. For everything else, HA should try to save me from myself.

Let’s look at a quick example to see how these numbers are calculated:

  • The Configured Failover Capacity is set to 25% for both CPU and memory.
  • The cluster comprises three hosts, each with 9GHz of CPU and 24GB of memory.
  • There are 4 powered-on virtual machines in the cluster with the following configurations (assume overhead is 100MB for all VMs in this case):
    • VM1 needs 2GHz and 1GB (no reservation)
    • VM2 needs 2GHz and 2GB (2GB reserved)
    • VM3 needs 1GHz and 2GB (2GB reserved)
    • VM4 needs 3GHz and 6GB (1GHz and 2GB reserved)

So what does our cluster have? Our cluster has 9GHz + 9GHz + 9GHz = 27GHz of CPU and 24GB + 24GB + 24GB = 72GB of memory. (These amounts are those contained in the hosts' root resource pools, not the total physical resources of the hosts.)

How many resources are we using with our four powered-on VMs?

Memory = VM reservations + overhead = (0+100) + (2048+100) + (2048+100) + (2048+100) = 6544MB ≈ 6.4GB

Note that we only used 2048MB for VM4 even though it has 6GB configured. That's because it only has 2GB reserved. Also, VM1 has no reservation, so only its overhead was counted.

CPU = VM reservations (using the vSphere 5 default of 32MHz where there is no reservation) = 32MHz + 32MHz + 32MHz + 1GHz = 1.096GHz

So what is our current failover capacity?

Memory = (72GB − 6.4GB)/72GB = 91%

CPU = (27GHz − 1.096GHz)/27GHz = 95.94% ≈ 96%

Wow, that is a lot of cluster resources left. Now let's take 25% off our numbers to come up with exactly how much we can power on before HA starts screaming back with an error.

Memory = 91 − 25 = 66%

CPU = 96 − 25 = 71%
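To tie the whole example together, here is a small Python sketch that reproduces the numbers above. The variable names are mine; the 100MB overhead and the 32MHz default for unreserved VMs come straight from the example's assumptions.

```python
HOSTS = 3
HOST_CPU_MHZ = 9000          # 9GHz per host
HOST_MEM_MB = 24 * 1024      # 24GB per host
OVERHEAD_MB = 100            # assumed memory overhead per VM
FAILOVER_PCT = 25            # configured failover capacity

# (CPU reservation in MHz, memory reservation in MB) per powered-on VM
vms = [
    (0, 0),        # VM1: no reservation
    (0, 2048),     # VM2: 2GB reserved
    (0, 2048),     # VM3: 2GB reserved
    (1000, 2048),  # VM4: 1GHz and 2GB reserved
]

total_cpu = HOSTS * HOST_CPU_MHZ                      # 27GHz
total_mem = HOSTS * HOST_MEM_MB                       # 72GB

# vSphere 5 substitutes 32MHz when a VM has no CPU reservation
used_cpu = sum(cpu or 32 for cpu, _ in vms)           # 1096MHz
used_mem = sum(mem + OVERHEAD_MB for _, mem in vms)   # 6544MB

cpu_capacity = (total_cpu - used_cpu) / total_cpu * 100   # ~96%
mem_capacity = (total_mem - used_mem) / total_mem * 100   # ~91%

print(f"Current CPU failover capacity:    {cpu_capacity:.0f}%")
print(f"Current memory failover capacity: {mem_capacity:.0f}%")
print(f"Usable CPU headroom:    {cpu_capacity - FAILOVER_PCT:.0f}%")
print(f"Usable memory headroom: {mem_capacity - FAILOVER_PCT:.0f}%")
```

Running it prints 96%/91% for current failover capacity and 71%/66% usable headroom, matching the hand calculation.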

Now keep in mind, selecting the percentage-based admission control policy isn't going to solve all your problems. But I do think this setting is far better than complex slot sizes and whatnot. It gives you a simple view of how much room you have in your cluster without messing around with slot sizes. However, unlike the “host failures cluster tolerates” setting, where you can simply add hosts like crazy, the percentage method may require you to revisit your percentages as you add or remove hosts. At the same time, it also gives you more flexibility. So the next time you are setting up a cluster, think about what's important to you.


VMware vSphere 4.1 HA and DRS technical deepdive — eBook

After waiting forever, Duncan Epping and Frank Denneman have finally released the eBook version of their famous “VMware vSphere 4.1 HA and DRS technical deepdive” book. I bought a hard copy last year but never got around to fully dedicating my time to reading it, mainly due to laziness. Now that it's an eBook, you have several new options to enjoy it. As of now, it's available on Kindle only. iPad owners can install the Kindle app and still enjoy this book on their superior tablets.

I love how you can now simply click on the hyperlinks provided in the book. Thanks, Duncan Epping and Frank Denneman! Without further ado, please purchase your copy of VMware vSphere 4.1 HA and DRS technical deepdive for the dirt-cheap price of $7.50. This will be the best investment you make in gaining more VMware knowledge. Enjoy!



High Availability and Primary Hosts

One of the things that is often overlooked is the complexity involved in making HA work. Because of its simple interface, we forget what happens in the background. When we add the first host to a cluster that has HA enabled, it becomes a primary host. Likewise, the first five hosts that enter a cluster are all primary hosts. Any host beyond the fifth is tagged as a secondary host.

The primary hosts are responsible for making sure there is peace and harmony. However, if all five of these hosts go missing, HA stops working. This is why it's crucial to understand what could cause these hosts to fail and what happens when we lose a primary.

When we put a primary host in maintenance mode, a secondary host is promoted to primary status. What does this tell you? The roles move around depending on how big your cluster is and how often you place your hosts in maintenance mode.

However, when a primary host fails or powers off, a secondary is not promoted. The cluster now has only four primaries. And if another primary fails, we have three. I guess you get the picture. Basically, you cannot afford to lose more than four primary hosts in a cluster. So having more than four hosts of an HA cluster racked in the same cabinet may not be the best idea. What if all five primary hosts reside in the same cabinet and the whole cabinet loses power? That's right: your monstrous HA cluster will not work.
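The promotion rules above can be illustrated with a toy model in Python. This is my own simplification for illustration only; it in no way resembles the actual HA agent election logic.

```python
class ToyHaCluster:
    """Toy model: first five hosts become primaries; maintenance mode
    promotes a secondary, but a hard failure does not."""

    def __init__(self):
        self.primaries, self.secondaries = [], []

    def add_host(self, name):
        # The first five hosts added become primaries
        (self.primaries if len(self.primaries) < 5 else self.secondaries).append(name)

    def maintenance_mode(self, name):
        # Graceful removal: a secondary gets promoted to primary
        self.primaries.remove(name)
        if self.secondaries:
            self.primaries.append(self.secondaries.pop(0))

    def hard_failure(self, name):
        # Sudden failure: no promotion happens
        self.primaries.remove(name)

c = ToyHaCluster()
for i in range(7):
    c.add_host(f"esx{i}")
print(len(c.primaries))        # 5 primaries, 2 secondaries
c.maintenance_mode("esx0")
print(len(c.primaries))        # still 5: a secondary was promoted
c.hard_failure("esx1")
print(len(c.primaries))        # down to 4: nobody was promoted
```

Run the failure path a few more times and you can see how a single cabinet outage taking down all five primaries would leave HA unable to restart anything.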


Fault Tolerance – Blues

VMware introduced a groundbreaking new technology with vSphere that promised to give IT professionals some sanity after work. However, I am personally disappointed by how little Fault Tolerance really offers.

Basically, Fault Tolerance is a feature that runs a shadow copy of a VM on a different host. If the host running the primary VM dies, the shadow VM becomes the primary VM and starts another shadow VM on a different host. It sounds wonderful, but there are certain things that are often overlooked. According to VMware, “the Primary VM captures all nondeterministic events and sends them across a VMware FT logging network to the Secondary VM. The Secondary VM receives and then replays those nondeterministic events in the same sequence as the Primary VM, typically with a very small lag time. As both the Primary and Secondary VMs execute the same instruction sequence, both initiate I/O operations. However, the outputs of the Primary VM are the only ones that take effect: disk writes are committed, network packets are transmitted, and so on. All outputs of the Secondary VM are suppressed by ESX. Thus, only a single virtual machine instance appears to the outside world.”

What is the cost?

  • You have to have vSphere Advanced at a minimum.
  • FT requires that the hosts for the Primary and Secondary VMs use the same CPU model, family, and stepping. Approved CPU list:
    • Intel: 3100 Series, 3300 Series, 5200 Series (DP), 5400 Series, 7400 Series, 3400 Series (Lynnfield), 3500 Series, 5500 Series
    • AMD: 1300 and 1400 Series, 2300 and 2400 Series (DP), 8300 and 8400 Series (MP)
  • A dedicated Fault Tolerance network (vSwitch and connectivity). Each host must have a VMotion NIC and a Fault Tolerance logging NIC configured. The VMotion and FT logging NICs must be on different subnets
  • The primary and secondary ESX hosts and virtual machines have to be in an HA-enabled cluster
  • Ensure that there is no requirement to use DRS for VMware FT protected virtual machines
  • Hosts running the primary and the secondary VM must be on the same ESX/ESXi build
  • VMware Consolidated Backup (VCB) is not supported with Fault Tolerance
  • Different operating systems react and behave differently on the different approved CPUs. The following link provides a good bit of detail: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008027
  • Virtual machines must have thick eager-zeroed disks
  • Storage VMotion is not supported for VMware FT VMs
  • Snapshots are not supported for VMware FT protected VMs
  • Ensure that the virtual machines are NOT using more than 1 vCPU

There are more restrictions and requirements, but these are some of the generic ones. One point to note here is that if the primary VM experiences a blue screen, so will the secondary VM. So in essence, FT really provides protection from hardware failure of the host. If all the requirements don't turn you off and all the restrictions don't make you upset, then go for it. I am personally not impressed with this technology yet, or maybe I haven't found a need to implement it yet. Or maybe I am spoiled by the higher standards VMware has gotten me used to.