I am sure we are all aware of why HA is all-important and awesome to have. It lets you finish your coffee and smoke your cigarette before rushing towards a server that just went down. OK, maybe not quite that, but you get the idea, right? Another thing to keep in mind regarding HA is the admission control policy. I like to call this the policy that saves you from yourself. Basically, it keeps track of how many resources are available and how many will be needed for a failover to happen. It keeps you honest and ensures that HA's promise is not broken.
As we already know, there are three types of Admission Control Policies to choose from:
- Host failures cluster tolerates
- Percentage of cluster resources reserved as failover spare capacity
- Specify a failover host
25% is what's placed there by default, and what this really means is that 25% of your total CPU and total memory resources across the entire cluster are reserved for failover. In other words, if you have an 8-node cluster, 25% of your resources, or resources equal to two hosts (assuming it's a balanced cluster), are reserved for an HA incident. If this happens to be a balanced 32-node cluster, resources that equate to 8 nodes will be reserved, as 8 is 25% of 32. So keep that in mind before deciding what number to put there. You can't reserve more than 50% of your resources.
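Just to make that arithmetic explicit, here is a tiny Python sketch (the function name is my own, and the node counts are just the examples from above):

```python
def reserved_host_equivalents(num_hosts, reserved_pct):
    """In a balanced cluster, how many hosts' worth of resources
    the percentage-based policy sets aside for failover."""
    return num_hosts * reserved_pct / 100

print(reserved_host_equivalents(8, 25))   # 2.0 -> two hosts' worth reserved
print(reserved_host_equivalents(32, 25))  # 8.0 -> eight hosts' worth reserved
```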
Below is how the resources are calculated for the hosts:
The total host resources available for virtual machines is calculated by adding the hosts’ CPU and memory resources. These amounts are those contained in the host’s root resource pool, not the total physical resources of the host. Resources being used for virtualization purposes are not included. Only hosts that are connected, not in maintenance mode, and have no vSphere HA errors are considered.
So how do you know how much headroom you have left in the cluster? On your cluster's Summary tab, you will notice there is no longer a place for you to look at slot size, as this method does not use slot sizes. It basically gives you a simple view of how much room you have left.
The Current CPU Failover Capacity is computed by subtracting the total CPU resource requirements from the total host CPU resources and dividing the result by the total host CPU resources. The Current Memory Failover Capacity is calculated similarly.
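To make that formula concrete, here is a minimal Python sketch (the function and the sample numbers are mine, not anything vSphere exposes):

```python
def current_failover_capacity(total_host_resources, total_vm_requirements):
    """(Total host resources - total VM resource requirements)
    divided by total host resources, expressed as a percentage."""
    return (total_host_resources - total_vm_requirements) / total_host_resources * 100

# Hypothetical cluster: 10,000MHz of host CPU, 2,500MHz required by VMs
print(current_failover_capacity(10000, 2500))  # 75.0 (% CPU failover capacity)
```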
In vSphere 5, vSphere HA uses the actual reservations of the virtual machines. If a virtual machine does not have reservations, meaning that the reservation is 0, a default of 0MB memory and 32MHz CPU is applied.
So assuming you went with the default of 25% for each resource, 0% as current failover capacity is something you should hope never to see. You are seeing that in my screenshot (above) because my cluster happens to be empty and has no hosts. Let's say you went ahead and turned on a few VMs and your cluster shows something like below (98% CPU and 95% memory); this is something to be happy about. It basically means you have 98% of CPU and 95% of memory available in your cluster.
There is one thing to keep in mind: though 98% CPU and 95% memory appear under my current failover capacity, this does not account for the 25% that is reserved for an HA incident. At least, that's what I was able to see in the few tests that I ran. What this means is that I can only power on VMs that account for no more than 98 - 25 = 73% of the CPU and 95 - 25 = 70% of the memory that's free in the cluster. For everything else, HA should try to save me from myself.
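Put another way (and again, this is based on my own tests, not official documentation), the headroom you can actually consume looks something like this:

```python
def usable_headroom(current_capacity_pct, configured_failover_pct):
    """Room left to power on VMs before admission control pushes back."""
    return current_capacity_pct - configured_failover_pct

print(usable_headroom(98, 25))  # 73 -> % of cluster CPU still consumable
print(usable_headroom(95, 25))  # 70 -> % of cluster memory still consumable
```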
Let’s look at a quick example to see how these numbers are calculated:
- The Configured Failover Capacity is set to 25% for both CPU and memory.
- The cluster is composed of three hosts, each with 9GHz of CPU and 24GB of memory.
- There are 4 powered-on virtual machines in the cluster with the following configs (assume overhead is 100MB for all VMs in this case):
- VM1 needs 2GHz and 1GB (no reservation)
- VM2 needs 2GHz and 2GB (2GB reserved)
- VM3 needs 1GHz and 2GB (2GB reserved)
- VM4 needs 3GHz and 6GB (1GHz and 2GB reserved)
So what does our cluster have? Our cluster has 9GHz + 9GHz + 9GHz = 27GHz of CPU and 24GB + 24GB + 24GB = 72GB of memory. (These amounts are those contained in the host's root resource pool, not the total physical resources of the host.)
How many resources are we using with our four powered-on VMs?
Memory = VM reservation + overhead = 0+100 + 2048+100 + 2048+100 + 2048+100 = 6544MB ≈ 6.4GB
Note we only used 2048 for VM4 even though it has 6GB configured; that's because it only has 2GB reserved. Also, VM1 has no reservation, so only its overhead was counted.
CPU = VM reservation (32MHz if there is none, in vSphere 5) = 32MHz + 32MHz + 32MHz + 1GHz = 1.096GHz
So what is our current failover capacity?
Memory = (72GB – 6.4GB) / 72GB = 91%
CPU = (27GHz – 1.096GHz) / 27GHz = 95.94% ≈ 96%
Wow, that is a lot of cluster resources left. Now let's take 25% off our numbers to come up with exactly how many VMs we can power on before HA starts screaming back with an error.
Memory = 91 - 25 = 66%
CPU = 96 - 25 = 71%
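If you want to check my math, here is the whole example as a short Python script (reservations in MHz and MB, using the vSphere 5 defaults of 32MHz CPU and 0MB memory plus overhead for VMs without reservations):

```python
# Cluster: three hosts, each with 9GHz of CPU and 24GB of memory
total_cpu_mhz = 3 * 9000         # 27,000MHz = 27GHz
total_mem_mb = 3 * 24 * 1024     # 73,728MB = 72GB

# (cpu_reservation_mhz, mem_reservation_mb) for VM1..VM4; overhead is 100MB each
vms = [(0, 0), (0, 2048), (0, 2048), (1000, 2048)]
overhead_mb = 100

# vSphere 5 defaults when there is no reservation: 32MHz CPU, 0MB memory
cpu_used_mhz = sum(cpu if cpu else 32 for cpu, _ in vms)   # 1,096MHz
mem_used_mb = sum(mem + overhead_mb for _, mem in vms)     # 6,544MB

cpu_capacity = (total_cpu_mhz - cpu_used_mhz) / total_cpu_mhz * 100
mem_capacity = (total_mem_mb - mem_used_mb) / total_mem_mb * 100

print(round(cpu_capacity), round(mem_capacity))            # 96 91
print(round(cpu_capacity) - 25, round(mem_capacity) - 25)  # 71 66
```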
Now keep in mind, selecting the percentage-based admission control policy isn't going to solve all your problems. But I do think this setting is far better than complex slot sizes and whatnot. It gives you a simple view of how much room you have in your cluster without messing around with slot sizes. However, unlike the host failures cluster tolerates setting, where you can simply add hosts like crazy, the percentage method may require you to revisit your percentages as you add or remove hosts. At the same time, it also gives you more flexibility. So next time you are setting up a cluster, think about what's important to you.