To start, there is no issue with Admission Control itself; it's really our lack of understanding that makes it an issue. Last week I posted about Admission Control, and Duncan has a lot of excellent information on this subject. I just wanted to touch on a few things that I think are extremely important to understand in order to come up with appropriate designs for your environments.
So what is Admission Control (AC)? In simple terms, it's the policy that will save you from yourselves. Basically it's a check that enables vCenter to reserve certain computing resources in your cluster so that an HA event can be accommodated. There are three ways this can be done:
- Host failures cluster tolerates (this is where slot sizes are used)
- Percentage of cluster resources reserved as failover spare capacity (this is where you specify a percentage of resources you want reserved)
- Specify a failover host (this is self-explanatory so I will not be going over this)
Host failures cluster tolerates
We already know that slots can become an issue in a heterogeneous setup where you have a couple of really large VMs and a bunch of small ones. Let's imagine you have 100 VMs, of which 4 have 8 vCPUs and 24GB of memory reserved, and everything else is 1 vCPU with no memory reservation. Unfortunately those 4 VMs will affect your slot size, and your slot size will be huge because of them. This basically means you will have fewer slots available to power on machines in the cluster. Of course you can tweak the advanced settings like das.slotCpuInMHz and/or das.slotMemInMB (credit: Duncan) to limit the size of your slot. You will have more slots available in the cluster, but keep in mind your large VMs may now occupy more than one slot. So in order for them to power on, all the required slots must be available on a single host, not spread across the cluster. Just something to keep in mind.
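To make the slot math concrete, here is a rough, back-of-the-envelope sketch of the scenario above. It's a simplified approximation of HA's slot sizing, not VMware's exact algorithm; the 8GHz CPU reservation on the large VMs and the 10GHz/64GB host are assumptions for illustration, since the example only gives vCPU counts.

```python
# Simplified sketch of HA slot sizing for the 100-VM example above.
# Assumption: the 4 large VMs carry an 8000 MHz CPU reservation;
# per-VM memory overhead is ignored.

large_vms = [(8000, 24 * 1024)] * 4   # (CPU MHz, memory MB) reservations
small_vms = [(256, 0)] * 96           # no reservations -> vSphere 4 CPU default

all_vms = large_vms + small_vms
slot_cpu = max(cpu for cpu, _ in all_vms)   # 8000 MHz, driven by the big VMs
slot_mem = max(mem for _, mem in all_vms)   # 24576 MB, driven by the big VMs

host_cpu_mhz, host_mem_mb = 10_000, 64 * 1024   # hypothetical 10GHz / 64GB host
slots_per_host = min(host_cpu_mhz // slot_cpu, host_mem_mb // slot_mem)
print(f"slot = {slot_cpu} MHz / {slot_mem} MB -> {slots_per_host} slot(s) per host")

# Capping the slot with das.slotCpuInMHz / das.slotMemInMB yields more slots,
# but then each big VM consumes several slots that must all fit on one host.
```

With a 24GB slot, a 64GB host only offers a slot or two, which is exactly how a handful of large VMs can strangle a cluster's admission capacity.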
The advantage of using this method is that it's pretty dynamic: your available slots will increase and decrease automatically as hosts are added, removed, or placed in maintenance mode. The one big issue with this method is how it handles unbalanced clusters. By that I mean if you have a humongous host in the cluster, HA AC will take the worst case into consideration and be prepared for the time when this huge host goes down. Good, right? Well, that also means that if all your other hosts combined only contribute 50 slots to the cluster and this huge host owns 100 by itself, HA AC will only present the number of slots available for power-on that accounts for the worst-case scenario. So your 100-slot host will really not buy you much. Again, there is a whole lot already discussed regarding this by Duncan on his blog and in his books.
Percentage of cluster resources
This is really the area that I wanted to cover, and it also happens to be the one I like. For a quick intro, please read this. I want to talk about how this method can affect your design. We already know that we specify a percentage of resources that are reserved for an HA event. By default the pre-populated number is 25%. That sounds safe, right?
And here is the answer that you have probably heard a million times before: it depends! Seriously, it depends on your cluster. For the purpose of this post, and to keep things simple, we will assume that we are only dealing with balanced clusters. So let's say you only have a 3-node cluster, each host having 64GB of memory and 10GHz of CPU. Your total is 192GB and 30GHz for the cluster. Taking the default 25%, you are really reserving 48GB and 7.5GHz. This means that when your cluster only has that many resources left, it will not allow you to power on any more VMs. Is that good or bad? I don't know; only you can determine that.
Now let's try to make our cluster a little bigger. Considering we are still talking about vSphere 4, we will keep our cluster size to 8 (across two enclosures, 4 on each one) to make our primary nodes happy. So what happens in a cluster that's 8 nodes strong? 25% means computing resources that equate to 2 of your nodes are just chilling and relaxing until something goes wrong. Of course this does NOT mean that 2 hosts are unused; it means computing resources that equate to what 2 nodes would provide are unused and reserved. Let's take our example from before and extend it.
We now have 8 nodes, each with 64GB of RAM and 10GHz of CPU. This means our cluster has 512GB of RAM and 80GHz of CPU. But because our AC setting is set to 25%, we only have 384GB and 60GHz available for powering on VMs (please note that during an HA event, HA will ignore all AC settings, as stated here). That means we have reserved 128GB of RAM and 20GHz of CPU for HA. Is that cost-effective? I don't know, but it should get someone's attention.
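To see what the percentage policy holds back as the cluster grows, here is a quick throwaway calculation using the same hypothetical 64GB / 10GHz hosts. It simply applies the percentage to the cluster totals; it is not how HA itself reports the values.

```python
# What a given admission-control percentage reserves in a balanced cluster,
# using the 64GB / 10GHz hosts from the examples above.

def reserved_capacity(hosts, host_mem_gb, host_cpu_ghz, pct):
    """Return (memory GB, CPU GHz) held back by the percentage policy."""
    return hosts * host_mem_gb * pct / 100, hosts * host_cpu_ghz * pct / 100

for hosts in (3, 8):
    mem, cpu = reserved_capacity(hosts, 64, 10, 25)
    print(f"{hosts}-node cluster @ 25%: {mem:.0f}GB RAM and {cpu:.1f}GHz CPU reserved")

# 3-node cluster @ 25%: 48GB RAM and 7.5GHz CPU reserved
# 8-node cluster @ 25%: 128GB RAM and 20.0GHz CPU reserved
```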
Let's take this even further and look at an example where we have 4 clusters spread across 2 enclosures. Let's use our same example from above and apply the numbers here. Each blade in the enclosure has 64GB of RAM and 10GHz of CPU. Again let's assume we have AC set to 25%. As we discovered earlier, we should have 128GB and 20GHz reserved for an HA event per cluster. Across the four clusters in these two enclosures we will have 128*4 = 512GB of RAM and 20*4 = 80GHz of CPU reserved for an HA event. That is equivalent to the resources of 8 blades. Interestingly, there are a total of 32 blades in this setup, so 8 blades is 25% of the resources. AC isn't doing anything wrong; it's really doing what you asked it to do. You said you wanted 25% reserved, well there you go.
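Rolling that same 25% up across clusters and enclosures shows how quickly it adds up to whole enclosures' worth of hardware. Again, a rough sketch with the same hypothetical blades; adjust the counts for your own setup.

```python
# Reserved capacity rolled up across 4 clusters of 8 blades (2 enclosures),
# each blade a hypothetical 64GB / 10GHz, with AC at 25%.

clusters, blades_per_cluster = 4, 8
blade_mem_gb, blade_cpu_ghz, pct = 64, 10, 25

mem_reserved = clusters * blades_per_cluster * blade_mem_gb * pct / 100   # 512 GB
cpu_reserved = clusters * blades_per_cluster * blade_cpu_ghz * pct / 100  # 80 GHz
blades_worth = mem_reserved / blade_mem_gb                                # 8 blades

print(f"{mem_reserved:.0f}GB RAM / {cpu_reserved:.0f}GHz CPU reserved "
      f"= {blades_worth:.0f} blades' worth across {clusters} clusters")

# Double the enclosures (8 clusters) and the reservation doubles too:
# 16 blades' worth, i.e. a full enclosure sitting in reserve.
```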
Now if you are not pulling your hair out already, imagine you have 4 such enclosures. If we use the same setup, that means we have 8 clusters, and resources that equate to 16 blades are reserved for an HA event. Wait a sec, that's equal to 1 whole enclosure. Yes, and if you have 4 enclosures, 25% of that is 1 enclosure, so there should be no surprise.
I want to clarify that your reserved resources can be fragmented across multiple hosts, so all your blades are still probably handling some load. It's just that the capacity you paid for to provision new VMs may not materialize as you expected unless this was all factored in to begin with. So what if you have 10 of these enclosures set up the same way? That means two things:
- You have resources reserved that equate to 2.5 enclosures out of the total of 10 for an HA event
- And that you have way more money than I can ever imagine so kudos to you for that
One recommendation would be to set your percentage to equate to what you would have picked in the "Host failures cluster tolerates" method. If you wanted to use 1 there, then set your percentage so that you are only reserving the computing resources of one host. So in our 8-node cluster example, 12.5% would be equal to 1 host. Since you can only enter integers here, let's go with 13%. You started using percentages because you didn't like the slots; that doesn't mean you must have more resources reserved. That defeats the purpose. Lastly, unlike "Host failures cluster tolerates", the percentage method will not dynamically adjust if you add or remove hosts, so revisit the percentage whenever the cluster changes.
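If you want the percentage to mirror tolerating a single host failure in a balanced cluster, the math is just one host's share of the cluster, rounded up to a whole number. A minimal sketch, assuming equally sized hosts:

```python
import math

def pct_for_host_failures(num_hosts, failures_to_tolerate=1):
    """Percentage roughly equal to N hosts' worth of a balanced cluster,
    rounded up because the AC field only accepts whole numbers."""
    return math.ceil(100 * failures_to_tolerate / num_hosts)

print(pct_for_host_failures(8))    # 13 -> one host's worth in an 8-node cluster
print(pct_for_host_failures(16))   # 7  -> one host's worth in a 16-node cluster

# Re-run this and update the cluster setting whenever hosts are added or
# removed; the percentage method will not adjust itself.
```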
Also note, when you have large VMs in your cluster, because the reserved resources can be fragmented, it may not be a bad idea to give those large VMs a higher restart priority. In simple terms, if your VM needs 24GB to start and your cluster has 24GB available but it's spread across more than one host, guess what, it won't start. Since the 4.1 release HA will request that DRS make room, but that's not guaranteed.
The obvious question is: does this get better with vSphere 5? Yes, it does, but you will still have to figure out what works best for you. The one enhancement that's visible in the GUI is how you specify the resource reservation for HA in vSphere 5.
You can specify CPU and memory individually. I think this is great if you are trying to make sure your dollars are not wasted and that you are not forced to reserve more than you really have to. This way you can reserve a certain amount of memory and a certain amount of CPU for your cluster. Prior to vSphere 5, you didn't have a way to differentiate the two; you could only put in one percentage that applied to both. But again, the default is 25%, and you will have to figure out the sweet spot for your environment.
In vSphere 4, for virtual machines that did not have a CPU reservation larger than 256MHz, a default of 256MHz was used for CPU, and if no memory reservation was set either, a default of 0MB plus memory overhead was used; these values contributed to determining the failover capacity of your cluster. In vSphere 5, a default of only 32MHz is used if no CPU reservation is defined, and with no memory reservation, a default of 0MB plus memory overhead is used as before to compute the failover capacity.
How is the failover capacity computed? I have written a detailed post on this subject here.
The Current CPU Failover Capacity is computed by subtracting the total CPU resource requirements from the total host CPU resources and dividing the result by the total host CPU resources. The Current Memory Failover Capacity is calculated similarly.
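Put together with the per-VM defaults above, the calculation looks roughly like the sketch below. It is a simplified illustration (per-VM memory overhead is left out, and the 8000MHz reservation on the large VMs is an assumption), not HA's exact accounting.

```python
# Current Failover Capacity, per the description above:
# (total host resources - total resource requirements) / total host resources.

def current_failover_capacity(total_host, vm_reservations, default_if_unset):
    required = sum(r if r > 0 else default_if_unset for r in vm_reservations)
    return (total_host - required) / total_host   # fraction of capacity still free

cpu_reservations = [0] * 96 + [8000] * 4   # MHz; 0 means no reservation configured
total_cpu_mhz = 8 * 10_000                 # the 8-node, 10GHz-per-host cluster

print(current_failover_capacity(total_cpu_mhz, cpu_reservations, 256))  # vSphere 4 default
print(current_failover_capacity(total_cpu_mhz, cpu_reservations, 32))   # vSphere 5 default

# Admission control blocks power-ons once this drops below the percentage
# you configured as failover spare capacity.
```

With the much smaller 32MHz default, the same cluster shows noticeably more failover capacity, which is one reason the percentage policy feels less punishing in vSphere 5.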
Of course, because of the way HA works in vSphere 5, you are not limited to a 4-node-per-failure-domain setup anymore, as there are no primary or secondary nodes in vSphere 5. In other words, if you wanted a 10-node cluster and you have two enclosures, you could do 5 per enclosure and not have to worry about HA not functioning during an enclosure failure. I have discussed that in one of my earlier posts, but it's out of the scope of what's being discussed here.
Going back to the original discussion: because you can now have more than 4 hosts in a failure domain in vSphere 5, let's say you have a 16-node cluster with 8 nodes on each enclosure (thank you vSphere 5 and FDM), and have your percentage set to 13% for both CPU and memory, which is just a little over the computing resources of two hosts (IMO this is still pretty liberal, but it seems more practical). This means you are only reserving computing resources that equate to 2 hosts per cluster. Wait, isn't that what happened before? Yes, but that was a smaller cluster and this one happens to be twice as big. If we have two 16-node clusters, we are reserving total computing resources that equate to about 4 hosts across those two clusters, which is better than before. Of course, going with larger clusters is another discussion out of the scope of this topic, but I will say that DRS will be happier in a larger cluster. Keep in mind, if you take the default of 25% even in this large 16-node cluster, you will still be screwed, as that would mean resources that equate to 4 hosts are reserved per cluster, so you will have the same old issue discussed above. So be mindful of what percentage you put here. vSphere 5 gives you more flexibility, as you can now put in different values for CPU and memory.
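For the 16-node example, here is the same back-of-the-envelope math comparing 13% against the 25% default, with the usual hypothetical 64GB / 10GHz hosts:

```python
# 16-node cluster: what 13% versus the default 25% actually holds back,
# and roughly how many hosts' worth of capacity that is.

hosts, host_mem_gb, host_cpu_ghz = 16, 64, 10
for pct in (13, 25):
    mem = hosts * host_mem_gb * pct / 100
    cpu = hosts * host_cpu_ghz * pct / 100
    print(f"{pct}%: {mem:.0f}GB RAM / {cpu:.1f}GHz CPU reserved "
          f"(~{pct * hosts / 100:.1f} hosts' worth)")

# 13%: 133GB RAM / 20.8GHz CPU reserved (~2.1 hosts' worth)
# 25%: 256GB RAM / 40.0GHz CPU reserved (~4.0 hosts' worth)
```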
Admission control is an awesome thing. You should absolutely turn it on so that it can do what needs to be done. However, it's important for us to understand how it works. I know the few gigs of reserved capacity in my lab annoy me from time to time, but I know what it's there for and how it would benefit me. If I had an enormous amount of computing resources reserved for HA, like in the examples above, I would be a little alarmed. Of course there might be a good reason for someone to run that kind of setup, who knows. But if that person is you, please consider this post a request to donate your hardware to me when it comes time for you to upgrade.