HA and Admission Control

I have seen admission control being used without a real understanding of how it affects your cluster and your available resources. While configuring admission control on a cluster the other day, I started thinking about how it really works. The concept is pretty simple. According to VMware:

Slot size is comprised of two components, CPU and memory. VMware HA calculates these values.

The CPU component by obtaining the CPU reservation of each powered-on virtual machine and selecting the largest value. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 256 MHz (this value can be changed using the das.vmCpuMinMHz advanced attribute.)

The memory component by obtaining the memory reservation (plus memory overhead) of each powered-on virtual machine and selecting the largest value.

HA relies on slot sizes, and in the current version of ESX/i, if no reservations are used, the default slot size is 256 MHz of CPU and the largest memory overhead. Keep in mind that if a single VM has a 4GB memory reservation, your slot size suddenly becomes 256 MHz and 4GB of memory (plus that VM’s overhead). You now have fewer slots to place your VMs, and admission control will not let you power on more VMs than the cluster can accommodate under your Host Failures Cluster Tolerates setting. In short, HA uses your worst-case CPU and memory reservations to come up with the slot size. All of this should be common knowledge.

Let’s assume you have a cluster of 3 hosts and VMs with no reservations: HA is turned on, Host Failures Cluster Tolerates is 1, admission control is enabled and your isolation response is set to shutdown. To simplify things, let’s assume your cluster is balanced, with each host having 10GHz of CPU and 24GB of memory. Your cluster has a total of 30GHz of CPU and 72GB of memory. There are 60 VMs running, and none of them has a reservation. Let’s also assume your slot size is 256 MHz and 300MB (overhead). So how many slots do you have? You have 30000/256 = 117 in CPU (rounded down) and 72000/300 = 240 in memory. You always pick the lower number, so according to the calculation above, you have 117 slots available on this cluster.
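The slot arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not VMware’s implementation: the function names are hypothetical, each VM is assumed to be a (CPU reservation MHz, memory reservation MB, memory overhead MB) tuple, a CPU reservation of 0 is treated as "not specified" and replaced by the 256 MHz default, and, like the example, it treats 1GB as 1000MB and works with cluster totals.

```python
# Hypothetical sketch of the HA slot-size math; not VMware's actual code.
DEFAULT_CPU_MHZ = 256  # default when no CPU reservation is set (das.vmCpuMinMHz)


def slot_size(vms):
    """Return (cpu_mhz, mem_mb) for a list of (cpu_res_mhz, mem_res_mb, overhead_mb)."""
    # CPU: largest reservation, with unset (0) reservations counted as the default.
    cpu = max(c if c > 0 else DEFAULT_CPU_MHZ for c, _, _ in vms)
    # Memory: largest reservation plus overhead.
    mem = max(m + o for _, m, o in vms)
    return cpu, mem


def cluster_slots(total_cpu_mhz, total_mem_mb, slot):
    """Slots available: the lower of the CPU-based and memory-based counts."""
    cpu_slot, mem_slot = slot
    return min(total_cpu_mhz // cpu_slot, total_mem_mb // mem_slot)


vms = [(0, 0, 300)] * 60                  # 60 VMs, no reservations, 300MB overhead
slot = slot_size(vms)                     # (256, 300)
print(cluster_slots(30000, 72000, slot))  # 117 with all three hosts
print(cluster_slots(20000, 48000, slot))  # 78 after one host fails
```

To see the effect of a single reservation, append one VM with a 4096MB memory reservation (assuming the same 300MB overhead): the memory slot becomes 4396MB, and the same 72000MB then yields only 72000 // 4396 = 16 slots, far below the 117 CPU-based slots.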

Let’s assume a host fails and now we only have 20GHz and 48GB left in our cluster. We now have 20000/256 = 78 and 48000/300 = 160, which means we have only 78 slots available. So with 78 slots and 60 VMs (1 VM per slot), should all your VMs power on? No, because your cluster still has Host Failures Cluster Tolerates set to 1 and admission control is enabled. It’s important to understand how admission control really works. According to VMware:

With the Host Failures Cluster Tolerates policy, VMware HA performs admission control in the following way:

1 Calculates the slot size. A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.

2 Determines how many slots each host in the cluster can hold.

3 Determines the Current Failover Capacity of the cluster. This is the number of hosts that can fail and still leave enough slots to satisfy all of the powered-on virtual machines.

4 Determines whether the Current Failover Capacity is less than the Configured Failover Capacity (provided by the user). If it is, admission control disallows the operation.

So even though your cluster has enough slots to run all your VMs, Host Failures Cluster Tolerates is set to 1, so admission control has to make sure it only runs the load it can afford to run in case of another host failure. Admission control knows there are 78 slots available, but it has to keep in mind that after another host failure it will only have 39 (the 10000/256 = 39 slots of a single remaining host). Because Host Failures Cluster Tolerates is set to 1, admission control will only allow 39 slots to be used. Once HA realizes that 39 slots have been taken, it will not allow any more VMs to power on. It’s saving you from yourself.
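The four admission-control steps quoted above, applied to this example, can be sketched as follows. This is a simplified model under stated assumptions, not HA’s real code: the function names are hypothetical, slots are counted per host as in step 2, and the Current Failover Capacity in step 3 assumes the worst case, where the hosts holding the most slots fail first.

```python
# Hypothetical model of the Host Failures Cluster Tolerates admission check.


def host_slots(cpu_mhz, mem_mb, slot):
    """Step 2: slots a single host can hold (lower of CPU and memory counts)."""
    cpu_slot, mem_slot = slot
    return min(cpu_mhz // cpu_slot, mem_mb // mem_slot)


def current_failover_capacity(hosts, slot, powered_on_vms):
    """Step 3: how many hosts can fail and still leave a slot for every powered-on VM.

    Assumes the worst case: the hosts with the most slots fail first.
    """
    slots = sorted((host_slots(c, m, slot) for c, m in hosts), reverse=True)
    failed = 0
    # Count one more failed host only if the survivors still have enough slots.
    while failed < len(slots) and sum(slots[failed + 1:]) >= powered_on_vms:
        failed += 1
    return failed


def admission_allowed(hosts, slot, powered_on_vms, configured_failover):
    """Step 4: allow one more power-on only if capacity stays >= the configured value."""
    capacity = current_failover_capacity(hosts, slot, powered_on_vms + 1)
    return capacity >= configured_failover


slot = (256, 300)
two_hosts = [(10000, 24000)] * 2  # the cluster after one host has failed
print(current_failover_capacity(two_hosts, slot, 60))  # 0: one more failure leaves 39 slots
print(admission_allowed(two_hosts, slot, 38, 1))        # True: the 39th VM fits
print(admission_allowed(two_hosts, slot, 39, 1))        # False: a 40th VM would not
```

With all three hosts, the same check allows up to 78 powered-on VMs, since losing any single host still leaves 78 slots.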

To keep it simple, I will not yet throw in other complications, such as memory reservations or an unbalanced cluster (hosts with different resources), or how to handle them. I do plan to post about why reservations at the VM level would be a bad idea and about ways to get around conservative slot sizes. HA and admission control are awesome tools to have, but if you don’t plan intelligently, you will soon begin to hate them.

Memory state in ESXTOP/RESXTOP

Oftentimes you will wonder whether you have enough room for another VM on your host. Before I begin, let me clarify that in a larger environment you should certainly use capacity analysis tools. But what if you run a small shop, can’t afford one of those tools, own only a small cluster and don’t mind running ESXTOP/RESXTOP to figure this out? You can look at TPS and other areas, but the memory state of the host indicates the kind of stress the host is under. This will be your best friend.


My host is in the “high” state. What does this really mean? Your host can be in one of the following states: “high”, “soft”, “hard” or “low”. Which state it is in depends on its free memory:

high state = if the free memory is greater than or equal to 6%

soft state = if the free memory is at 4%

hard state = if the free memory is at 2%

low state = if the free memory is at 1%

As you can tell, the high state is what keeps your host happy. Note that in the high and soft states, ballooning is favored over swapping, while in the hard and low states, swapping is favored over ballooning. Of course, TPS and other techniques enable you to use the memory on your host efficiently and allow you to overcommit. Also note that your host may be in the “high” state while one of your VMs is still swapping. In that case it’s not the host; it’s really a memory limit on the VM or its resource pool (RP) settings that cause the VM to swap.

The good news is that DRS will move your VM to another host (depending on your settings) if it comes under stress, which should improve its performance. But I have always found ESXTOP/RESXTOP to be an excellent tool for getting insight into what’s really happening on your host. Remember, a holistic view is great, and when we talk about a cloud, a single host may not mean much. However, each host is a building block of your cloud. Understanding how memory is handled at the host level will give you better insight into the holistic memory stats of your cloud.

Optimize performance of Win VMs using RSS in vSphere

Recently VMware published a new white paper about network performance in vSphere 4.1. Duncan posted a link to it on his blog, and I decided to take a look at what it had to offer. Without a doubt it has some very useful information, and most importantly it’s an easy read, so I recommend you read the white paper as well. Besides the result of a Linux VM receiving packets at 27Gbps, I found its take on Windows VMs very interesting.

As mentioned in the white paper, the Linux VMs performed better than the Windows VMs because they leveraged Large Receive Offload (LRO), which is not available for Windows. This made me think about some of the issues that could be addressed just by understanding what this means. In a VM that does not support LRO, received packets are processed by the vCPU it has been assigned. One important thing to note is that by default, a Windows VM uses only vCPU 0 for this. So even if a VM has been assigned 8 vCPUs, received packets will only be processed by vCPU 0 while the other 7 sit back and relax. Your VM still waits for all of its vCPUs to be scheduled before it does anything, yet once they are, only one of them does this job.

As the white paper also mentions, you can enable Receive-Side Scaling (RSS), which lets the Windows VM use all its assigned vCPUs when processing received packets. Since your VM waits for all of its assigned vCPUs to be scheduled anyway, why not make use of all of them while you have ’em? This can improve your VM’s performance. That said, assign multiple vCPUs to a VM only if the application supports them and they will actually improve its performance. On a heavily taxed host, a VM with multiple vCPUs for no reason will only suffer.

In a Windows VM without RSS enabled, where you see a processor spike due to network utilization, you will notice that adding another vCPU doesn’t solve your issue. If your single-vCPU VM was at 100% CPU utilization, with two vCPUs it may now show 50%. Increase the vCPUs to 4 and utilization will only be about 25%, yet performance stays the same. What’s going on? Only one vCPU is doing all the processing of received packets. ESXTOP will solve the mystery for you as well. By enabling RSS on this VM, you can benefit from all the vCPUs assigned. Again, make sure that assigning more vCPUs is not causing scheduling issues in your environment; that will depend on how busy your host is.
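The misleading numbers above come from simple averaging: the reported figure is spread across all vCPUs while the receive work stays on one of them. A minimal sketch, assuming the guest reports utilization as the average across its vCPUs (the function name is hypothetical):

```python
def average_utilization(num_vcpus, busy_vcpus=1, busy_pct=100.0):
    """Average CPU utilization when only `busy_vcpus` are saturated at `busy_pct`."""
    return busy_pct * busy_vcpus / num_vcpus


for n in (1, 2, 4):
    # Without RSS only one vCPU does the receive work, so only the average shrinks.
    print(f"{n} vCPU(s): {average_utilization(n):.0f}% average utilization")
```

With RSS spreading receive processing, more vCPUs would actually be busy, which is what raises throughput rather than just lowering the average.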

You can find out whether RSS is enabled by running netsh int tcp show global from the command prompt; the “Receive-Side Scaling State” line shows its status.

It’s enabled by default on Windows Server 2008 R2 and can be enabled on Windows Server 2003 SP2 and later. You will also have to enable RSS in the driver settings of the VMXNET3 adapter inside your VM, and then you are all set. You need VMXNET3 for RSS; VMXNET2 will not cut it. Simple things like this can certainly help optimize your environment and put you at ease with what lives in your cloud.
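If you want to check many Windows VMs, the netsh check can be scripted. The sketch below is an assumption-laden example rather than a supported tool: the helper names are hypothetical, it expects the English Windows Server 2008 R2 output format, where the status appears on a line labelled “Receive-Side Scaling State” (Windows Server 2003 and localized systems may differ), and it only reads the TCP/IP setting, not the VMXNET3 driver’s RSS option.

```python
import subprocess
import sys


def parse_rss_state(netsh_output):
    """Return the RSS state from `netsh int tcp show global` output, or None."""
    for line in netsh_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip().lower() == "receive-side scaling state":
            return value.strip().lower()
    return None


def query_rss_state():
    """Run netsh inside a Windows guest and parse the result (Windows only)."""
    if not sys.platform.startswith("win"):
        raise RuntimeError("netsh is only available on Windows")
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return parse_rss_state(result.stdout)
```

Calling query_rss_state() inside the guest would return a string such as "enabled" or "disabled", or None if the expected line is missing.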

vCD + vCO + AD: What does this mean?

We are all aware of VMware vCenter Orchestrator (vCO) and how it enables the automation of so many vCenter tasks that could become tedious and boring after a while. In February 2011, VMware released the vCO plugin for vCloud Director, aka vCloud, aka vCD. How will this help you? Think about the process involved in creating an organization, virtual datacenter, users, networks, etc. With vCO, you can create workflows to automate these tasks so that the ever-shrinking and often understaffed IT department can focus on bigger and better things. The vCO plugin for vCD extends this automation to your vCD environment, which is slowly becoming the standard way to start your own cloud.

Today, VMware released the vCO plugin for Active Directory. Some of you must be thinking: what’s the big deal? With this plug-in, you can raise the level of automation you had before and provide better service to your customers. Tasks like creating and deleting AD accounts, groups and memberships can all be automated by leveraging vCO workflows. It comes with 32 preconfigured workflows, and you can create more as needed. To benefit from this, you will need a vCO setup along with your vCD, and of course the underlying vSphere environment in place.

Let’s think about what this could do when all the pieces line up. The customer logs into your portal and creates a request. vCO creates the organization, virtual datacenter, network and so on for the user. The user then creates a vApp consisting of two VMs in the vDC. With the AD plugin for vCO, the workflow for creating computer accounts for these VMs can now be leveraged. Oh, and did I mention that the new organization will have users that were also created in AD using a workflow? Next thing you know, the customer decides to leave, and a workflow can handle that task as well. You don’t have to go and delete the child objects in the organization as you did before; your workflow will delete all the child objects and finally the organization itself. Now your resources are available to be purchased again. And while you are at it, you can also clean up the AD accounts that are no longer needed using the AD plugin for vCO.

To sum it up, automation is the name of the game. You don’t have to be a hardcore developer to leverage vCO; if I can do it, anyone can. The only way to handle ever-increasing demand is by automating the tasks that can be automated. Automation also reduces the room for error and frees your time for bigger and better things than creating a VM, adding a portgroup, etc. Lastly, automation is extremely important in ensuring your cloud meets the standard definition of what a cloud should be. Trust me, there are just too many definitions out there, and automation seems to be the common denominator.

You can download the AD plugin for vCO here.

vSphere client for iPad (Review)

I was very excited about getting the iPad 2 this year, and one of the first things I looked for was the vSphere client that VMware was supposed to make for the iPad. After standing in line, and with the help of my friend, I was finally able to get my hands on Apple’s new tablet. For the next two days I religiously searched for the vSphere client for the iPad but was disappointed not to find it. Just this past Sunday, a friend asked me if I had tried out the iPad app for vSphere. So I started searching again, and it turns out I gave up searching 3-4 days before it was finally released (March 17th, 2011). After feeling left out, I finally downloaded it and took it for a spin.

You will need to download the vCMA and the vSphere Client for iPad, and of course you will need a vSphere environment and an iPad. Once you have fired up your vCMA, be sure to change the password for the vCMA appliance. This is not a requirement, but if you plan on allowing remote access to your vCMA appliance, you may not want to leave it with a default password that is known by the masses. You can manage your vCMA appliance at http://YourIP:5480. I would also assign the vCMA a static IP.

Once you have assigned the IP to the vCMA, open Settings on your iPad, tap “vSphere Client” and enter the IP of your vCMA in the “Web Server” field.