HA and Admission Control

I have often seen admission control used without a real understanding of how it impacts your cluster and your available resources. While configuring admission control on a cluster the other day, I started thinking about how this really works. The concept is pretty simple. According to VMware:

Slot size is comprised of two components, CPU and memory. VMware HA calculates these values:

The CPU component by obtaining the CPU reservation of each powered-on virtual machine and selecting the largest value. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 256 MHz (this value can be changed using the das.vmCpuMinMHz advanced attribute).

The memory component by obtaining the memory reservation (plus memory overhead) of each powered-on virtual machine and selecting the largest value.

HA relies on slot sizes, and in the current version of ESX/ESXi, if no reservations are used, the default slot size is 256 MHz for CPU and the memory overhead for memory. Now keep in mind, if you happen to have one VM with a 4GB memory reservation, all of a sudden your slot size becomes 256 MHz and 4GB of memory. You now have far fewer slots to place your VMs in, and admission control will prevent you from powering on more VMs than can be accommodated according to your Host Failures Cluster Tolerates setting. Basically, HA looks at your worst-case CPU and memory reservations to come up with the slot size. All that I just mentioned should be common knowledge.
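
To make that concrete, here is a minimal sketch of the slot size math, assuming a handful of hypothetical VMs (the names, reservations and the 300MB overhead are made up for illustration):

DEFAULT_CPU_MHZ = 256  # das.vmCpuMinMHz default when no CPU reservation is set

# Hypothetical powered-on VMs: (name, cpu_res_mhz, mem_res_mb, overhead_mb)
vms = [
    ("vm1", 0, 0, 300),
    ("vm2", 0, 4096, 300),  # the one 4GB reservation that skews everything
]

slot_cpu = max(cpu or DEFAULT_CPU_MHZ for _, cpu, _, _ in vms)
slot_mem = max(mem + ovh for _, _, mem, ovh in vms)
print(slot_cpu, slot_mem)  # -> 256 4396: one reservation drags every slot to 4GB+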

Let’s assume you have a cluster of 3 hosts and VMs with no reservations, HA is turned on, Host Failures Cluster Tolerates is 1, admission control is enabled, and your isolation response is set to shut down. To simplify things, let’s assume your cluster is balanced, where each host has 10GHz CPU and 24GB of memory. Your cluster has a total of 30GHz CPU and 72GB of memory. The total number of VMs running is 60 and none of them have any reservations. Let’s also assume your slot size is 256 MHz and 300MB (overhead). So how many slots do you have? You have 30000/256 = 117 in CPU and 72000/300 = 240 in memory. You always pick the lower number, and according to what we calculated above, you have 117 slots available on this cluster.
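
The same arithmetic in a few lines (numbers taken straight from the example, with 24GB rounded to 24000MB as above):

total_cpu_mhz = 3 * 10000         # 3 hosts x 10 GHz each
total_mem_mb = 3 * 24000          # 3 hosts x 24 GB each (rounded as above)

cpu_slots = total_cpu_mhz // 256  # 117
mem_slots = total_mem_mb // 300   # 240
print(min(cpu_slots, mem_slots))  # 117 -- the most constrained resource wins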

Let’s assume a host fails and now we only have 20GHz and 48GB left in our cluster. We now have 20000/256 = 78 and 48000/300 = 160, which means we only have 78 slots available now. So you have 78 slots and 60 VMs (1 VM/slot); should all your VMs power on? No, because your cluster still has Host Failures Cluster Tolerates set to 1 and admission control is enabled. It’s important to understand how admission control really works. According to VMware:

With the Host Failures Cluster Tolerates policy, VMware HA performs admission control in the following way:

1. Calculates the slot size. A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.

2. Determines how many slots each host in the cluster can hold.

3. Determines the Current Failover Capacity of the cluster. This is the number of hosts that can fail and still leave enough slots to satisfy all of the powered-on virtual machines.

4. Determines whether the Current Failover Capacity is less than the Configured Failover Capacity (provided by the user). If it is, admission control disallows the operation.

So according to that, even though your cluster has enough slots to run all your VMs, because Host Failures Cluster Tolerates is set to 1, admission control has to make sure it only runs the load it can afford to run in case of another host failure. Basically, admission control knows there are 78 slots available, but it has to keep in mind that in case of another host failure it will only have 39 (the slots on the single surviving host). Because Host Failures Cluster Tolerates is set to 1, admission control will only allow 39 slots to be occupied. So once HA realizes that 39 slots have been taken, it will not allow any more power-ons. It’s saving you from yourself.
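
Here is a hedged sketch of that four-step check against the two surviving hosts from the example (the per-host slot counts are illustrative: 10000 MHz // 256 = 39 slots each):

def current_failover_capacity(host_slots, powered_on_vms):
    # Worst case: assume the biggest hosts are the ones that fail.
    slots = sorted(host_slots, reverse=True)
    capacity = 0
    for failed in range(1, len(slots)):
        if sum(slots[failed:]) >= powered_on_vms:  # enough slots left?
            capacity = failed
        else:
            break
    return capacity

host_slots = [39, 39]  # the two hosts left after the first failure
configured = 1         # Host Failures Cluster Tolerates

for vm_count in (39, 40):
    allowed = current_failover_capacity(host_slots, vm_count) >= configured
    print(vm_count, "powered-on VMs:", "allowed" if allowed else "denied")
# -> 39 allowed, 40 denied: exactly the 39-slot ceiling described above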

I will not throw in other complications like memory reservations or an unbalanced cluster (hosts with different resources) and how to handle that just yet, to keep things simple. I do plan to post about why reservations at the VM level can be a bad idea, and about ways to get around the conservative slot sizes. HA and admission control are awesome tools to have, but if you don’t plan intelligently, you will soon begin to hate them.

Memory state in ESXTOP/RESXTOP

Oftentimes you will question whether you have enough room for another VM on your host. Now before I begin, let me clarify that in a larger environment you should certainly use capacity analysis tools. But what if you are a small shop that can’t afford one of those tools, you only own a small cluster, and you don’t mind running ESXTOP/RESXTOP to figure this out? You can look at TPS and other areas, but the memory state of the host will indicate the kind of stress this host is under. This will be your best friend.

[Screenshot: ESXTOP memory screen showing the host’s memory state]

As you can tell, my host is in the “high” state. What does this really mean? Your host can be in one of the following states: “high”, “soft”, “hard” or “low”. Your host will be in one of these states based on the following:

high state = free memory is greater than or equal to 6%

soft state = free memory is at 4%

hard state = free memory is at 2%

low state = free memory is at 1%

As you can tell, the high state is what will keep your host happy. One thing to note is that in the high and soft states, ballooning is favored over swapping; in the hard and low states, swapping is favored over ballooning. Of course, TPS and other techniques will enable you to use the memory on your host efficiently and allow you to overcommit. Another thing to point out is that your host may be in the “high” state but you may notice a VM is still swapping. It’s not the host; it’s really the limit on your VM or your resource pool settings that is causing this VM to swap.
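
As a quick mental model, here is a sketch of those thresholds (a simplification for illustration; the actual kernel logic is a bit more involved):

def memory_state(free_pct):
    # Thresholds straight from the list above.
    if free_pct >= 6:
        return "high"  # no pressure; ballooning favored if reclamation is needed
    if free_pct >= 4:
        return "soft"  # still favors ballooning over swapping
    if free_pct >= 2:
        return "hard"  # swapping now favored over ballooning
    return "low"       # heaviest reclamation

print(memory_state(7))  # -> 'high', what you want to see in ESXTOP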

The good news is that DRS will move your VM over to another host (based on your settings) if the host gets under stress and moving the VM is expected to improve its performance. But I have always found ESXTOP/RESXTOP to be an excellent tool for getting insight into what’s really happening on your host. Remember, a holistic view is great, and when we talk about a cloud, a single host may not mean much. However, each host is a building block that forms your cloud. Understanding how memory is handled at the host level will give you better insight into the holistic memory stats of your cloud.

Backup/Restore config for ESXi using vMA

With the rise of ESXi I have often found myself simply redeploying the hypervisor in order to overcome an issue. Why? Because it literally takes minutes to reinstall ESXi, and when you have a big customer being affected by an outage, your first priority is to bring the environment up. Yeah, you still have HA and whatnot, but you still need to bring all the hosts in the cluster up and functional as soon as possible, so that you don’t find yourself in a situation where you have lost another host. Yes, I am all for finding the root cause, but sometimes my curiosity is not as important as the customer’s application. In reality it never is. Plus, what if your cluster just happens to be a two-host cluster (don’t ask me why, but I have seen that as well)? In that case bringing the other host up is of paramount importance, unless you don’t mind being caught with your pants down.

So let’s say you reinstall ESXi, and that really does take only a few minutes. However, there might be things you did to customize your hosts: licensing info, multipathing, local users, switches, etc. I think you get the idea. Soon you will realize that it’s really stuff like this that ends up consuming your time. So what do you do? In comes vMA. You can use vMA to back up your host configuration to a file and use that file to restore the changes you made to the host. This will save you plenty of time. Follow these simple steps.

Backup:

Log in to vMA and set the vifptarget to the server you want to back up. To do that, run:

vifptarget -s servername/ip (This assumes you have already added the server in question using the vifp addserver command, i.e. vi-fastpass.)

Once the target has been set, run:

vicfg-cfgbackup -s /location/BackupFileName

Your host configuration is now saved.

If you don’t want to do a vifp addserver and then set the target to this host in vMA for whatever reason, you can also run:

vicfg-cfgbackup -s --server IPaddrOfHost /location/BackupFileName (This command will prompt you for a username and password, since vMA has no cached credentials for this host. Note the -s stands for save.)

You can then FTP or SCP into the vMA appliance, grab this configuration file, and store it in a safe/accessible location. Notice I said “accessible”, because it’s important for you to have backups, but these backups are of no value if you can’t get to them when you need ’em.

Restore:

You can restore the configuration using the following command from your vMA appliance:

vicfg-cfgbackup -l --server IPaddrOfHost /location/BackupFileName (Note the -l stands for load.)

You will be prompted for a username and password, and you will also have to type “yes” when the prompt asks you to confirm.

Keep in mind you will have to reboot your host after the restore completes.
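
If you look after more than a couple of hosts, a small wrapper around the same command keeps the backups consistent. Here is a minimal sketch (the hostnames, backup directory, and the shared root password are assumptions for illustration; it uses the standard vCLI --username/--password connection options instead of fastpass):

import getpass
import subprocess

hosts = ["esx01.example.com", "esx02.example.com"]  # hypothetical hostnames
backup_dir = "/home/vi-admin/cfgbackups"            # hypothetical location
user = "root"
pwd = getpass.getpass("Host root password: ")       # assumes one shared password

for host in hosts:
    # Same command as above: -s saves the host config to a file.
    subprocess.run(
        ["vicfg-cfgbackup", "--server", host, "--username", user,
         "--password", pwd, "-s", f"{backup_dir}/{host}.cfg"],
        check=True,
    )

# Restore is the same idea with -l (load); remember the host reboots afterwards.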

Some use cases:

Let’s say you accidentally deleted a vSwitch on your host. Instead of trying to replicate what another host has and increasing your chances of missing a thing or two (maybe promiscuous mode was enabled because you run an IPS device on this host/cluster), you can simply use the above command to restore the backup for this particular host, restart the host, and you are all set.

Or you decided to reinstall ESXi on the host. You will notice that once you begin that process, your host in vCenter will show as down. As you complete the ESXi install and assign your host its static IP address, you can then restore the config using the above commands and reboot your host. Log back into vCenter and try to reconnect the host. The agent will get redeployed on this host, and once that is complete, you will notice that all your settings are in place and DRS can happily move VMs onto this host.

You can probably use PowerCLI/vCLI to do the same. My personal favorite is vMA, although I use PowerCLI/vCLI for other things.

Why get vSphere Ent. Plus with vCD?

I have often seen customers purchase vCD licenses in hopes of setting up a private cloud in their environment, without realizing the need for vSphere Enterprise Plus licenses. You don’t strictly need the Enterprise Plus license. However, you will lose a certain level of automation if you go without it. Why? Only the Enterprise Plus license gives you the ability to use distributed switches. According to VMware, “Virtual Distributed Switches (vDS) are supported and recommended by VMware. vDS provides full automation.”

Without vDS, you will not be able to use vCloud network isolation, dynamic provisioning of network pools will not be possible, and port groups will have to be created manually on all the hosts. These are just some of the drawbacks of using standard switches on hosts that are part of a vCloud environment. I can’t stress enough the importance of some of the above-mentioned functionalities that really make vCD beautiful. If you don’t believe in automating those tasks, then you can get away with a standard switch, but keep in mind you won’t be benefiting fully from the fruits of vCloud. So just get the Enterprise Plus license.

Optimize performance of Win VMs using RSS in vSphere

Recently VMware published a new white paper about network performance in vSphere 4.1. Duncan posted a link to it on his blog and I decided to take a look at what it had to offer. Without a doubt, it has some very useful information and, most importantly, it’s an easy read. So I recommend you read the white paper as well. Along with the possibility of a Linux VM receiving packets at 27Gbps, I thought the take on Windows VMs was very interesting.

As mentioned in the white paper, the Linux VMs performed better than the Windows VMs because they leveraged Large Receive Offload (LRO), which is not available for Windows. This started me thinking about some of the issues that could be addressed just by having a simple understanding of what this means. For a VM that does not support LRO, received packets are processed by the vCPUs it has been assigned. One important thing to note is that, by default, only vCPU 0 will be used by your Windows VM for this. This means that even if you have a VM that has been assigned 8 vCPUs, when it comes to processing received packets, that task will only be handled by vCPU 0 while the other 7 sit back and relax. Your VM will still wait for all of its vCPUs to be scheduled before it does anything; however, once all the vCPUs have been engaged, only one will do this job.

Also as mentioned in the white paper, what you can do is enable Receive Side Scaling (RSS), which enables the Windows VM to utilize all of its assigned vCPUs when processing received packets. Your VM has to wait for all of its assigned vCPUs to be scheduled anyway, so why not make use of all of them while you have ’em? This can enhance your VM’s performance. That said, multiple vCPUs should only be assigned to a VM if the application supports them and the extra vCPUs will actually enhance the VM’s performance. On a highly taxed host, a VM with multiple vCPUs for no reason will only suffer.

In a non-RSS-enabled Windows VM where you see a CPU spike due to network utilization, you will notice that adding another vCPU doesn’t solve your issue. What might happen is that if your single-vCPU VM was at 100% CPU utilization, it will now be at 50%. If you increase the vCPUs to 4, the utilization will only be about 25%. But the performance is still the same. What’s going on? Only one vCPU is doing all the processing for received packets. ESXTOP will solve the mystery for you as well. By enabling RSS on this VM, you can benefit from using all the vCPUs assigned. Again, be sure that assigning more vCPUs is not causing scheduling issues in your environment; that will depend on how busy your host is.
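
In other words, the averaged counter hides a single saturated vCPU. A trivial illustration:

# One vCPU pegged at 100% doing all the receive processing;
# average utilization falls as you add vCPUs, throughput does not change.
for vcpus in (1, 2, 4):
    print(f"{vcpus} vCPU(s): avg {100 / vcpus:.0f}%, same network performance")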

You can find out if RSS is enabled by running netsh int tcp show global at the command line, and it will show you the status.

[Screenshot: netsh output showing Receive-Side Scaling State enabled]

It’s enabled by default on Windows Server 2008 R2 and can be enabled on Windows Server 2003 SP2 and up. You will also have to enable RSS in the driver settings for the VMXNET3 adapter inside your VM, and you are all set. You will need to use VMXNET3 to enable RSS; VMXNET2 will not cut it. Simple things like this can certainly assist in optimizing your environment and put you at ease with what lives in your cloud.
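
On Windows Server 2008 and later, the OS-level toggle is a one-liner (older versions handle this through a registry setting instead):

netsh int tcp set global rss=enabled

Run netsh int tcp show global again afterwards to confirm the change took.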