Optimize performance of Win VMs using RSS in vSphere

Recently VMware published a new white paper about network performance in vSphere 4.1. Duncan posted a link to it on his blog, and I decided to take a look at what it had to offer. Without a doubt, it has some very useful information and, most importantly, it’s an easy read, so I recommend you read the white paper as well. Along with the possibility of a Linux VM receiving packets at 27Gbps, I thought the take on Windows VMs was very interesting.

As mentioned in the white paper, the Linux VMs performed better than the Windows VMs because they leveraged Large Receive Offload (LRO), which is not available for Windows. This got me thinking about some of the issues that could be addressed just by having a simple understanding of what this means. When a VM does not support LRO, its received packets are processed by the vCPUs it has been assigned. One important thing to note is that by default, only vCPU 0 will be used for this by your Windows VM. This means that even if you have a VM that has been assigned 8 vCPUs, when it comes to processing received packets, that task will only be handled by vCPU 0 while the other 7 sit back and relax. Basically, your VM will still wait for the host to schedule all of its vCPUs before it does anything; however, once all the vCPUs have been engaged, only one will do the job.

As the white paper also mentions, what you can do is enable Receive-Side Scaling (RSS), which lets the Windows VM utilize all its assigned vCPUs when processing received packets. Your VM will wait for all its assigned vCPUs to be scheduled anyway, so why not make use of all of them while you have ’em? This can enhance your VM’s performance. Keep in mind that multiple vCPUs should only be assigned to a VM if the application supports them and the extra vCPUs will actually enhance the VM’s performance. On a highly taxed host, a VM with multiple vCPUs for no reason will only suffer.

In a non-RSS-enabled Windows VM where you see a spike in processor usage due to network utilization, you will notice that adding another vCPU doesn’t solve your issue. What might happen is that if your single-vCPU VM was at 100% CPU utilization, now it will be at 50%. If you increase the vCPUs to 4, the utilization will only be about 25%. But the performance is still the same. What’s going on? Only 1 vCPU is doing all the processing for received packets, so the reported figure is just that one busy vCPU averaged over a larger number of mostly idle ones. ESXTOP will solve the mystery for you as well. By enabling RSS on this VM, you can benefit from using all the vCPUs assigned. Again, be sure that assigning more vCPUs is not causing scheduling issues in your environment. That will depend on how busy your host is.
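To make the arithmetic concrete, here is a minimal Python sketch. It assumes the simplest possible model: one vCPU is fully busy with receive processing and the others are idle. The function name is made up for illustration and is not part of any VMware tool.

```python
def average_utilization(busy_vcpu_percent, vcpu_count):
    """Average CPU utilization across all vCPUs when a single vCPU does all the work.

    Assumption: every other vCPU is idle, which is a simplification.
    """
    if vcpu_count < 1:
        raise ValueError("a VM needs at least one vCPU")
    return busy_vcpu_percent / vcpu_count


if __name__ == "__main__":
    for count in (1, 2, 4):
        # The single busy vCPU is still pegged at 100% in every case.
        print(count, "vCPU(s): average", average_utilization(100.0, count), "%")
```

The average drops as vCPUs are added, but the one vCPU handling received packets is just as saturated as before, which is why throughput does not improve.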

You can find out if RSS is enabled by running netsh int tcp show global at the command line; the output will show you the status.

rss enabled
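If you want to check this across many guests, the output can be parsed with a few lines of Python. This is only a sketch: it assumes the English-language label "Receive-Side Scaling State" in the netsh output, the sample text below is an abbreviated illustration rather than a capture from a real system, and the function names are my own.

```python
import re
import subprocess


def rss_state(netsh_output):
    """Return the RSS state (e.g. 'enabled') found in netsh output, or None if absent."""
    pattern = re.compile(r"^\s*Receive-Side Scaling State\s*:\s*(\w+)", re.IGNORECASE)
    for line in netsh_output.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1).lower()
    return None


def query_local_rss_state():
    """Run the command on the local Windows guest and parse the result."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True,
        text=True,
        check=True,
    )
    return rss_state(result.stdout)


# Abbreviated, illustrative sample of what the command prints.
SAMPLE_OUTPUT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
"""

if __name__ == "__main__":
    print(rss_state(SAMPLE_OUTPUT))
```

query_local_rss_state() only works inside a Windows guest; the parser itself can be fed output collected any way you like.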

It’s enabled by default on Windows 2008 R2 and can be enabled on Windows 2003 SP2 and up. You will also have to enable RSS in the driver settings for the VMXNET3 adapter inside your VM, and then you are all set. You will need to use VMXNET3 to enable RSS; VMXNET2 will not cut it. Simple things like this can certainly assist in optimizing your environment and put you at ease with what lives in your cloud.

vCD + vCO + AD: What does this mean?

We are all aware of VMware vCenter Orchestrator (vCO) and how it enables the automation of so many tasks in vCenter that could become tedious and boring after a while. In February 2011, VMware released the vCO plugin for vCloud Director, aka vCloud, aka vCD. How will this help you? Think about the process involved in creating an organization, virtual datacenter, users, networks, etc. With vCO, you can create workflows to automate these tasks so that the ever-shrinking and often understaffed IT department can focus on bigger and better things. vCO for vCD extended that automation to your vCD environment, which is slowly becoming the standard way to start your own cloud.

Today, VMware released the vCO plugin for Active Directory. Some of you must be thinking, what’s the big deal? With this plug-in, you can enhance the level of automation you had before and provide better service to your customers. Tasks like creating and deleting AD accounts and managing groups and memberships can all be automated by leveraging vCO workflows. It comes with 32 preconfigured workflows, and you can create more as needed. To benefit from this, you will need to have vCO set up along with your vCD, not to mention the underlying vSphere environment.

Let’s think about what this could do when all the pieces line up. The customer logs into your portal and creates a request. vCO creates the organization, virtual datacenter, network and so on for the user. The user then creates a vApp that consists of two VMs in the vDC. Now, with the AD plugin for vCO, a workflow can create the computer accounts for these VMs. Oh, and did I mention that the new organization will now have its users, which were also created in AD using a workflow? Next thing you know, a customer decides to leave, and a workflow can handle that task as well. You don’t have to go and delete the child objects in the organization like you did before. Your workflow will delete all the child objects and finally the organization itself. Now your resources are available to be purchased again. And while you are at it, you can also clean up the AD accounts that are no longer needed using the AD plugin for vCO.

To sum it up, automation is the name of the game. You don’t have to be a hardcore developer to leverage vCO. If I can do it, anyone can. The only way to handle the ever-increasing demand is by automating the tasks that can be automated. It also reduces the room for error and frees up your time for bigger and better things than creating a VM, adding a portgroup, etc. Lastly, automation is also extremely important in ensuring your cloud meets the standard definition of what a cloud should be. Trust me, there are just too many definitions out there, and automation seems to be the common denominator.

You can download the AD plugin for vCO here.

vSphere client for iPad (Review)

I was very excited about getting the iPad 2 this year, and one of the first things I started looking for was the vSphere client that VMware was supposed to make for the iPad. After standing in line, and with the help of my friend, I was finally able to get my hands on Apple’s new tablet. For the next two days I religiously searched for the vSphere client for the iPad but was disappointed not to find it. Just this past Sunday, I was talking to a friend who asked me if I had tried out the iPad app for vSphere. So I started searching again, and it turns out I had given up searching 3-4 days before it was finally released (March 17th, 2011). After feeling left out, I finally downloaded it and took it for a spin.

You will need to download the vCMA and the vSphere Client for iPad, and of course you will need a vSphere environment and an iPad. Once you have fired up your vCMA, be sure to change the password for the vCMA appliance. This is not a requirement, but if you plan on allowing remote access to your vCMA appliance, you may not want to leave it with the default password that is known by the masses. You can manage your vCMA appliance at http://YourIP:5480. I would also assign the vCMA a static IP.

Once you have assigned the IP to vCMA, go to the settings in your iPad and tap on the “vSphere Client” and enter the IP of your vCMA in the “Web Server” field.

Duplicate MACs in vCenter

I don’t have a lot of experience with Hyper-V, but I have worked with people who have. After hearing their horror stories, I don’t envy acquiring that sort of experience. Speaking of horror stories, my favorite is the one where a co-worker told me about a Hyper-V environment they had set up that was generating duplicate MAC addresses. I was amused, but in the back of my mind I started wondering whether this was possible in a vSphere setup. Yes, it is.


vCenter assigns MAC addresses using the unique ID that’s assigned to it under Administration > vCenter Server Settings > Runtime Settings.

This unique ID can be set to a value between 0 and 63. If you have two or more vCenters running with the same instance ID, it’s only a matter of time before you start seeing MAC conflicts in your environment.

vCenter assigns MAC addresses using a simple formula (00:50:56:[80 hex + UID]:XX:XX). So if your vCenter ID is 45 (2d in hex), your VM’s MAC should be 00:50:56:ad:XX:XX. As you can tell, the fourth byte is what can help you identify which vCenter was used to create the VM. The fifth and sixth bytes are the ones that are usually edited if you need to assign a MAC yourself instead of letting vCenter do it for you.

Another interesting thing I noticed was that there were 3 different types of MAC addresses among my VMs:

VM1 00:50:56:ad:c2:3F

VM2 00:0C:29:73:B1:2F

VM3 00:50:56:a5:d2:6F

It turns out that VM1 was created on my new vCenter with a UID of 45 (ad = 80 hex + 45), VM2 was created directly on an ESXi host, and VM3 was created on a different vCenter with a UID of 37 (a5 = 80 hex + 37).
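The same detective work can be sketched in a few lines of Python. This is a rough illustration under the assumptions described in this post: the 00:50:56 prefix with a fourth byte of 80 hex plus the instance ID (0-63) means a vCenter-assigned address, and 00:0C:29 is the prefix I saw on the VM created directly on the host. The function names are hypothetical.

```python
VCENTER_PREFIX = "00:50:56"
HOST_PREFIX = "00:0c:29"


def vcenter_fourth_byte(instance_id):
    """Fourth MAC byte (as two hex digits) for a given vCenter unique ID."""
    if not 0 <= instance_id <= 63:
        raise ValueError("vCenter unique ID must be between 0 and 63")
    return format(0x80 + instance_id, "02x")


def describe_mac(mac):
    """Guess where a VM's MAC address came from, based on its first four bytes."""
    octets = mac.lower().split(":")
    if len(octets) != 6:
        raise ValueError("expected six colon-separated bytes: " + mac)
    prefix = ":".join(octets[:3])
    fourth = int(octets[3], 16)
    if prefix == VCENTER_PREFIX and 0x80 <= fourth <= 0xBF:
        return "vCenter with unique ID %d" % (fourth - 0x80)
    if prefix == HOST_PREFIX:
        return "created directly on a host"
    return "unknown origin"


if __name__ == "__main__":
    for mac in ("00:50:56:ad:c2:3F", "00:0C:29:73:B1:2F", "00:50:56:a5:d2:6F"):
        print(mac, "->", describe_mac(mac))
```

Running this against an export of your VMs' MACs is a quick way to spot two vCenters that share the same unique ID.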

These little things may not seem to matter much, but it’s important to know how it all comes together. It will be very helpful when you find yourself in a situation where your VMs have identical MACs because someone forgot to set a unique ID for their vCenter.


MSCS VMs and Snapshots

When using VMs in an MSCS cluster across boxes (CAB), you will need to set up the RDMs in physical compatibility mode and enable bus sharing. Please note that VMs with RDMs in physical mode cannot be snapshotted. Basically, you will find that your MSCS VMs have their snapshot option greyed out. What does this mean?

You can’t use VCB to back up your VMs, as that relies on snapshots.
You can’t use vDR for backups, as that relies on snapshots.
And lastly, you can’t leverage snapshots for tasks like patching your VM, if that is something you have been doing in the past.

When running the disk in independent/persistent mode, you would think that snapshots would still work and only snapshot the vmdk holding the OS partition and not your RDM. However, with bus sharing in place for MSCS to work, the snapshot criteria are not met, hence the option remains greyed out. Also, let’s assume you are somehow able to snapshot the VM. You take a snapshot of VM1, keep it for a day and then suddenly decide to revert. I am not sure how VM2, or even the cluster itself, will behave when all of a sudden one of its nodes has forgotten what happened in the last 24 hours. So just don’t snapshot it, even if you come up with a way of doing it.

One thing I haven’t tried yet is to see what happens when I turn a node off, snapshot it, do what I need to do and, when I need to revert, turn it off and then revert. I’m not sure if this is even possible or supported. Not to mention, bus sharing would have to be disabled. But in all seriousness, I would never do this in a production environment. If I ever decide to test this, it will only be for my own curiosity.

Not being able to snapshot should not really be the reason for you to scrap MSCS in VMware. This is simply a limitation to understand prior to designing a solution. If snapshotting is of paramount importance, then this may not be your cup of tea, but will the alternative solution give you that ability? If not being able to back up the VM using VCB or vDR is your issue, you can still back up the VM by installing an agent inside the guest and using a traditional backup mechanism. Leverage the same infrastructure that has been backing up your physical world. MSCS in VMware is not perfect, but it will get there.

MSCS and vSphere Conflicts

As already addressed in the vSphere 4 U1 release notes, MSCS VMs are supported in an HA/DRS cluster; it’s amazing how few have noticed the change. With all the functionality VMware has introduced over the years, it’s easy to miss a few things every now and then. Some consider MSCS a primitive form of clustering compared to HA/DRS clusters within ESX/i. However, it must be noted that an HA/DRS cluster does not protect you from application failure or OS corruption. Neither does FT in vSphere. With an FT-enabled VM, when the primary VM blue screens, so does the secondary VM, and you are left with two identical servers that are both not functioning.

To sum it up, HA/DRS and even FT protect you from a hardware failure only. According to VMware, MSCS must be leveraged to maintain 100% uptime for Windows guests. So what can and can’t you do with MSCS and VMware?

You can cluster two VMs on the same host, two VMs on separate hosts, and you can also cluster a physical and a virtual machine. There are detailed guides published by VMware on how this can be achieved. (Click Here)

Here is a 50K-foot view of what you can and cannot do; note that this will also differ based on the version of ESX/i you are running:
Only two nodes in a MSCS cluster
MSCS cannot be an FT enabled VM
Though MSCS VMs can be in a HA/DRS cluster, both HA and DRS should be disabled for all the VMs that are a part of MSCS
Quorum and shared disk should not have the VMFS signature and should be presented to all the hosts in the cluster where the MSCS VMs reside (Think about it, it makes sense)
Don’t overcommit and try to create a reservation for your VM equal to the size of the memory assigned.
The VMware doc will have more details

Now the last part. DRS is disabled because, under the hood, DRS uses vMotion. Though vMotion is rapid and causes no outage for the users, the MSCS heartbeat is very sensitive and may interpret the few seconds of the stun period as a node failure and consider that node to be down. This is certainly not what you want. Hence it’s best not to vMotion, which is why DRS is disabled.
Why is HA disabled? No one has been able to give a straight answer on that, and it basically comes down to the fact that it’s not supported.

As of now, I really don’t know why you can’t have HA enabled for a VM that is part of an MSCS cluster.
The good news is that with 4 U1 and onwards, you can use the same hosts that are in an HA/DRS cluster to run your MSCS VMs. Just don’t forget to disable these features for the VMs that are part of the MSCS cluster, or else VMware and MS support may stiff you in your time of need.