Optimize performance of Windows VMs using RSS in vSphere

Recently VMware published a new white paper about network performance in vSphere 4.1. Duncan posted a link to it on his blog, and I decided to take a look at what it had to offer. It has some very useful information and, most importantly, it’s an easy read, so I recommend you read it as well. Along with the possibility of a Linux VM receiving packets at 27Gbps, I thought its take on Windows VMs was very interesting.

As mentioned in the white paper, the Linux VMs performed better than the Windows VMs because they leveraged Large Receive Offload (LRO), which is not available for Windows. This made me think about some of the issues that could be addressed just by understanding what this means. When a VM does not support LRO, its received packets are processed by the vCPUs it has been assigned. One important thing to note is that, by default, a Windows VM uses only vCPU 0 for this. So even if your VM has been assigned 8 vCPUs, processing received packets is handled by vCPU 0 alone while the other 7 sit back and relax. Basically, your VM still has to wait for the host to schedule all of its vCPUs before it does anything; however, once all the vCPUs are engaged, only one does the job.

The white paper also mentions that you can enable Receive-Side Scaling (RSS), which lets the Windows VM use all of its assigned vCPUs when processing received packets. Your VM has to wait for all of its assigned vCPUs to be scheduled anyway, so why not make use of all of them while you have ’em? This can enhance your VM’s performance. Keep in mind that multiple vCPUs should only be assigned to a VM if the application supports them and will actually perform better with them. On a highly taxed host, a VM that has multiple vCPUs for no reason will only suffer.

In a Windows VM without RSS enabled, where you see a processor spike due to network utilization, you will notice that adding another vCPU doesn’t solve your issue. What might happen is that if your single-vCPU VM was at 100% CPU utilization, it will now show 50%. If you increase the vCPUs to 4, utilization will be only about 25%. But the performance is still the same. What’s going on? Only 1 vCPU is doing all the processing for received packets, so the lower figure is just that one busy vCPU averaged with the idle ones. ESXTOP will solve the mystery for you as well. By enabling RSS on this VM, you can benefit from using all the vCPUs assigned. Again, be sure that assigning more vCPUs is not causing scheduling issues in your environment. That will depend on how busy your host is.
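The 100% / 50% / 25% numbers above are just an average over vCPUs that are mostly idle. Here is a minimal Python sketch of that arithmetic, assuming a simplified model in which vCPU 0 is fully busy with receive processing and every other vCPU is idle. The function name and the model are illustrative only; this is not how ESXTOP or Windows compute their counters.

```python
def average_cpu_utilization(vcpus, busy_vcpu_load=100.0):
    """Average utilization across vCPUs when only vCPU 0 does the work.

    Simplified model (an assumption for illustration): vCPU 0 runs at
    busy_vcpu_load percent and all remaining vCPUs sit at 0 percent.
    """
    if vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    # vCPU 0 handles all received packets; the others are idle.
    per_vcpu = [busy_vcpu_load] + [0.0] * (vcpus - 1)
    return sum(per_vcpu) / vcpus


for n in (1, 2, 4):
    print(n, "vCPU(s):", average_cpu_utilization(n), "% average")
```

The average drops as vCPUs are added, but vCPU 0 is still pegged, which is why throughput does not improve.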

You can find out whether RSS is enabled by running netsh int tcp show global at the command line; the output shows the status.

[Screenshot: netsh output showing RSS enabled]
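If you want to check this across many VMs, the netsh output can be parsed by a script. The following is a rough Python sketch, assuming English-language output in which the relevant line is labelled "Receive-Side Scaling State". The helper name rss_enabled and the SAMPLE text are made up for illustration, and the sample is trimmed rather than a full capture.

```python
def rss_enabled(netsh_output):
    """Return True/False for the RSS state, or None if the line is missing.

    Assumes English output with a "Receive-Side Scaling State" label.
    """
    for line in netsh_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "receive-side scaling state":
            return value.strip().lower() == "enabled"
    return None


# Trimmed, illustrative sample of "netsh int tcp show global" output.
SAMPLE = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
"""

print(rss_enabled(SAMPLE))
```

On a real Windows VM you could feed it the live output instead of SAMPLE, for example the stdout of subprocess.run(["netsh", "int", "tcp", "show", "global"], capture_output=True, text=True).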

It’s enabled by default on Windows Server 2008 R2 and can be enabled on Windows Server 2003 SP2 and up. You will also have to enable RSS in the driver settings for the VMXNET3 adapter inside your VM, and you are all set. You need VMXNET3 to enable RSS; VMXNET2 will not cut it. Simple things like this can certainly help optimize your environment and put you at ease with what lives in your cloud.

PSOD (ESXi 4.1) Dell R710

The PSOD that has been talked about for years finally made an entrance in our datacenter this morning. The R710, which had been spinning for a couple of days and was still not part of anything special beyond the plain ESXi installed on it, stopped responding to my ping requests. Oblivious to what had happened, I called our network gurus to see if anything had changed on their end. Once it was confirmed that the network hadn’t been tweaked, I decided to log on to the slow KVM to get to the console of this server. It got really silent, and right before me was the PSOD on ESXi 4.1. I have been working with VMware for some time now, and I knew how big this really was. So I took a snapshot as a souvenir and hoped this would get me into the elite class of VMware engineers that I have so far only envied.

From the PSOD, it only made sense to look at the hardware side first. Running diagnostics turned up PCIe errors. I figured it would be best to call Dell and not reinvent the wheel. Sure enough, once I was on the phone with Dell and had sent over the system logs from the DRAC, it was revealed that the PCIe card in slot 2 (a quad-port NIC) was the culprit. We moved the card to slot 3, and then slot 3 started reporting issues. Luckily for us, we had a few spare NICs that we ended up placing in this box. It has been up for hours now. No error logs reported yet, no orange LCD on the server, and no purple screen of death.