Optimize performance of Win VMs using RSS in vSphere

Recently VMware published a new white paper about network performance for vSphere 4.1. Duncan posted a link to it on his blog, and I decided to take a look at what it had to offer. Without a doubt it has some very useful information and, most importantly, it’s an easy read, so I recommend you read the white paper as well. Along with the possibility of a Linux VM receiving packets at 27Gbps, I thought the take on Windows VMs was very interesting.

As mentioned in the white paper, the Linux VMs performed better than the Windows VMs because they leveraged Large Receive Offload (LRO), which is not available for Windows. This got me thinking about some of the issues that could be addressed just by having a simple understanding of what this means. When a VM does not support LRO, its received packets are processed by the vCPUs it has been assigned. One important thing to note is that by default only vCPU 0 will be used by your Windows VM for this. This means that even if you have a VM that has been assigned 8 vCPUs, when it comes to processing received packets, that task will only be handled by vCPU 0 while the other 7 sit back and relax. Basically, your VM still has to wait for the host to schedule all of its vCPUs before it does anything; however, once all the vCPUs have been engaged, only one will do the job.

As the white paper also mentions, what you can do is enable Receive-Side Scaling (RSS), which lets the Windows VM utilize all of its assigned vCPUs when processing received packets. Your VM has to wait for all of its assigned vCPUs to be scheduled anyway, so why not make use of all of them while you have ’em? This can enhance your VM’s performance. Keep in mind that multiple vCPUs should only be assigned to a VM if the application supports them and the extra vCPUs will actually enhance the VM’s performance. On a highly taxed host, a VM that has multiple vCPUs for no reason will only suffer.
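To make the difference concrete, here is a rough Python sketch of the idea. It is only an illustration: real RSS picks a CPU using a Toeplitz hash and an indirection table in the NIC/driver, while this sketch swaps in a plain CRC32 as a stand-in. The function names and the sample flows are made up for the example.

```python
import zlib
from collections import Counter


def pick_vcpu(flow, num_vcpus, rss_enabled):
    """Return the index of the vCPU that would process packets for this flow.

    flow is a (src_ip, src_port, dst_ip, dst_port) tuple.
    CRC32 is a stand-in here; real RSS uses a Toeplitz hash.
    """
    if not rss_enabled:
        # Without RSS, all receive processing lands on vCPU 0.
        return 0
    key = "{}:{}->{}:{}".format(*flow).encode()
    return zlib.crc32(key) % num_vcpus


def distribute(flows, num_vcpus, rss_enabled):
    """Count how many flows each vCPU ends up handling."""
    return Counter(pick_vcpu(f, num_vcpus, rss_enabled) for f in flows)


# 200 hypothetical client connections to one server port.
flows = [("10.0.0.%d" % i, 40000 + i, "10.0.1.5", 443) for i in range(1, 201)]

without_rss = distribute(flows, 8, rss_enabled=False)
with_rss = distribute(flows, 8, rss_enabled=True)

print("RSS off:", dict(without_rss))
print("RSS on: ", dict(sorted(with_rss.items())))
```

With RSS off, every flow is counted against vCPU 0. With RSS on, the flows are spread over several vCPUs, and all packets of a given flow still land on the same vCPU.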

In a non-RSS-enabled Windows VM where you see a spike in processor usage due to network utilization, you will notice that adding another vCPU doesn’t solve your issue. What might happen is that if your single-vCPU VM was at 100% CPU utilization, it will now be at 50%. If you increase the vCPUs to 4, the utilization will only be about 25%. But the performance is still the same. What’s going on? Only 1 vCPU is doing all the processing for received packets. ESXTOP will solve the mystery for you as well. By enabling RSS on this VM, you can benefit from using all the vCPUs assigned. Again, be sure that assigning more vCPUs is not causing scheduling issues in your environment. That will depend on how busy your host is.
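The arithmetic behind those numbers is simple enough to sketch in a few lines of Python. This assumes the idealized case where vCPU 0 is pegged by receive processing and every other vCPU is idle; the function name is just for illustration.

```python
def average_cpu_percent(num_vcpus, busy_vcpu_percent=100.0):
    """Average utilization across all vCPUs when only vCPU 0 is doing work."""
    per_vcpu = [busy_vcpu_percent] + [0.0] * (num_vcpus - 1)
    return sum(per_vcpu) / num_vcpus


for n in (1, 2, 4, 8):
    print("%d vCPU(s): %.1f%% average" % (n, average_cpu_percent(n)))
```

The average drops as you add vCPUs, but vCPU 0 is still at 100% in every case, which is why throughput does not improve.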

You can find out whether RSS is enabled by running netsh int tcp show global at the command line; the Receive-Side Scaling State line will show you the status.

rss enabled
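If you want to check this across a batch of VMs, the netsh output is easy to parse. Below is a small Python sketch. It assumes an English-language Windows install, where the relevant line is labeled "Receive-Side Scaling State". SAMPLE_OUTPUT is an abbreviated, illustrative sample rather than a capture from a real machine, and the function names are my own.

```python
import subprocess

# Abbreviated, illustrative sample of "netsh int tcp show global" output.
SAMPLE_OUTPUT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
"""


def rss_state(netsh_output):
    """Return the RSS state (e.g. 'enabled') from netsh output, or None."""
    for line in netsh_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "receive-side scaling state":
            return value.strip().lower()
    return None


def query_rss_state():
    """Run netsh inside the Windows guest and return the RSS state."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return rss_state(result.stdout)


print(rss_state(SAMPLE_OUTPUT))
```

Calling query_rss_state() only makes sense inside the Windows guest itself; the parser can be tested anywhere against saved output.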

It’s enabled by default on Windows 2008 R2 and can be enabled on Windows 2003 SP2 and up. You will also have to enable RSS in the driver settings for the VMXNET3 adapter inside your VM, and you are all set. You will need to use VMXNET3 to enable RSS; VMXNET2 will not cut it. Simple things like this can certainly assist in optimizing your environment and put you at ease with what lives in your cloud.

e1000 packet loss

While going through the network stats in esxtop, I noticed a huge percentage of received packets being dropped. For a second I thought I was not looking at the right screen, but I was. The numbers were all over the place and didn’t make any sense at all. They jumped from 20% to 90% to 3% to 48%, all within 5-6 seconds. Sadly, this was happening to almost all the VMs. I didn’t know if this was really happening or if the host simply didn’t know what was going on, as we are using the 1000v.

 

I asked our network team to use their tools to see if they were noticing the same thing on their end. They are still researching the issue. In order to restore some sanity, I decided to increase the receive buffer of a VM to see if that would make any difference. It didn’t! Finally I decided to start looking at a few VMs that weren’t experiencing this issue. To my surprise they were all XP machines. I started to think it was guest related in some way. Upon further investigation, I noticed the XP VMs were running the flexible NIC versus the e1000 that was running on the other 2003/2008 VMs. My next step was to replace the NIC on one of the VMs with the vmxnet NIC to see what happens. Voilà! The dropped packets went down to 0 and stayed there.

I am waiting on the network team to confirm that the e1000 was in fact losing actual packets and that esxtop wasn’t simply nutty. Once they confirm this, I plan on replacing the NICs on the other VMs as well. The vmxnet NICs will give you better throughput, they are less CPU intensive, and lastly, according to ESXTOP, they aren’t losing packets like the e1000s were. The e1000s were great until the vmxnet came around. I think it’s about time we start looking into implementing this. One thing I don’t like is that the vmxnet NICs appear as removable devices in your system tray, like a USB drive would. I am sure there must be a way to fix it; I just haven’t figured it out yet.

EDIT: 11/24

VMware released a patch in September to address this issue. It turns out that the packet loss being reported in ESXTOP for e1000 NICs was not correct.