VMware recently published a new white paper on network performance in vSphere 4.1. Duncan posted a link to it on his blog, and I decided to take a look at what it had to offer. It contains some very useful information and, most importantly, it's an easy read, so I recommend you read it as well. Along with the possibility of a Linux VM receiving packets at 27Gbps, I found its take on Windows VMs very interesting.
As the white paper notes, the Linux VMs outperformed the Windows VMs because they could use Large Receive Offload (LRO), which is not available for Windows. That got me thinking about some of the issues that can be addressed just by understanding what this means. In a VM without LRO, received packets are processed by the VM's own vCPUs. One important thing to note is that, by default, a Windows VM uses only vCPU 0 for this work. So even if you have assigned a VM 8 vCPUs, received packets are handled by vCPU 0 alone while the other 7 sit back and relax. In other words, the hypervisor still has to schedule all of the VM's vCPUs, but once they are running, only one of them does the receive processing.
The white paper also describes the remedy: enable Receive-Side Scaling (RSS), which lets a Windows VM use all of its assigned vCPUs to process received packets. Since all of the VM's vCPUs have to be scheduled anyway, why not put all of them to work while you have 'em? This can improve the VM's performance. That said, assign multiple vCPUs to a VM only if the application supports them and they will actually improve its performance. On a heavily loaded host, a VM with vCPUs it does not need will only suffer.
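To make the difference concrete, here is a minimal Python sketch, under simplifying assumptions, of which vCPU handles receive work for each connection with and without RSS. Real RSS computes a Toeplitz hash over fields such as the connection's IP addresses and TCP ports and looks the result up in an indirection table; the crc32 hash below is only a stand-in, the flows are made-up example values, and pick_vcpu and receive_load are hypothetical names. The property it illustrates is the one that matters: packets from the same connection always land on the same vCPU, while different connections spread across the available vCPUs.

```python
import zlib
from collections import Counter
from typing import List, Tuple

# A flow is (src_ip, src_port, dst_ip, dst_port); example values are made up.
Flow = Tuple[str, int, str, int]


def pick_vcpu(flow: Flow, num_vcpus: int, rss_enabled: bool) -> int:
    """Return the index of the vCPU that processes received packets for `flow`."""
    if not rss_enabled or num_vcpus == 1:
        # Without RSS, all receive processing lands on vCPU 0.
        return 0
    # Stand-in for RSS: real RSS uses a Toeplitz hash plus an indirection
    # table; crc32 is used here only to get a stable per-flow hash.
    src_ip, src_port, dst_ip, dst_port = flow
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_vcpus


def receive_load(flows: List[Flow], num_vcpus: int, rss_enabled: bool) -> List[int]:
    """Count how many flows each vCPU ends up handling."""
    counts = Counter(pick_vcpu(f, num_vcpus, rss_enabled) for f in flows)
    return [counts.get(i, 0) for i in range(num_vcpus)]


if __name__ == "__main__":
    # 64 hypothetical client connections to one server port.
    flows = [(f"10.0.0.{i}", 50000 + i, "10.0.1.1", 445) for i in range(1, 65)]
    print("without RSS:", receive_load(flows, 4, rss_enabled=False))  # [64, 0, 0, 0]
    print("with RSS:   ", receive_load(flows, 4, rss_enabled=True))
```

Without RSS every flow piles onto index 0 no matter how many vCPUs the VM has; with RSS the same 64 flows are divided among all four.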
If a Windows VM without RSS shows a CPU spike caused by network traffic, you will notice that adding another vCPU doesn't solve the problem. If your single-vCPU VM was at 100% CPU utilization, with two vCPUs it will show about 50%. Increase the vCPUs to 4 and utilization drops to about 25%. But the performance stays the same. What's going on? Only one vCPU is doing all the processing for received packets. You can also solve the mystery with esxtop: expand the VM in the CPU view and you will see the load sitting on a single vCPU. By enabling RSS on this VM, you can benefit from all of the vCPUs assigned to it. Again, make sure that adding vCPUs is not causing scheduling problems in your environment; that depends on how busy your host is.
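The arithmetic behind those numbers is plain averaging. Here is a minimal sketch, assuming the guest's overall CPU figure is the simple average across its vCPUs and that receive processing is the only load: one pegged vCPU spread over n vCPUs reads as 100/n percent, while the capacity available for receive processing stays at one vCPU until RSS is enabled. The function names are hypothetical.

```python
def reported_cpu_utilization(num_vcpus: int, busy_vcpus: int = 1, busy_pct: float = 100.0) -> float:
    """Average utilization the guest reports when `busy_vcpus` vCPUs run at
    `busy_pct` percent and the remaining vCPUs are idle (simplifying assumption)."""
    if num_vcpus < 1 or not 0 <= busy_vcpus <= num_vcpus:
        raise ValueError("need num_vcpus >= 1 and 0 <= busy_vcpus <= num_vcpus")
    return busy_vcpus * busy_pct / num_vcpus


def receive_vcpus(num_vcpus: int, rss_enabled: bool) -> int:
    """vCPUs that share receive processing: all of them with RSS, only vCPU 0 without."""
    return num_vcpus if rss_enabled else 1


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} vCPU(s): reported {reported_cpu_utilization(n):.0f}%, "
              f"receive vCPUs without RSS = {receive_vcpus(n, False)}")
```

The reported percentage falls as vCPUs are added, but the number of vCPUs doing receive work stays at one, which is why the performance does not change.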
To find out whether RSS is enabled, run netsh int tcp show global from a command prompt inside the VM; the Receive-Side Scaling State line shows its status.
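If you want to check this from a script, here is a minimal Python sketch that runs the same netsh command and pulls out the RSS line. It assumes English-language output containing a line such as "Receive-Side Scaling State : enabled"; the label may differ on other Windows versions or on localized systems, so treat the pattern as an assumption. parse_rss_state and query_rss_state are hypothetical helper names.

```python
import re
import subprocess
import sys
from typing import Optional

# Assumed label in English `netsh int tcp show global` output.
_RSS_LINE = re.compile(r"Receive-Side Scaling State\s*:\s*(\S+)", re.IGNORECASE)


def parse_rss_state(netsh_output: str) -> Optional[str]:
    """Return the RSS state (e.g. 'enabled' or 'disabled'), or None if the line is missing."""
    match = _RSS_LINE.search(netsh_output)
    return match.group(1).lower() if match else None


def query_rss_state() -> Optional[str]:
    """Run netsh inside the Windows guest and parse its output."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return parse_rss_state(result.stdout)


if __name__ == "__main__":
    if sys.platform == "win32":
        print("RSS state:", query_rss_state())
    else:
        print("netsh is only available on Windows")
```

Keeping the parsing separate from the netsh call means the pattern can be checked against saved output from your own VMs before relying on it.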
RSS is enabled by default on Windows Server 2008 R2 and can be enabled on Windows Server 2003 SP2 and later. You will also need to enable RSS in the VMXNET3 driver settings inside the VM, and then you are all set. Note that RSS requires the VMXNET3 adapter; VMXNET2 will not cut it. Simple things like this can go a long way toward optimizing your environment and putting you at ease with what lives in your cloud.
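These requirements boil down to a short checklist. Here is a minimal Python sketch that encodes them; WindowsVM and rss_checklist are hypothetical names, and the inputs (adapter type, OS-level RSS state, the RSS option in the VMXNET3 driver settings, vCPU count) are values you would read from the VM yourself, for example from netsh int tcp show global and the adapter's driver properties.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowsVM:
    """Hypothetical description of a Windows guest, filled in by hand."""
    nic_type: str             # virtual adapter, e.g. "VMXNET3", "VMXNET2", "E1000"
    os_rss_enabled: bool      # OS-level RSS state (on by default in Windows Server 2008 R2)
    driver_rss_enabled: bool  # RSS option in the VMXNET3 driver settings
    num_vcpus: int


def rss_checklist(vm: WindowsVM) -> List[str]:
    """Return the remaining steps; an empty list means RSS can use every vCPU."""
    steps = []
    if vm.nic_type.upper() != "VMXNET3":
        steps.append("switch the virtual NIC to VMXNET3 (VMXNET2 will not do RSS)")
    if not vm.os_rss_enabled:
        steps.append("enable RSS in the guest OS (possible on Windows Server 2003 SP2 and later)")
    if not vm.driver_rss_enabled:
        steps.append("enable RSS in the VMXNET3 driver settings inside the VM")
    if vm.num_vcpus < 2:
        steps.append("RSS only helps with more than one vCPU; add vCPUs only if the host can schedule them")
    return steps


if __name__ == "__main__":
    vm = WindowsVM(nic_type="VMXNET2", os_rss_enabled=True, driver_rss_enabled=False, num_vcpus=4)
    for step in rss_checklist(vm):
        print("-", step)
```

The vCPU check reflects the earlier caveat: RSS can only spread receive work across vCPUs that exist, and extra vCPUs are worth adding only when the host has room to schedule them.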





