VMware's secret projects for client side computing

The other day I had an opportunity to speak with someone from VMware. We were discussing a variety of things and something came up that I thought was worth sharing. As we all know, ThinApp gives us the ability to virtualize applications. What it basically does is make your applications self-contained so they can be streamed to and used by users of all kinds running different operating systems. In other words, it gives you the ability to run IE6, 7, 8 and 9 all on one machine. The application isn’t really installed on the client side; it’s simply accessed and streamed to the user, and the OS the user runs is irrelevant.

So what’s the future of ThinApp? We were discussing how Project Octopus will add data syncing to the picture. It will be sort of like Dropbox, but more of an enterprise-class solution where files are shared across users anywhere, anytime, in a secure manner via public/private clouds. Now because this will be a VMware product, we can assume tight integration with View or other end-user products like Horizon. I think this will certainly be a good addition.

So what about ThinApp? In comes Project AppBlast. We already know ThinApp makes the OS irrelevant, letting you run all kinds of applications and make them available to your end users. The next logical step is what AppBlast will do. AppBlast gives you the ability to make these applications available to any browser or device that supports HTML5. Have you been waiting forever to justify the purchase of an iPad? Well, I guess your enterprise applications forced you to carry your laptop everywhere. AppBlast, however, promises that you will be able to access all your applications as long as you can find a browser or device that supports HTML5.

I am unsure about the release dates of either product, but I am confident that both will hit the market soon. The stage has already been set for these types of solutions. This will certainly be an excellent addition to the end-user experience and client-side computing. Now the question is, will I really need a View client on my iPad when my iPad gives me access to all the applications I need? ThinApp made the OS irrelevant, and it seems like AppBlast will make the device irrelevant. I guess we will have to wait and see.

vCenter Operations Management Suite 5.0 overview

VMware announced the release of vCenter Operations Management Suite 5.0 today. With the release of vCOM, Capacity IQ has been put to sleep, as all of Capacity IQ’s functionality has been fully integrated into vCOM. So what does this tool offer?

Apart from the features that CapIQ offered, vCOM will give you more insight into what’s happening in your virtual environment performance-wise, alert you to any risks or current issues, and forecast potential issues that could come up down the road. Not to mention it will also tell you how to use your environment more efficiently. Sounds insane, doesn’t it?

Let’s look at the screenshot below. What we have here is a dashboard that gives you a view of your world. Now what is your world? A world comprises one or more vCenters along with their population. In the example below, we are looking at what’s happening across two vCenters. The holistic view shows you the hotspots under the Health category that may require your attention, along with a Risk widget that predicts future issues. The Efficiency widget scans your environment and proposes ways to right-size your VMs based on how they have performed over time. As you can tell, vCOM is boldly telling us that we can reclaim 308 vCPUs, over 9TB of disk and 1TB of vMemory, nice aye!! Instead of wasting more money on buying more resources for your environment, let’s make full use of what we have. I am sure your boss will appreciate that as well, and when it comes time to hand out bonuses, saving 9TB of disk alone will be hard to ignore 😀

When to disable HA?

I was talking to a friend a few days ago who owns a pretty big vSphere 4 environment. His network team is in the middle of upgrading switches, which requires an outage (30 sec+) on the management network for his ESXi hosts. He wanted to discuss the best way to approach this without disrupting the environment and causing an outage for the VMs. Keep in mind the VM network will continue to run and is not expected to go down.

My immediate question was whether HA is enabled and which isolation response has been set on the clusters. In his case the isolation response is to shut down. Obviously this means that as soon as the management network has the outage,

  • the hosts will be unreachable
  • the hosts will try to reach each other with no luck and
  • ultimately, once they realize they can’t even ping the gateway (isolation address),
  • all hosts will declare themselves isolated

The result will be an environment that automatically shuts down all the VMs, exactly the opposite of what we want. So what’s a simple workaround? Disable HA, and re-enable it once the network maintenance is completed. Of course, keep your fingers crossed that your hosts don’t experience a hardware failure during this time.
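
The chain of events above can be sketched as a toy decision model. This is only an illustration of the reasoning in this post, not VMware's actual HA logic; the names `HostView`, `is_isolated` and `vm_action` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class HostView:
    """What a single host can see during the outage (hypothetical model)."""
    peers_reachable: bool           # can it reach other HA hosts on the management network?
    isolation_addr_reachable: bool  # can it ping the isolation address (the gateway here)?


def is_isolated(view: HostView) -> bool:
    # A host declares itself isolated only when it can reach neither
    # its peers nor the isolation address.
    return not view.peers_reachable and not view.isolation_addr_reachable


def vm_action(view: HostView, isolation_response: str, ha_enabled: bool = True) -> str:
    """Return what happens to the VMs on this host in the toy model."""
    if not ha_enabled or not is_isolated(view):
        return "no action"
    return isolation_response


# Management network down, isolation response "shut down", HA still enabled:
print(vm_action(HostView(False, False), "shut down"))
# Same outage with HA disabled for the maintenance window:
print(vm_action(HostView(False, False), "shut down", ha_enabled=False))
```

With HA enabled the model triggers the shutdown response; with HA disabled for the window nothing happens to the VMs, which is the whole point of the workaround.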

It’s interesting to explore whether changing the isolation response to leave VMs powered on would help in this case. If the isolation response is set to leave powered on and the management network experiences the outage:

  • the hosts will be unreachable
  • the hosts will try to reach each other with no luck and
  • ultimately, once they realize they can’t even ping the gateway (isolation address),
  • all hosts will declare themselves isolated

At this point, though the VMs will continue to run, the primary hosts in the cluster will try to power on VMs that are already running on the other hosts (assuming isolated primaries are hardworking hosts that continue to do their jobs even in an isolated state). Of course they won’t be able to, but this is unnecessary stress on the hosts that you probably don’t want. Would it impact your VMs? Maybe, and remember we want the VMs to have no outage or impact. So leaving VMs powered on with HA enabled for the network maintenance period is also not the best option, as it could cause performance issues for some VMs. I still think disabling HA during this outage is the most seamless option at this time.

What about vSphere 5?

Naturally, the next thing that came to mind after a couple of days was: does this change with vSphere 5? It does. As we already know, vSphere 5 also has datastore heartbeats that correctly identify the state of the hosts in the cluster. For those who are not aware, vSphere 5 by default uses 2 datastores that are shared across all or most hosts in a cluster as a heartbeat channel in case the management network experiences an outage. So what will happen when the management network goes down in a vSphere 5 cluster and the isolation response is to leave VMs powered on?

  • Each host in the cluster will enter an election (except the master, the master will declare itself isolated in 5 sec)
  • As each host will be isolated, each host will elect itself as a master
  • Each host will ping the isolation address and declare itself isolated
  • Trigger the isolation response which is to leave VMs powered on

In this case, within 30 sec of the management network outage, each host will have declared itself isolated and won’t attempt to restart any VMs like the primaries would in vSphere 4. The result is that no host will be under unnecessary stress trying to start VMs that are already running somewhere else, unlike in the previous version. This also means the VMs will not experience any performance issues, as the hosts will not be under any additional stress. A file called poweron (on the shared datastore) that each host owns will also reflect its updated status as being isolated.
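
To make the difference from vSphere 4 a little more concrete, here is a deliberately simplified sketch of how a second heartbeat channel changes the restart decision. It is my own toy model of the idea, not VMware's implementation, and the function names are invented for the example.

```python
def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> str:
    """Classify a host from the heartbeats that are still being observed."""
    if network_heartbeat:
        return "live"
    if datastore_heartbeat:
        # No management network, but the host is still updating the
        # shared datastore, so it is running and its VMs are too.
        return "isolated or partitioned"
    return "failed"


def should_restart_vms(network_heartbeat: bool, datastore_heartbeat: bool) -> bool:
    # Only a host that has gone silent on both channels is treated as failed.
    return classify_host(network_heartbeat, datastore_heartbeat) == "failed"


# Management network outage, host still heartbeating to the datastore:
print(classify_host(False, True), should_restart_vms(False, True))
```

In this model a management network outage alone never leads to a restart attempt, which matches the behaviour described above.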

Now what happens when the network outage is over and the hosts are in a position to talk to each other again? I have not been able to find documentation on whether an isolated host (a host that has already declared itself isolated) will enter an election (vSphere 4 or 5) once the communication channel is open and bring the cluster back to life. From what I have read so far, the host will remain in an isolated state unless there is some manual intervention, like resetting HA. Perhaps Duncan or one of the HA experts can sanity-check this and provide a concrete answer.

If resetting HA is the quickest way to get the hosts to elect a master (vSphere 5) or primary hosts (vSphere 4) so that HA can function after the network outage, then disabling HA for the outage period is probably the better option. The exception would be if hosts that have declared themselves isolated could fix themselves and return for an election once the outage ends, but I have not been able to find any documentation on that. So, assuming the isolated hosts will not recover on their own once the network comes back, disabling HA for the maintenance window is probably the best option in this case.

One thing to note: vSphere 5 offers a brand new HA under the hood. In this scenario, however, we noticed very little difference between the two versions. Keep in mind that the architectural difference in HA from vSphere 4 to 5 enables isolated hosts (in vSphere 5) to be correctly identified, which eliminates the unnecessary stress some hosts would go through with the leave powered on isolation response. There is more to it than meets the eye.

EDIT:

Duncan wrote a post in reply to this one where he recommended unchecking “host monitoring” instead of disabling HA. This is quicker and means your cluster won’t have to start over. Moreover, the host monitoring option is really meant for outages like this. So don’t disable HA, just uncheck “host monitoring” during the expected outage window.

vCenter service fails to start

Ever had that happen? The first instinct after a connection failure is to ping the vCenter box and then see if the vCenter service is running. If the service is stopped, you would perhaps try to restart it (a good admin would probably check the logs to see why it failed), but what if it fails to start? Next thing you know, you are verifying that everything in the DB world is happy, and once that is confirmed, you come back to the logs, which is what you should have been checking to begin with.

The vpxd logs definitely come in handy here; these are basically your vCenter logs. They are located on your vCenter server (vCenter 4.x) at %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\Logs (in Windows 2008, the vpxd.log file is located at C:\ProgramData\VMware\VMware VirtualCenter\Logs).

What you find here should really help in tracking down the root cause. I was seeing the following in the log file:

[VpxdReverseProxy] Failed to create http proxy: An attempt was made to access a socket in a way forbidden by its access permissions.
[Vpxd::ServerApp::Init] Init failed: VpxdMoReverseProxy::Init()
Failed to intialize VMware VirtualCenter. Shutting down…
Forcing shutdown of VMware VirtualCenter now
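
If the log is large, a few lines of Python can pull out entries like the ones above. This is just a convenience sketch: the marker strings are copied from my log excerpt, the function name is made up, and your failure may well log something different.

```python
# Substrings taken from the excerpt above; extend as needed for other failures.
MARKERS = (
    "Failed to create http proxy",
    "Init failed",
    "Shutting down",
)


def find_startup_failures(lines):
    """Return the log lines that contain any of the known failure markers."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in MARKERS)]


# Usage sketch (the path is whatever your vpxd.log location is):
#   with open(path_to_vpxd_log, errors="replace") as log:
#       for hit in find_startup_failures(log):
#           print(hit)
```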

Based on what I was seeing, a bunch of things could have been wrong according to this KB article. The one that jumps out is that the vCenter ports may already be in use by something else. vCenter needs at least ports 80, 443 and 902, unless these were changed to something else during install.

What ports are engaged:

One quick way to find out if these ports are already in use is by using the netstat command.

  • On your vCenter server go to Run and type cmd
  • At the command prompt type “netstat -ao” and hit Enter. You should now see a big list of all engaged ports along with their PIDs
  • If you see any of the ports mentioned above (80, 443, 902) already in use while your vCenter is dead, it’s possible that this is what’s stopping vCenter from running
  • Note the PID of the port in question
  • Open Task Manager and click View -> Select Columns
  • Check “PID (Process Identifier)” and press OK.
  • Find the PID you noted above and that should point you in the direction of the application that has been putting your vCenter to sleep
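
The steps above can also be scripted. Below is a rough sketch that parses netstat output and reports which PID is listening on the vCenter ports. It assumes the usual five-column Windows layout (Proto, Local Address, Foreign Address, State, PID) and uses “netstat -ano” rather than “-ao” so that ports are always printed as numbers; the function names are my own.

```python
import subprocess

VCENTER_PORTS = {80, 443, 902}


def listeners_on(netstat_output, ports=VCENTER_PORTS):
    """Return {port: pid} for TCP sockets in LISTENING state on the given ports."""
    found = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State, PID
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        port_text = fields[1].rsplit(":", 1)[-1]  # works for 0.0.0.0:80 and [::]:80
        if port_text.isdigit() and fields[4].isdigit() and int(port_text) in ports:
            found[int(port_text)] = int(fields[4])
    return found


def run_netstat():
    """Run netstat on the local machine and return its text output."""
    return subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout


# Usage sketch, run on the vCenter server while the service is down:
#   for port, pid in listeners_on(run_netstat()).items():
#       print(f"port {port} is held by PID {pid}")
```

You still need Task Manager (or something similar) to turn the PID into a process name, exactly as in the manual steps.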

Keep it simple:

In my case it was the infamous love between IIS and port 80, killing that fixed the deal. Why is IIS running on this server? Good question, I don’t know the answer. Ideally it should have no business there.

It’s important to note that simply restarting the server might have brought your vCenter back to life, as long as the other application didn’t grab that port first. However, you would find yourself in a vicious cycle of rebooting every time it goes down. Why do that when you can find the root cause and fix it? Keep your vCenter server clean and simple; it does a lot of complex tasks with simple clicks.

VMware converter – no source disk

It’s been a while since I had to P2V anything because in most places that phase has already ended, but certainly there are still a lot of environments where P2V is happening. I was helping a friend P2V some of his servers in an effort to consolidate his DC, and I ran across something interesting that I thought was worth sharing.

So I started the converter and provided my domain credentials (I was a local admin on the target server), and it came back with something interesting. It allowed me to deploy the agent, but it also said that it didn’t see any source disk: “no source disk”. What this meant was that our plan to resize the disk partitions during conversion was kind of thrown out the window. I was pretty sure it was still going to P2V (I didn’t try it, so I can’t confirm), but that also meant I would have to P2V, then clone the disk to a smaller vmdk, mess around with the vmx and attach the new vmdk to the VM. That’s silly, you shouldn’t have to do all that. So why was this happening?

The server we were trying to P2V was running Win 2008 R2. Naturally I wanted to try a few other servers with the same OS, and guess what, they all behaved the same way: “no source disk”. After digging around a little, I found out that it was in fact the infamous Win 2008 R2 security that made this happen. Yep, you guessed it: UAC. In Win 2008, Admin Approval Mode was disabled by default; in 2008 R2, however, Admin Approval Mode is enabled by default for all admins except the built-in admin account. So how can you fix it? There are three options:

Option 1:

When providing Windows credentials during the conversion, use the built-in local admin account. As Admin Approval Mode is disabled for this built-in user, you should not see the “no source disk” issue if you use this account to deploy the agent on the target 2008 R2 server.

Option 2:

Deploy the agent locally on the target server. This allows the agent to be installed with fewer restrictions than a remote installation using a different local admin account, as long as you right-click the agent installer and choose “Run as administrator”.

Option 3:

I like this one, but for no particular reason. Perhaps because I can still deploy the agent remotely and still use my domain account to move forward with the conversion. On the target server, go to the following location:

Local Security Policy -> Local Policies -> Security Options -> User Account Control: Run all administrators in Admin Approval Mode: Enabled

Disable the above policy and restart the target machine. Run the converter again, and this time you should see the source disk on the target machine. Re-enabling this policy after you are done is probably not a bad idea; after all, it saves you from yourself, even if it also gets in the way at times.