The 5.1s are out in the cloud

It would be foolish for me to even think that I will be among the first ones to be bringing this info to you. But like I always say, my blog is also my space to save bookmarks and important info.

All the 5.1s that have been the talk of many vGeeks are finally out. I thought about compiling a list of links, but my man Duncan already has me beat, so I will just borrow his list. He says he didn’t set an alarm, but I don’t buy that. Enjoy the new products below; hopefully I will have some time to blog about some of the new features being introduced. This is a major update, so I would suggest going through the enhancement documentation below, which goes over the new features. Most new releases go through a rigorous beta, but I still recommend running them in a lab first. The vCloud Suite 5.1 is now GA.

Download links:

What’s new docs:

vSphere + SQL 2012 + AlwaysON

Over the weekend, one of my ex-colleagues reached out to me for my input on running SQL in a 2-node Microsoft Cluster. Of course, these servers were supposed to be VMs and all that. My initial reaction was: yeah sure, you can do all this, but there are restrictions placed on VMs that are part of MSCS. These include affinity rules, exceptions in DRS, a limit of two nodes, and all the other stuff that goes along with it.

Below are some of the limitations or stuff that you CANNOT do on VMs that are part of the MSCS. This is straight from the install guide:

  • Mixed environments, such as configurations where one cluster node is running a different version of ESXi than another cluster node.
  • Use of MSCS in conjunction with vSphere Fault Tolerance (FT).
  • Migration with vSphere vMotion of clustered virtual machines.
  • N-Port ID Virtualization (NPIV)

As I was going over the caveats, I was reminded that this would be running SQL 2012 using AlwaysON Availability Groups. The VMs need to be in a cluster, but there will be no need for shared storage; even the quorum can be a Windows share somewhere on a file server. When I heard that, I was confused and asked for some time to do research. I read a few articles online, talked to folks on Twitter, posted in the secret vExpert community (which btw was very helpful and prompt) and came back with a totally different mindset to approach this.

Based on the feedback I got from everyone, the articles I read and of course the AlwaysON intro my ex-colleague gave me, I came to the following conclusion: VMs running SQL 2012 AAG in a Microsoft Cluster with no shared disk should be treated like any other VM. This means you can have more than 2 nodes, and the limitations of a typical MSCS VM do not apply. Why so?

To start off, what in the world is AAG? You can get some good detailed info on that here. But to put it simply, AAG is the new and improved HA and DR solution for SQL, kinda like database mirroring (the original database mirroring still exists but it’s probably on its way out).

It’s like mirroring, but we get multiple mirrors for many more databases that we can fail over in groups, and we can shed load by querying the mirrors.

It relies on Windows Failover Clustering, and the data synchronization happens at the application layer. This means that all those constraints that are usually placed on MSCS VMs due to RDMs don’t exist here. This also means you can vMotion the box and do everything that you would with most other VMs. To put it simply, it’s not a special VM anymore.

How is AAG better than database mirroring? Below is a list of some of the improvements extracted from this article:

  • Supports one primary database replica and up to 4 secondary database replicas.
  • Asynchronous-commit mode. This availability mode is a DR solution that works well when the availability replicas are distributed over a not-so-stable connection.
  • Synchronous-commit mode. This availability mode puts the emphasis on high availability and favors data protection over performance; the downside is transaction latency.
  • Allows automatic page restoration against page corruption.
  • Backups and read-only access on the secondary databases.
  • Fast application failover is provided by availability group listeners.
  • Greater failover control is achieved through the flexible failover policy.

Of course there are a few things to consider when you run this all virtually (thanks to the VMTN feedback I received). Even though you may be able to vMotion these boxes, keep in mind that the Windows Failover Cluster (WFC) heartbeat is very sensitive, and the small stun time may be enough for your cluster to assume a node has failed. So adjusting your heartbeat timeout may be something to consider. Matt shows how to do that here; though he is doing it on a Database Availability Group (DAG), it still relies on WFC like AAG.

Now how do you set this up? I was thinking about doing a step by step but I found someone else who already beat me to that. So here are the steps that cover setting up WFC to enabling AAG on SQL. Denny Cherry also plans to have a session around this topic at VMworld.

In the end, I think SQL 2012 with AAG will certainly help to better the relationship between SQL and virtualization. With the restrictions relaxed on this type of setup, you can now have bigger WFC clusters within an HA/DRS cluster. With HA/DRS you get protection from hardware-related incidents, and with AAG your application becomes intelligent. You look good and find more time to do more important things in life. :)

PS: Lastly, we will still be doing a proof of concept to see how well this all holds up. I encourage you to do your own independent testing before introducing this in production. On paper this sounds perfect. I plan to keep this post updated with what we find / learn, or at least add a link to an updated post, depending on how this goes. Good luck!

Free vSphere lab give away, with gears and vouchers

Wanted to get on the vSphere bandwagon but can’t figure out how to get your lab set up so that you can take it for a spin? Well, Cody Bunch over at ProfessionalVMware has made your wish come true. It’s a free lab give away which has virtually everything that you will need to get your lab going and leave you in a place where you feel confident stepping over to the other side.

The lab give away includes awesome books, video training, VMware Workstation, an exam voucher, a 365-day eval license for vSphere, plus a bunch of awesome gear and much more.

So what do you have to do? Simple:

Make a 1-3 minute video explaining who you are and why you think the lab will help you. All the details are here.

I think this is an awesome opportunity to not only get your feet wet but jump right into the virtualization ocean. Hurry up, this ends on 12/13 (midnight) and the winner will be announced shortly after (Dec 14th and no later than Dec 16th).

Good Luck!!

vCartoon of the Week (07/18/2011)

Another great idea that isn’t really mine came from an old friend. So basically every Monday (I will try my best) I plan to post a cartoon in the “vCartoon” section of my blog. The vCartoons will cover the virtual world in a comical way to lighten our already long Mondays. I have never been the artistic kind, so I had to reach out to an old friend of mine. The vCartoons are produced by M. Ali.

As of now I haven’t figured out how to create a link to a category. As soon as I do, you will see a link to simply view all the vCartoons on the blog. If you have a vCartoon idea, please send it to me and I will ask Ali to make it happen. In light of the licensing fiasco, enjoy the first vCartoon.

virtual-world

HA and Admission Control

I have seen admission control being used without really understanding how it impacts your cluster and your available resources. While configuring admission control on a cluster the other day, I started thinking how this really works. The concept is pretty simple. According to VMware:

Slot size is comprised of two components, CPU and memory. VMware HA calculates these values.

The CPU component by obtaining the CPU reservation of each powered-on virtual machine and selecting the largest value. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 256 MHz (this value can be changed using the das.vmCpuMinMHz advanced attribute.)

The memory component by obtaining the memory reservation (plus memory overhead) of each powered-on virtual machine and selecting the largest value.

HA relies on slot sizes, and in the current version of ESX/i, if no reservations are used, the default slot size is 256 MHz of CPU plus the memory overhead. Now keep in mind, if you happen to have a single VM with a 4GB memory reservation, all of a sudden your slot size becomes 256 MHz and 4GB of memory (plus overhead). Now you have fewer slots in which to place your VMs, and admission control will keep you from powering on more VMs than can be accommodated under your Host Failures Cluster Tolerates setting. In short, HA uses your worst-case CPU and memory reservations to come up with the slot size. All that I just mentioned should be common knowledge.

Let’s assume you have a cluster of 3 hosts and VMs with no reservations, HA is turned on, Host Failures Cluster Tolerates is 1, admission control is enabled and your isolation response is set to shutdown. To simplify things, let’s assume your cluster is balanced, with each host having 10GHz of CPU and 24GB of memory. Your cluster has a total of 30GHz of CPU and 72GB of memory. The total number of VMs running is 60 and none of them have any reservation. Let’s also assume your slot size is 256 MHz and 300MB (overhead). So how many slots do you have? You have 30000/256 ≈ 117 in CPU and 72000/300 = 240 in memory. You always pick the lower number, so according to what we calculated above, you have 117 slots available on this cluster.
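The slot arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation in this post, not how HA is implemented: the function names are made up, the cluster is treated as one pool of resources the way the post does, and 1GB is rounded to 1000MB to match the numbers above.

```python
DEFAULT_CPU_MHZ = 256  # default CPU component when a VM has no CPU reservation


def slot_size(vms):
    """Return (cpu_mhz, mem_mb) for the slot.

    vms is a list of (cpu_reservation_mhz, mem_reservation_mb, mem_overhead_mb)
    tuples, one per powered-on VM. The slot is the worst case across all VMs.
    """
    cpu = max(cpu_res if cpu_res > 0 else DEFAULT_CPU_MHZ for cpu_res, _, _ in vms)
    mem = max(mem_res + overhead for _, mem_res, overhead in vms)
    return cpu, mem


def cluster_slots(total_cpu_mhz, total_mem_mb, slot):
    """Slots the cluster can hold: the lower of the CPU and memory counts."""
    slot_cpu, slot_mem = slot
    return min(total_cpu_mhz // slot_cpu, total_mem_mb // slot_mem)


# 60 VMs, no reservations, 300MB overhead each (the example in this post)
vms = [(0, 0, 300)] * 60
slot = slot_size(vms)
print(slot, cluster_slots(30000, 72000, slot))

# Same cluster, but one VM now carries a 4GB (4096MB) memory reservation
vms_with_reservation = vms + [(0, 4096, 300)]
big_slot = slot_size(vms_with_reservation)
print(big_slot, cluster_slots(30000, 72000, big_slot))
```

The second call shows why a single large reservation hurts: the memory side of the slot grows for every VM, and memory becomes the limiting number instead of CPU.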

Let’s assume a host fails and now we only have 20GHz and 48GB left in our cluster. We now have 20000/256 = 78 and 48000/300= 160, which means we have only 78 slots available now. So you have 78 slots and 60 VMs (1 VM/slot), should all your VMs power on? No, because your cluster still has Host Failures Cluster Tolerates set to 1 and admission control is enabled. It’s important to understand how admission control really works. According to VMware:

With the Host Failures Cluster Tolerates policy, VMware HA performs admission control in the following way:

1 Calculates the slot size. A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.

2 Determines how many slots each host in the cluster can hold.

3 Determines the Current Failover Capacity of the cluster. This is the number of hosts that can fail and still leave enough slots to satisfy all of the powered-on virtual machines.

4 Determines whether the Current Failover Capacity is less than the Configured Failover Capacity (provided by the user). If it is, admission control disallows the operation.

So according to that, even though your cluster has enough slots to run all your VMs, because Host Failures Cluster Tolerates is set to 1, admission control has to make sure the cluster only runs the load it can afford to run in case of another host failure. Basically admission control knows there are 78 slots available, but it has to keep in mind that after another host failure there will only be 39 (10000/256 ≈ 39 on the one remaining host). Because Host Failures Cluster Tolerates is set to 1, admission control will only allow 39 slots to be used. So once HA sees that 39 slots have been taken, it will not allow any more power-ons. It’s saving you from yourself.
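Here is a rough Python sketch of that check, using the numbers from this example. The function names are hypothetical, and two simplifications are assumed: every VM takes exactly one slot, and the worst case is modeled by taking away the hosts with the most slots.

```python
def usable_slots(host_slots, failures_tolerated):
    """Slots that may be used while still tolerating the configured failures.

    host_slots is a list with the slot count of each surviving host.
    Worst case assumed here: the hosts with the most slots are the ones that fail.
    """
    keep = max(0, len(host_slots) - failures_tolerated)
    return sum(sorted(host_slots)[:keep])


def can_power_on(powered_on_vms, host_slots, failures_tolerated):
    """True if one more VM (one slot) still fits within the usable slots."""
    return powered_on_vms + 1 <= usable_slots(host_slots, failures_tolerated)


# Each 10GHz / 24GB host holds min(10000 // 256, 24000 // 300) = 39 slots
per_host = min(10000 // 256, 24000 // 300)

# Two hosts left after the first failure, still tolerating one more
remaining = [per_host, per_host]
print(usable_slots(remaining, 1))      # slots admission control will hand out
print(can_power_on(38, remaining, 1))  # the 39th VM
print(can_power_on(39, remaining, 1))  # the 40th VM
```

With all three hosts up, the same function leaves 78 usable slots, which is why the 60 VMs ran fine before the failure.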

I will not throw in other complications like memory reservations or an unbalanced cluster (hosts with different resources) and how to handle that yet just to keep it simple. I do plan to post about why reservation would be a bad idea at the VM level and ways to get around the conservative slot sizes. HA and admission control are awesome tools to have, but if you don’t plan intelligently, you will soon begin to hate them.

Memory state in ESXTOP/RESXTOP

Often you will wonder whether you have enough room for another VM on your host. Before I begin, let me clarify that in a larger environment you should certainly use capacity analysis tools. But what if you are a small shop that can’t afford one of those tools, you only own a small cluster, and you don’t mind running ESXTOP/RESXTOP to figure this out? You can look at TPS and other areas, but the memory state of the host will indicate the kind of stress the host is under. This will be your best friend.

MemoryState

As you can tell, my host is in the “high” state. What does this really mean? Your host can be in one of the following states: “high”, “soft”, “hard” or “low”. Which state it is in depends on the following:

high state = if the free memory is greater than or equal to 6%

soft state = if the free memory is at 4%

hard state = if the free memory is at 2%

low state = if the free memory is at 1%

As you can tell, the high state is what will keep your host happy. One thing to note is that in the high and soft states ballooning is favored over swapping, while in the hard and low states swapping is favored over ballooning. Of course TPS and other techniques will enable you to use the memory on your host efficiently and allow you to overcommit. Another thing to point out is that your host may be in the “high” state while your VM is still swapping. In that case it’s not the host; it’s really the limit on your VM or your RP settings that is causing this VM to swap.
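If you want to turn the free memory percentage into a state yourself, a small Python sketch could look like this. The function names are made up, and the handling of values that fall between the listed percentages is my assumption: each number is treated as the lower bound of its state, and anything under 2% counts as low.

```python
def memory_state(free_pct):
    """Map a host's free memory percentage to a memory state.

    Assumption: the listed percentages are treated as lower bounds,
    so anything below the hard threshold (2%) is reported as low.
    """
    if free_pct >= 6:
        return "high"
    if free_pct >= 4:
        return "soft"
    if free_pct >= 2:
        return "hard"
    return "low"


def favored_reclaim(state):
    """Ballooning is favored in high/soft, swapping in hard/low."""
    return "ballooning" if state in ("high", "soft") else "swapping"


for pct in (10, 4, 2, 1):
    state = memory_state(pct)
    print(pct, state, favored_reclaim(state))
```

Treat it as a quick way to reason about the numbers; the state shown in ESXTOP/RESXTOP is the one to trust.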

The good news is that DRS will move your VM over to another host (based on your settings) if the host gets under stress, and moving the VM should improve its performance. But I have always found ESXTOP/RESXTOP to be an excellent tool for getting insight into what’s really happening on your host. Remember, a holistic view is great, and when we talk about a cloud a single host may not mean much. However, each host is a building block that forms your cloud. Understanding how memory is handled at the host level will give you better insight into the holistic memory stats of your cloud.