Microsoft SQL Server High Availability Options on Nutanix

Microsoft SQL Server (MSSQL) supports several High Availability (HA) options at both the host and storage level.  For the purposes of this post I will only be addressing the HA options which leverage native Windows Server Failover Clustering (WSFC)  in some form.  SQL Server also provides transactional replication through the use of a publisher and subscriber model, which some consider an HA option, but that’s a topic (and debate) for another post.

Starting with MSSQL 2012, Microsoft introduced AlwaysOn, which is a combination of existing and new functionality.  Under the AlwaysOn umbrella fall two main options: Failover Cluster Instances (FCI) and Availability Groups (AAG).

Nutanix has long supported and recommended the use of AlwaysOn Availability Groups.  AAG leverages a combination of WSFC and native database level replication to create either an HA or disaster recovery solution between instances of MSSQL.  The instances of MSSQL leveraged to support the AAG can be either standalone or clustered (in the case of Nutanix these would be standalone instances today).   The following figure provides a logical overview of an AlwaysOn Availability Group.


An AAG performs replication at the database level, creating a “primary” copy and one or more “secondary” copies.  The secondary copies are replicated using either synchronous or asynchronous commit mode, as specified by an administrator.  Asynchronous commit is intended more as a disaster recovery or reporting solution, as it implies the potential for data loss, so for the HA scenarios discussed here we should assume synchronous commit.  Because database replication is used, shared storage is not required and each MSSQL instance within the AAG can use its own local devices.  Additional details on AlwaysOn Availability Groups can be found here:

AAGs can take advantage of the secondary databases for the purpose of read-only transactions or for backup operations.  In the context of a scale-out architecture like Nutanix, leveraging multiple copies across hypervisor hosts for distributing these kinds of operations creates an excellent solution.
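As a rough sketch, an availability group with a synchronous-commit, readable secondary could be created with T-SQL issued through PowerShell.  The instance, database and endpoint names below are hypothetical placeholders, not values from any specific environment.

```powershell
# Hypothetical sketch: create an AAG with a synchronous-commit, readable
# secondary. Run against the primary instance; all names are placeholders.
# Requires the SqlServer PowerShell module for Invoke-Sqlcmd.
Invoke-Sqlcmd -ServerInstance "SQLNODE1" -Query @"
CREATE AVAILABILITY GROUP [AppAG]
FOR DATABASE [AppDB]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.demo.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.demo.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
"@

# On the secondary instance, join it to the group.
Invoke-Sqlcmd -ServerInstance "SQLNODE2" -Query "ALTER AVAILABILITY GROUP [AppAG] JOIN;"
```

This assumes database mirroring endpoints already exist on both instances and that the database is in full recovery mode with a current backup.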

While AAGs are a great solution and fit nicely with the Nutanix architecture, they may not be a good fit, or even possible, for certain environments.  Some of the limiting factors for adopting AAGs include:

  • Space utilization:  Because a secondary database copy is created, additional storage space will be consumed.  Some administrators may prefer a single database copy where server HA is the primary use case.
  • Synchronous commit performance:  The synchronous replication of transactions (insert/update/delete) needed for AAG replication in the context of an HA solution does have a performance overhead.  Administrators of latency-sensitive applications may prefer not to incur the additional response time of waiting for transactions to be committed to multiple SQL instances.
  • Distributed transactions:  Some applications perform distributed transactions across databases and MSSQL instances.  Microsoft does not support the use of distributed transactions with AAGs, and by extension application vendors will not support software that uses distributed transactions where AAGs are present.
  • SQL Server versions:  Some environments simply cannot yet upgrade to SQL 2012 or higher.  Whether due to current business requirements or application qualification requirements, many administrators have to stick with SQL 2008 (and I hope not, but maybe even earlier versions) for the time being.

In the above cases MSSQL Failover Cluster Instances are likely the better solution.  FCIs have long been used as the primary means of HA for MSSQL.  FCIs can be leveraged with all current versions of MSSQL and rely on shared storage to support the MSSQL instances.  The following figure provides a logical overview of Failover Cluster Instances.


The shared storage used can be block (LUN) based or, starting with MSSQL 2012, SMB (file) based.  In the case of LUN based shared storage, SCSI-3 persistent reservations are used to arbitrate ownership of the shared disk resources between nodes.  The MSSQL instance utilizing specific LUNs is made dependent on those disk resources.  Additional details on AlwaysOn Failover Cluster Instances can be found here:
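That disk dependency can be inspected or set with the FailoverClusters PowerShell module.  The resource names below are placeholders for illustration only.

```powershell
# Hypothetical sketch: make a SQL Server FCI resource depend on its
# clustered disk, so SQL only comes online once the disk is owned locally.
Import-Module FailoverClusters

# Resource names are placeholders; list yours with Get-ClusterResource.
Add-ClusterResourceDependency -Resource "SQL Server (MSSQLSERVER)" -Provider "Cluster Disk 1"

# Verify the resulting dependency expression.
Get-ClusterResourceDependency -Resource "SQL Server (MSSQLSERVER)"
```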

Until very recently Nutanix did not support MSSQL FCI within virtual machines, whether they reside on ESXi, Hyper-V or the Nutanix Acropolis Hypervisor (AHV).  But starting with the Nutanix 4.5 release (with technical preview support in the recently posted 4.1.5 release), MSSQL FCI will be supported (note: as of the 4.5.1 release, Hyper-V and ESXi are officially supported).  Nutanix will support this form of clustering using iSCSI from within the virtual machines.  In essence, Nutanix virtual disks (vdisks) which support SCSI-3 persistent reservations are created within a Nutanix container.  These vdisks are presented directly to virtual machines as LUNs, leveraging the Nutanix Controller Virtual Machines (CVMs) as iSCSI targets.  The virtual machines use the Microsoft iSCSI initiator service and the Multipath I/O (MPIO) capabilities native to the Windows operating system for connectivity and path failover.  An overview of this configuration can be seen in the following diagram.
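Inside the guest, the initiator and MPIO pieces are standard Windows features.  A sketch of the connection steps might look like the following; the target portal address is a placeholder.

```powershell
# Hypothetical sketch: connect a Windows guest to iSCSI targets over MPIO.
# The target portal address is a placeholder.
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Enable the MPIO feature and let the Microsoft DSM claim iSCSI disks.
Enable-WindowsOptionalFeature -Online -FeatureName MultiPathIO
Enable-MSDSMAutomaticClaim -BusType iSCSI

# Discover and connect to the target, persisting the session across reboots.
New-IscsiTargetPortal -TargetPortalAddress "10.0.0.50"
Get-IscsiTarget | Connect-IscsiTarget -IsMultipathEnabled $true -IsPersistent $true
```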

Nutanix and iSCSI

The association between virtual machine iSCSI initiators and the vdisks is managed via the concept of a Volume Group.  A volume group acts as a mapping to determine the virtual disks which can be accessed by one or multiple (in the case of clustering) iSCSI initiators.   Additional information on volume groups can be found under the Volumes API section of the Nutanix Bible:

Like AAGs, MSSQL FCI may not be best suited for all environments.  Some of its drawbacks include:

  • Shared storage complexity:  The configuration and maintenance of shared storage is often more complex to manage than in standalone environments.
  • Planned or unplanned downtime:  FCI can generally take more time to transition operation between cluster nodes than a similar AAG configuration.  Part of this downtime is spent committing transactions which may have been in flight prior to failover.  This can be somewhat mitigated with the recovery interval setting or by using indirect checkpoints.
  • Separation of workloads:  AAG configurations can create multiple database copies across SQL instances for the purposes of distributed reporting or backup offload.  An FCI cannot offer this functionality natively, although such configurations are possible via the intelligent cloning methodologies that the Nutanix platform can offer.
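The checkpoint tuning mentioned above is a documented database setting (indirect checkpoints are available in SQL Server 2012 and later).  A hedged example, with a placeholder database name:

```powershell
# Hypothetical sketch: shorten crash-recovery time after failover with an
# indirect checkpoint target (SQL Server 2012+). Names are placeholders.
# Requires the SqlServer PowerShell module for Invoke-Sqlcmd.
Invoke-Sqlcmd -ServerInstance "SQLNODE1" -Query @"
ALTER DATABASE [AppDB] SET TARGET_RECOVERY_TIME = 60 SECONDS;
"@
```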

As mentioned earlier it’s possible to configure both FCI and AAG as a part of the same solution.  So for example, if the HA capabilities of FCI are preferred, but the  replication capabilities of AAG are desired for the purposes of reporting, backup offload or disaster recovery, a blended configuration can be deployed.

With the support of shared storage clustering in 4.5, Nutanix can provide the full range of options necessary to support the broad set of use cases SQL Server can require.  I’ll have follow-on posts detailing how to configure volume group based clustering for Microsoft SQL Server.  Thanks for reading.




Virtual Machine Placement Options with Hyper-V

Large Windows failover clusters can create the need for separation between virtual machines.  Virtual machine separation may be needed to provide better performance or better availability for a given application.  For example, virtual machines participating in a SQL Server based cluster should run on separate physical nodes for both performance and HA considerations.

People familiar with VMware commonly ask whether Hyper-V provides functionality similar to the vSphere Distributed Resource Scheduler (DRS), including affinity and anti-affinity rules.  Windows failover clustering, Hyper-V and System Center Virtual Machine Manager (SCVMM) offer several overlapping pieces of technology to help accomplish DRS type functionality and provide separation or grouping of virtual machines to particular hosts.  The following sections outline these technologies, including a few Nutanix specific considerations.

Availability Sets

Windows clusters have long supported the concept of anti-affinity, the process of keeping resource groups running on different cluster nodes.  This is enforced using “AntiAffinityClassNames,” which are applied to cluster resource groups or roles that should be kept apart.  SCVMM supports AntiAffinityClassNames through the use of “Availability Sets.”  Availability sets map directly to AntiAffinityClassNames and, when configured, allow SCVMM intelligent placement to enforce where virtual machines will run.  Availability sets are managed and applied on a VM-by-VM basis.  Management through the SCVMM console is done under the availability section of a virtual machine's hardware configuration, as shown in Figure 1.

Figure 1 SCVMM availability sets
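Outside the SCVMM console, the same underlying setting can be applied directly to the cluster roles with the FailoverClusters PowerShell module.  The role and class names below are placeholders.

```powershell
# Hypothetical sketch: tag two VM cluster roles with the same
# AntiAffinityClassNames value so the cluster tries to keep them apart.
Import-Module FailoverClusters

$class = New-Object System.Collections.Specialized.StringCollection
$class.Add("SQLAAG") | Out-Null

# Role names are placeholders; list yours with Get-ClusterGroup.
foreach ($group in "SQLVM1", "SQLVM2") {
    (Get-ClusterGroup -Name $group).AntiAffinityClassNames = $class
}
```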

Custom Properties and Placement Rules

In addition to availability sets, SCVMM can set custom properties against objects such as virtual machines, templates and hosts.  By supporting custom properties, SCVMM allows for placement rules which can enforce where a VM will run.  To provide an example, imagine a 16-node cluster where an administrator would like the virtual machines for a given tenant to run on a specific subset, say four of the nodes.  The administrator could follow these steps to enforce such a configuration.

  • Create a custom property called “Tenant”
  • Assign the Tenant property to virtual machines, templates and hosts with a specific value, call it “TenantA.”  An example of setting the property on a host is shown in Figure 2.  Note that all hosts and VMs must have this property set to some value.

Figure 2 Custom property applied to a host

  • Create a custom placement rule at the host group level to enforce the property. The supported rules are shown in Figure 3 and include must, should, must not or should not requirements between the virtual machines and hosts.

Figure 3 SCVMM custom placement rule

Once custom properties are set, intelligent placement will enforce the placement rule and dynamic optimizations will rectify a situation where a placement rule is violated.
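The steps above can be sketched with the VMM PowerShell module.  Cmdlet parameters vary between SCVMM releases and the object names are placeholders, so treat this as an outline to verify against your environment rather than a definitive procedure.

```powershell
# Hypothetical sketch of the custom-property placement workflow. VMM cmdlet
# signatures vary by SCVMM version; verify before use. Names are placeholders.
Import-Module VirtualMachineManager

# 1. Create a custom property usable on VMs and hosts.
$prop = New-SCCustomProperty -Name "Tenant" -AddMember @("VM", "VMHost")

# 2. Stamp the property on a host and a VM with the same value.
$vmHost = Get-SCVMHost -ComputerName "HV-NODE01"
Set-SCCustomPropertyValue -InputObject $vmHost -CustomProperty $prop -Value "TenantA"
$vm = Get-SCVirtualMachine -Name "TenantA-VM01"
Set-SCCustomPropertyValue -InputObject $vm -CustomProperty $prop -Value "TenantA"

# 3. Add a "must match" placement rule at the host group level.
$config = Get-SCPlacementConfiguration -VMHostGroup (Get-SCVMHostGroup -Name "Production")
New-SCCustomPlacementRule -CustomProperty $prop -Type MustMatch -PlacementConfiguration $config
```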

Preferred and Possible Owners

Preferred and possible owners are another long-standing feature of Windows failover clustering.  Both the preferred and possible owners lists can be maintained through either Failover Cluster Manager or SCVMM, as shown in Figure 4.


Figure 4 Preferred and possible owners list

The possible owners list can restrict a given cluster resource, such as a VM configuration, to the specified hosts.  Unlike availability sets and placement rules, which can be bypassed if the VM is controlled directly from Failover Cluster Manager, possible owners are a hard restriction that will prevent a virtual machine from ever moving to nodes that are not listed, including during host failure events.  Due to this hard restriction, possible owners should be used carefully.

Preferred owners are not a hard restriction, but instead represent a restart order for the virtual machine during failure events.  In essence a VM will attempt to failover to a given host starting with the top of the preferred owners list.  The preferred owners list becomes more interesting in combination with SCVMM and dynamic optimizations.  Preferred owners in combination with dynamic optimization can be used as a best effort means to keep a virtual machine running on a particular node in a cluster.
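Both lists can also be set from PowerShell.  The node and role names below are placeholders.

```powershell
# Hypothetical sketch: set preferred owners on a VM role and possible
# owners on its resource. Node and role names are placeholders.
Import-Module FailoverClusters

# Preferred owners (soft restart ordering) are set on the cluster group/role.
Set-ClusterOwnerNode -Group "SQLVM1" -Owners HVNODE1, HVNODE2

# Possible owners (hard restriction) are set on individual resources.
Set-ClusterOwnerNode -Resource "Virtual Machine SQLVM1" -Owners HVNODE1, HVNODE2

# Review the result.
Get-ClusterOwnerNode -Group "SQLVM1"
```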

Nutanix Consideration
The Nutanix distributed storage fabric (DSF) uses the concept of data locality to keep storage resources for a given virtual machine local to the server from which it runs.  A good overview of data locality can be seen here:

Setting a preferred owner for a virtual machine helps keep it on the same node, where the majority of its data may reside.  While this is certainly not a requirement (a virtual machine's data will localize over time based on future reads, regardless of the node from which it runs), preferred owners can help prevent the movement of data, which may be beneficial for virtual machines that access a large data set.

Dynamic Optimization

Dynamic optimization, on its surface, is a performance load balancing feature of SCVMM 2012 or higher which can manually or automatically move virtual machines to specific hosts based on workload.  Dynamic Optimization replaces the load balancing feature of Performance and Resource Optimization (PRO) in previous versions of SCVMM.  Dynamic optimization is configured at the host group level and is based on CPU, Memory, Disk IO and Network IO thresholds as shown in Figure 5.  Once the resources of a host cross these thresholds, virtual machines will be considered for placement on other hosts in a cluster.

Figure 5 Dynamic Optimization

Included with dynamic optimization is power optimization.  Power optimization can manage the power state of hosts based on specified resource thresholds similar to dynamic optimization.

Nutanix Consideration
Power optimization should not be used in Nutanix environments as each node in a cluster is contributing storage resources and should not be automatically powered off.

An additional feature of dynamic optimization, beyond balancing performance, is its ability to enforce VM placement based on other settings in the system, including the features previously discussed: preferred owners, custom placement rules and availability sets.  Should a virtual machine reside on a host which does not match those settings, dynamic optimization will move virtual machines, where possible, to enforce compliance.  Dynamic optimization can enforce placement either manually, with the “optimize hosts” option, or automatically, using the same settings shown in Figure 5.  Figure 6 is an example of the manual optimize hosts option in a case where a virtual machine does not reside on a node that is a preferred owner.  When selecting “optimize,” the virtual machine is live migrated to the appropriate destination.  If dynamic optimization is set to automatic, the virtual machine would be live migrated automatically based on the frequency setting.
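The manual path can also be driven from the VMM PowerShell module.  The cluster name is a placeholder, and the cmdlet should be verified against your VMM version.

```powershell
# Hypothetical sketch: trigger a manual dynamic optimization pass for a
# host cluster (the PowerShell equivalent of "optimize hosts").
# Cluster name is a placeholder.
Import-Module VirtualMachineManager

$cluster = Get-SCVMHostCluster -Name "HV-CLUSTER01"
Start-SCDynamicOptimization -VMHostCluster $cluster
```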

Figure 6 Optimize Hosts