Hyper-V Virtual Machine Converged Networking Performance

A converged networking architecture is a great way to consolidate resources and allow for more efficient server operations. With Windows Server 2012 R2 and Hyper-V, it is becoming more common to see the physical NICs within a host combined into a single load balancing and failover (LBFO) team to support the network traffic between servers. Nutanix leverages such a configuration with Hyper-V by creating an LBFO team in combination with an external virtual switch. Virtual NICs are then created against the virtual switch to support the host connections and virtual machines. The following figure depicts the default Nutanix configuration with Hyper-V, where a pair of 10Gb network adapters is used.

[Figure: default Nutanix converged networking configuration with Hyper-V]
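
For reference, a converged configuration of this general shape can be stood up with the in-box PowerShell modules. This is only a minimal sketch, not the script Nutanix uses; the adapter, team and switch names below are placeholders for your own environment.

    # Combine the two 10Gb physical NICs into a single LBFO team
    # (switch independent teaming with the Dynamic load balancing algorithm).
    New-NetLbfoTeam -Name "NetAdapterTeam" -TeamMembers "NIC1","NIC2" `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

    # Bind an external virtual switch to the team without automatically
    # creating the default management vNIC.
    New-VMSwitch -Name "ExternalSwitch" -NetAdapterName "NetAdapterTeam" `
        -AllowManagementOS $false

    # Create virtual NICs against the switch for the host connections.
    Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "ExternalSwitch"
    Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "ExternalSwitch"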

Microsoft supports a variety of functionality that impacts network performance, including Jumbo Frames, Virtual Receive Side Scaling (vRSS), Dynamic Virtual Machine Queue (DVMQ) and Large Send Offload (LSO), to name a few. There are plenty of resources online that discuss these features in detail, so I'm not going to rehash them here. Nutanix also has a networking best practices document for Windows Server 2012 R2 which touches on these features, available at the following link: http://go.nutanix.com/Microsoft-Windows-Server-Virtual-Networking-Best-Practices.html
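
If you want to confirm how these features are currently set on a host before testing, the in-box NetAdapter cmdlets will show the state of each one. The adapter names here are placeholders.

    # Current VMQ, RSS and Large Send Offload state for the physical adapters.
    Get-NetAdapterVmq -Name "NIC1","NIC2"
    Get-NetAdapterRss -Name "NIC1","NIC2"
    Get-NetAdapterLso -Name "NIC1","NIC2"

    # Driver-level advanced settings, including the jumbo packet size.
    Get-NetAdapterAdvancedProperty -Name "NIC1" |
        Format-Table DisplayName, DisplayValue -AutoSize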

It’s common for administrators to test maximum networking throughput between virtual machines (which are typically configured with a single VNIC) and subsequently be disappointed in the results.  What I wanted to do with this post is review the bandwidth limitation of a single virtual NIC (VNIC) and see if it was possible to saturate the throughput of a single 10Gb physical NIC.

There are several reputable articles which discuss the expected throughput of a single VNIC. One excellent article I’ve read recently (http://blogs.technet.com/b/networking/archive/2014/05/28/debugging-performance-issues-with-vmq.aspx) mentions how a single VNIC will utilize a single CPU core and be limited in throughput by the frequency of that core, specifically:

"You can expect anywhere from 3.5 to 4.5Gbps from a single processor but it will vary based on the workload and CPU."

As with most things related to performance, testing results will depend on the specific configuration. But these numbers seemed conservative to me, along with the fact that vRSS can help to enable the use of multiple CPUs to support networking traffic. So I decided to test what kind of networking throughput I could get between two virtual machines on a Nutanix NX3060 while disabling and enabling certain features.

To cut to the chase, the biggest factor in total throughput was the use of jumbo frames. Without jumbo frames, CPU utilization limited total throughput to around 4.8Gbps (600 MB/s). With jumbo frames (9014 bytes) enabled, throughput maxed out at nearly 9Gbps with CPU utilization well below 100%. This makes perfect sense, as the CPU was able to send more traffic with fewer cycles thanks to the larger packet size. For this test I was sending and receiving network traffic using IOmeter in one direction. I also tested with VMQ and vRSS enabled or disabled, and the results are in the "half duplex" side of the chart below.

Because the CPU was not fully utilized where jumbo frames were enabled and because I was only sending and receiving traffic in one direction, I decided to send and receive traffic in both directions to see when the CPU would max out. These results are in the “full duplex” section of the chart. Once CPU was maxed out, the benefits of vRSS can be seen, where multiple CPUs are then utilized for the send side traffic.

[Chart: throughput and CPU utilization results, half duplex and full duplex]

For most environments, I'd expect 4.8Gbps of network throughput for a single VM to be plenty, so I wouldn't go enabling jumbo frames just for the sake of hitting maximums. But at the very least it's important to understand the relationship between network performance and the host CPU while benchmarking.
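
For anyone who does want to repeat the jumbo frames portion of the test, it's a per-adapter driver setting that has to be enabled end to end: on the physical NICs, on the physical switch ports, and inside the guest. A rough sketch is below; the adapter names are placeholders, and the exact "Jumbo Packet" display value string varies by NIC driver, so check the output of Get-NetAdapterAdvancedProperty first.

    # On the Hyper-V host: enable jumbo frames on the physical adapters.
    # The DisplayValue string ("9014 Bytes" here) is driver specific.
    Set-NetAdapterAdvancedProperty -Name "NIC1","NIC2" `
        -DisplayName "Jumbo Packet" -DisplayValue "9014 Bytes"

    # Inside the guest OS: enable jumbo frames on the VM's network adapter.
    Set-NetAdapterAdvancedProperty -Name "Ethernet" `
        -DisplayName "Jumbo Packet" -DisplayValue "9014 Bytes"

    # Verify the setting took effect.
    Get-NetAdapterAdvancedProperty -DisplayName "Jumbo Packet"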

Virtual Machine Placement Options with Hyper-V

Large Windows failover clusters can create the need to provide separation between virtual machines. Separation may be needed to provide better performance or better availability for a given application. For example, virtual machines participating in a SQL Server based cluster should run on separate physical nodes for both performance and HA considerations.

People familiar with VMware commonly ask whether Hyper-V provides functionality similar to the vSphere Distributed Resource Scheduler (DRS), including affinity and anti-affinity rules. Windows failover clustering, Hyper-V and System Center Virtual Machine Manager (SCVMM) offer several overlapping pieces of technology to accomplish DRS-type functionality and provide separation or grouping of virtual machines to particular hosts. The following sections outline these technologies, including a few Nutanix-specific considerations.

Availability Sets

Windows clusters have long supported the concept of anti-affinity, the process of keeping resource groups running on different cluster nodes. This is enforced using "AntiAffinityClassNames," which are applied to cluster resource groups or roles that should be kept apart. SCVMM supports AntiAffinityClassNames through "Availability Sets." Availability sets map directly to AntiAffinityClassNames and, when configured, allow SCVMM intelligent placement to enforce where virtual machines will run. Availability sets are managed and applied on a VM-by-VM basis. Management through the SCVMM console is done under the availability section of a virtual machine's hardware configuration, as shown in Figure 1.

Figure 1 SCVMM availability sets
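
Because availability sets map to AntiAffinityClassNames, the same effect can be had directly with the failover clustering PowerShell module. A minimal sketch is below; the class name and VM role names are examples.

    # AntiAffinityClassNames is a string collection on the cluster group (role).
    $class = New-Object System.Collections.Specialized.StringCollection
    $class.Add("SQLCluster") | Out-Null

    # Apply the same class name to each VM role that should be kept apart.
    (Get-ClusterGroup -Name "SQLVM1").AntiAffinityClassNames = $class
    (Get-ClusterGroup -Name "SQLVM2").AntiAffinityClassNames = $class

    # Verify the assignments.
    Get-ClusterGroup | Select-Object Name, AntiAffinityClassNames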

Custom Properties and Placement Rules

In addition to availability sets, SCVMM can set custom properties against objects, such as virtual machines, templates and hosts.  By supporting custom properties, SCVMM allows for placement rules which can enforce where a VM will run.  To provide an example, imagine a 16 node cluster where an administrator would like to have virtual machines for a given tenant run on a specific set, say 4 of the nodes.  The administrator could follow these steps to enforce such a configuration.

  • Create a custom property called “Tenant”
  • Assign the Tenant property to virtual machines, templates and hosts with a specific setting, call it "TenantA." An example of setting the property on a host is shown in Figure 2. Note that all hosts and VMs must have this property set to some value.

Figure 2 Custom property applied to a host

  • Create a custom placement rule at the host group level to enforce the property. The supported rules are shown in Figure 3 and include must, should, must not or should not requirements between the virtual machines and hosts.

Figure 3 SCVMM custom placement rule

Once custom properties are set, intelligent placement will enforce the placement rule and dynamic optimizations will rectify a situation where a placement rule is violated.
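
These steps can also be scripted with the VMM PowerShell module. The sketch below is only a starting point: the host, host group and VM names are examples, and the placement rule parameters in particular may differ between SCVMM versions, so verify the cmdlet syntax against your VMM installation.

    # Create the custom property and make it available on VMs and hosts.
    $prop = New-SCCustomProperty -Name "Tenant" -AddMember @("VM","VMHost")

    # Tag a host and a VM with the same value.
    $vmHost = Get-SCVMHost -ComputerName "HV-Node01"
    $vm     = Get-SCVirtualMachine -Name "TenantA-VM01"
    Set-SCCustomPropertyValue -InputObject $vmHost -CustomProperty $prop -Value "TenantA"
    Set-SCCustomPropertyValue -InputObject $vm -CustomProperty $prop -Value "TenantA"

    # Add a "must match" placement rule at the host group level
    # (rule type parameters may vary by VMM version).
    $rule = New-SCCustomPlacementRule -CustomProperty $prop -MustMatch
    Set-SCVMHostGroup -VMHostGroup (Get-SCVMHostGroup -Name "Production") `
        -AddCustomPlacementRule $rule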

Preferred and Possible Owners

Preferred and possible owners are another long-standing feature of Windows failover clustering. Both the preferred and possible owners lists can be maintained through either Failover Cluster Manager or SCVMM, as shown in Figure 4.

Figure 4 Preferred and possible owners list

The possible owners list can restrict a given cluster resource, such as a VM configuration, to the specified hosts. Unlike availability sets and placement rules, which can be bypassed if the VM is controlled directly from Failover Cluster Manager, possible owners are a hard restriction that will prevent a virtual machine from ever moving to nodes that are not listed, including during host failure events. Due to this hard restriction, possible owners should be used carefully.

Preferred owners are not a hard restriction, but instead represent a restart order for the virtual machine during failure events. In essence, a VM will attempt to fail over to a given host starting at the top of the preferred owners list. The preferred owners list becomes more interesting in combination with SCVMM and dynamic optimization, where it can be used as a best effort means of keeping a virtual machine running on a particular node in the cluster.
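
Both lists can also be managed with the failover clustering PowerShell module; Set-ClusterOwnerNode is run against the VM's cluster group for preferred owners and against its resources for possible owners. The node and VM names below are examples.

    # Preferred owners: an ordered restart preference on the VM's cluster group.
    Set-ClusterOwnerNode -Group "SQLVM1" -Owners "HV-Node01","HV-Node02"

    # Possible owners: a hard restriction applied to the VM's cluster resources.
    Get-ClusterGroup -Name "SQLVM1" | Get-ClusterResource |
        Set-ClusterOwnerNode -Owners "HV-Node01","HV-Node02"

    # Review the current owner settings.
    Get-ClusterOwnerNode -Group "SQLVM1"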

Nutanix Consideration
The Nutanix distributed storage fabric (DSF) uses the concept of data locality to keep storage resources for a given virtual machine local to the server from which it runs.  A good overview of data locality can be seen here: https://www.youtube.com/watch?v=ocLD5nBbUTU

Setting a preferred owner for a virtual machine will help to keep it on the node where the majority of its data may reside. While this is certainly not a requirement, as a virtual machine's data will localize over time (based on future reads) regardless of the node from which it runs, preferred owners can help to prevent the movement of data, which may be beneficial for virtual machines that access a large data set.

Dynamic Optimization

Dynamic optimization, on its surface, is a performance load balancing feature of SCVMM 2012 or higher which can manually or automatically move virtual machines to specific hosts based on workload. Dynamic optimization replaces the load balancing feature of Performance and Resource Optimization (PRO) found in previous versions of SCVMM. Dynamic optimization is configured at the host group level and is based on CPU, memory, disk IO and network IO thresholds, as shown in Figure 5. Once the resources of a host cross these thresholds, virtual machines will be considered for placement on other hosts in the cluster.

Figure 5 Dynamic Optimization

Included with dynamic optimization is power optimization.  Power optimization can manage the power state of hosts based on specified resource thresholds similar to dynamic optimization.

Nutanix Consideration
Power optimization should not be used in Nutanix environments as each node in a cluster is contributing storage resources and should not be automatically powered off.

An additional feature of dynamic optimization, beyond balancing performance, is its ability to enforce VM placement based on other settings in the system. Some of those settings include the features previously discussed, such as preferred owners, custom placement rules and availability sets. Should a virtual machine reside on a host which does not match the settings of these features, dynamic optimization will move virtual machines, where possible, to enforce compliance. Dynamic optimization can enforce placement either manually, with the "optimize hosts" option, or automatically, using the same settings shown in Figure 5. Figure 6 is an example of the manual optimize hosts option in a case where a virtual machine does not reside on a node which was a preferred owner. When "Optimize" is selected, the virtual machine is live migrated to the appropriate destination. If dynamic optimization is set to automatic, the virtual machine would be live migrated automatically based on the frequency setting.

Figure 6 Optimize Hosts