NSX – Availability and Redundancy

vSphere administrators normally have two great fears: failure of storage and failure in the network layer. Either can result in hundreds or thousands of virtual machines going offline simultaneously.

Software-defined networking, such as NSX or vShield, has the potential to bring those failures even closer to home. So how has NSX been architected to make it highly available, robust and resilient to failure?

[Diagram: NSX lab data planes]

The above diagram shows how the management, control and data planes have been separated. Consequently, the loss of the management plane affects only a few specific functions; deployed logical networks continue to operate, and traffic on the data plane is not interrupted.

[Diagram: NSX lab data planes (2)]

Normally at least three NSX Controllers are used in the control plane, and these should be located on different ESXi hosts so that a single host failure cannot take out more than one controller.

The design recommendation is to create a separate control cluster, that is, a set of ESXi servers dedicated to management and control functions.

vSphere HA should be used to recover an NSX Manager or a Controller following an unexpected host failure. When a Controller is unavailable, its services are reassigned to the remaining controllers. If two of the three nodes were to fail, the NSX control cluster would lose its majority and become read-only.

NSX Controllers are always deployed in odd numbers so that a clear majority can be formed and split-brain situations are avoided.
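
The majority rule can be illustrated with a little arithmetic. The following is a minimal Python sketch of a generic majority (quorum) check, not NSX code; the function names are made up for this example. It shows why a three-node cluster survives one failure but not two, and why a fourth node adds no extra tolerance.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_majority(cluster_size: int, healthy_nodes: int) -> bool:
    """True if the healthy nodes can still form a majority."""
    return healthy_nodes >= majority(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail before the majority is lost."""
    return cluster_size - majority(cluster_size)


# Cluster size, nodes needed for a majority, failures tolerated.
for size in (3, 4, 5):
    print(size, majority(size), tolerated_failures(size))
# 3 2 1
# 4 3 1
# 5 3 2
```

With three controllers, has_majority(3, 2) is True and has_majority(3, 1) is False, which matches the read-only behaviour described above when two nodes fail. An even-sized cluster can also split into two equal halves, neither of which holds a majority.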

Traffic flowing on the data plane between VMs on different hosts is first sent to the local host's VTEP (VXLAN tunnel endpoint), where a local table has previously been populated with the required VXLAN routing information forwarded by the controllers. However, if the host doesn’t have a mapping for the destination VM, it requests one from the NSX Controller.
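
The lookup described above (local table first, controller on a miss) can be modelled as a simple cache. This is a toy Python sketch under stated assumptions: the Controller and Host classes, the table layout and the sample addresses are all hypothetical and do not reflect NSX's real data structures or APIs.

```python
from typing import Dict, Optional, Tuple

# Key: (VNI, destination VM MAC). Value: IP address of the VTEP on the
# host where that VM runs. The layout is an assumption for illustration.
VtepTable = Dict[Tuple[int, str], str]


class Controller:
    """Toy stand-in for the NSX Controller cluster (hypothetical)."""

    def __init__(self, table: VtepTable) -> None:
        self._table = dict(table)

    def query(self, vni: int, mac: str) -> Optional[str]:
        return self._table.get((vni, mac))


class Host:
    """Toy ESXi host that keeps a local copy of the mappings it has learned."""

    def __init__(self, controller: Controller) -> None:
        self.controller = controller
        self.local_table: VtepTable = {}
        self.controller_queries = 0

    def resolve_vtep(self, vni: int, mac: str) -> Optional[str]:
        key = (vni, mac)
        # Fast path: the mapping is already in the local table.
        if key in self.local_table:
            return self.local_table[key]
        # Miss: ask the controller and remember the answer.
        self.controller_queries += 1
        vtep = self.controller.query(vni, mac)
        if vtep is not None:
            self.local_table[key] = vtep
        return vtep


controller = Controller({(5001, "00:50:56:aa:bb:01"): "192.168.250.52"})
host = Host(controller)
host.resolve_vtep(5001, "00:50:56:aa:bb:01")  # miss, controller is queried
host.resolve_vtep(5001, "00:50:56:aa:bb:01")  # hit, served from the local table
```

After the two calls, host.controller_queries is 1: only the first lookup reached the controller. In this model, entries already held in a host's local table remain usable even when the controller cannot be reached.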

[Screenshot: NSX lab DVS, output of show logical-switch mac-table]

Hosts also report the MAC addresses of the virtual machines that are locally connected to each VNI (VXLAN network identifier), as well as the IP addresses of their local VMs. This allows the controller to populate its ARP table.

The presence of these tables in the control plane allows for the suppression of ARP requests, which greatly reduces L2 broadcast traffic in the network and provides significant benefits to the stability of the network infrastructure.
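
To make the idea of ARP suppression concrete, here is a simplified Python model. It assumes a controller-side table keyed by VNI and IP address, filled from host reports; the class and function names are invented for this sketch and the real NSX mechanism is more involved.

```python
from typing import Dict, Optional, Tuple


class ArpController:
    """Toy controller holding an ARP table built from host reports (hypothetical)."""

    def __init__(self) -> None:
        # Key: (VNI, VM IP address). Value: VM MAC address.
        self.arp_table: Dict[Tuple[int, str], str] = {}

    def report(self, vni: int, ip: str, mac: str) -> None:
        """Called by a host to report a locally connected VM."""
        self.arp_table[(vni, ip)] = mac

    def lookup(self, vni: int, ip: str) -> Optional[str]:
        return self.arp_table.get((vni, ip))


def handle_arp_request(
    controller: ArpController, vni: int, target_ip: str
) -> Tuple[str, Optional[str]]:
    """Decide what a host does with an ARP request from a local VM."""
    mac = controller.lookup(vni, target_ip)
    if mac is not None:
        # The answer is known, so the host replies locally and the
        # broadcast never leaves the host.
        return ("local-reply", mac)
    # Unknown target: fall back to flooding the request on the logical switch.
    return ("broadcast", None)


arp = ArpController()
arp.report(5001, "10.0.0.11", "00:50:56:aa:bb:01")
print(handle_arp_request(arp, 5001, "10.0.0.11"))  # ('local-reply', '00:50:56:aa:bb:01')
print(handle_arp_request(arp, 5001, "10.0.0.99"))  # ('broadcast', None)
```

Every request answered with a local reply is one broadcast that never crosses the physical network, which is where the reduction in L2 broadcast traffic comes from.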

NSX is a multi-component infrastructure, and this is the first in a series of posts. In later posts I will look at VXLAN modes and the network appliances that are built on top of this basic architecture.

